Next release provides technology for an Active Enterprise Intelligence solution.
by Todd Walter
Making significant changes transparent to users is the challenge successfully tackled by Teradata 13.0. This
latest release virtualized the storage subsystem architecture, redesigned the management functions and introduced
automation of the dual-active ecosystem—all while preserving the best of the Teradata system. Users will also benefit
from new functionality and key query performance optimizations that are compatible with the existing and familiar
applications and processes.
The following questions and answers describe the modifications and enhancements
made to the system in Teradata 13.0:
Q What does it mean to virtualize the storage?
A The Teradata Database's architecture has always had a strong linkage of one unit of parallelism,
or AMP, to slices of disk space on physical spindles. Teradata Virtual Storage changes that model completely.
The AMP's link to the physical disks is eliminated, and all disks connected to a group of nodes, or clique, are
treated as one storage pool. Storage is handed out as needed in small allocation units to AMPs and can be returned to the
pool if required. All of the unique properties of the Teradata shared-nothing logical architecture are preserved with an
assignment of each allocation unit to a single AMP, despite sharing the physical disks. Teradata Virtual Storage is implemented
transparently at the storage level, so no changes are required to applications using the Teradata Database. (See figure.)
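The pooling model described above can be pictured with a small sketch. This is only an illustration, not Teradata's implementation: the class name `StoragePool`, its methods and the unit counts are all hypothetical. It shows the two properties the answer stresses: units come from one shared pool, yet each unit belongs to exactly one AMP at a time.

```python
from dataclasses import dataclass, field


@dataclass
class StoragePool:
    """Hypothetical model of one clique's shared pool of allocation units."""
    total_units: int
    owner: dict = field(default_factory=dict)  # unit id -> owning AMP id

    def __post_init__(self):
        # Every unit starts out free and unowned.
        self.free = list(range(self.total_units))

    def allocate(self, amp_id):
        """Hand one free unit to an AMP; the unit then belongs to that AMP only."""
        if not self.free:
            raise RuntimeError("storage pool exhausted")
        unit = self.free.pop()
        self.owner[unit] = amp_id
        return unit

    def release(self, amp_id, unit):
        """Return a unit to the pool; only the owning AMP may release it."""
        if self.owner.get(unit) != amp_id:
            raise PermissionError("unit is not owned by this AMP")
        del self.owner[unit]
        self.free.append(unit)

    def units_of(self, amp_id):
        """List the units currently assigned to one AMP."""
        return sorted(u for u, a in self.owner.items() if a == amp_id)


pool = StoragePool(total_units=8)
first = pool.allocate(amp_id=0)   # AMP 0 grows by one unit
second = pool.allocate(amp_id=1)  # AMP 1 draws from the same pool
pool.release(0, first)            # AMP 0 gives its unit back
```

Because ownership is tracked per unit rather than per spindle, the sketch keeps the shared-nothing rule (no unit has two owners) even though all AMPs draw from the same physical pool.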
Q Why does virtualizing the storage help?
A Teradata Virtual Storage makes the storage configuration much more flexible. Adding disk space to existing configurations
as needs change is easy, and new spindles no longer need to be matched evenly to AMPs. Existing disk storage can be evacuated
for service or replacement.
Q What does Teradata Virtual Storage offer the data warehouse user?
A Multi-temperature data warehousing is optimized at both the physical and logical levels. This
gives users direct access to all of the data, both "hot" (more frequently accessed) and "cool" (less frequently accessed).
Separate specialized storage and access methods are also eliminated since users no longer need complex processes to access
historical or archival data.
Physical disks of different sizes and speeds can also be configured in the same Teradata clique and system using Teradata Virtual
Storage. The storage size and performance can be matched to the data volume and usage to make the configuration price-appropriate.
For instance, smaller, faster disks can be included for data that is more frequently used, while larger disks can be included in the
same space pool for large, less frequently used data. To learn more about Teradata's multi-temperature features, read "Waste not, want not" in
Teradata Magazine, Vol. 7, No. 2.
Q Is a big management effort required to place the data on the right storage?
A No extra administrative effort is required. A key capability of Teradata Virtual Storage is to fully automate data placement
on the physical storage. First, each allocation unit is monitored for its access frequency. Then, the access pattern is used to place
the allocation unit on the part of the storage pool that corresponds with the performance for data with that pattern. Data placement
will be optimized within zones on a single spindle and among multiple spindles that have different performance characteristics. Because
Teradata Virtual Storage operates at the allocation unit level, parts of a single table can be optimized differently: The hot partitions
will be placed on faster storage, and the cool ones will be placed on slower parts of the disk pool.
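The two steps above (measure access frequency, then place by pattern) can be sketched as follows. The function name `place_by_temperature`, the access counts and the two-tier "fast"/"slow" split are assumptions made for illustration; the article does not describe the actual placement algorithm.

```python
def place_by_temperature(access_counts, fast_slots):
    """Map each allocation unit to 'fast' or 'slow' storage.

    access_counts: dict of unit id -> observed access count (hypothetical metric).
    fast_slots: how many units the fast tier can hold (hypothetical capacity).
    """
    # Hottest units first; ties are broken by unit id so the result is deterministic.
    ranked = sorted(access_counts, key=lambda u: (-access_counts[u], u))
    hot = set(ranked[:fast_slots])
    return {u: ("fast" if u in hot else "slow") for u in access_counts}


# Units of a single table: recent partitions are read far more often than old ones.
counts = {
    "sales_2008_q4": 950,
    "sales_2008_q3": 400,
    "sales_2007": 12,
    "sales_2006": 3,
}
placement = place_by_temperature(counts, fast_slots=2)
for unit, tier in placement.items():
    print(unit, tier)
```

Working at the level of individual units is what lets the hot and cool partitions of one table end up on different parts of the pool.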
Q What is new for dual-active implementations?
A Teradata 13.0 brings new functionality to Teradata Replication Services and introduces Teradata Multi-System Manager.
Q What is new in Teradata Replication Services?
A Dynamic object replication makes the service easier to manage and maintain. When new objects are created within the scope of
a replication group, the new object definition is automatically replicated, followed by automatic replication of the data. Additionally,
triggers and identity columns are directly and automatically supported. Newer data types, like large data objects and user-defined types,
are also supported.
Q What is Teradata Multi-System Manager?
A This event-driven tool automatically manages the data warehouse ecosystem and is especially useful in monitoring the Teradata
Database instances in a dual- or multi-active configuration. It monitors and allows control of the software and surrounding components
that make up the data warehouse implementation, such as business intelligence (BI); application; extract, transform and load (ETL);
replication; and query director servers. Working in concert with current Teradata monitoring and control mechanisms, Teradata Multi-System
Manager reports on the status of systems, data and applications, and it sends alerts if operational thresholds are reached. When a failure
occurs, the solution manages the operation of the ecosystem through failure, while the system is being serviced and when the monitored
components are put back into action.
Q What if our organization has only one system?
A The fault isolation program in Teradata 13.0 continues to increase the availability of a single system. The lock manager will
isolate faults to only the transaction that performs runaway lock allocation. The file system will identify faulty data structures
and restrict access to only the affected objects rather than requiring offline service from Teradata Customer Support. When an extraordinary
event requires a restart, restart times have been reduced to shorten how long a system is offline.
Q What is new for managing the Teradata Database?
A A number of database management features make specific tasks easier and quicker for DBAs and system administrators. Teradata
Active System Management introduces a number of new classification rules for workloads; new rules and categorization options are available
for DBQL; Transfer Statistics allows the DBA to copy statistics between tables rather than recollecting them; and backup and restore
performance and usability are improved.
Teradata Viewpoint, the Web-based management interface from Teradata, delivers new portlets and functionality with Teradata 13.0. It
provides an entirely new way for all types of users to get the continuous monitoring information they require to manage the system and
Q Are there any security updates?
A LDAP (lightweight directory access protocol) and single sign-on implementations are made easier and are more integrated with
the rest of the ecosystem. Certification is included for several specific directories, including ADAM and eDirectory. Teradata
Trusted Sessions enables database security for end users of applications employing pooled sessions. Several internal hardening projects
make the Teradata system generally more secure from damage by hackers and trusted users.
Q What new functionality does Teradata 13.0 provide?
A Extensibility is a continued focus in Teradata 13.0, including Java user-defined functions (UDFs), answer sets for Java
Stored Procedures, UDFs in recursive queries, global memory available to UDFs and ordered input to table functions. Collectively,
these and previous extensibility features provide key functionality for optimizing the operation of SAS and other applications by
moving analytics close to the large data sets in the data warehouse.
Q Is there new SQL functionality?
A Before Teradata 13.0, a business wanting to know how many months in a row a balance increased or decreased would have
difficulty solving that with SQL. Teradata invented the RESET WHEN addition to the ordered analytic operations. This allows the
user to manage a counter based on a value comparison of the rows that are before and/or after the current row in an ordered set. With
RESET WHEN, the business problem becomes a straightforward SQL statement rather than a complex process.
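To make the counter behavior concrete, here is a small Python sketch of the logic for the "months in a row" question. It is not Teradata SQL and does not show RESET WHEN syntax; the function name and the sample balances are hypothetical. It only mimics the idea of a running counter over an ordered set that starts over when a comparison with the previous row fails.

```python
def months_of_increase(balances):
    """For each month, count how many months in a row the balance has risen.

    The counter resets to 0 whenever the balance fails to increase, which is
    the role the reset condition plays in the ordered analytic operation.
    """
    streaks = []
    streak = 0
    previous = None
    for balance in balances:
        if previous is not None and balance > previous:
            streak += 1
        else:
            streak = 0  # reset: first row, or the balance did not increase
        streaks.append(streak)
        previous = balance
    return streaks


# One account's month-end balances, already ordered by month.
balances = [100, 120, 150, 140, 160, 170, 180]
print(months_of_increase(balances))  # [0, 1, 2, 0, 1, 2, 3]
```

In the database, the same result comes from a single ordered analytic query per account rather than from procedural code outside the warehouse.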
Single Value Subqueries make it easier to compute a single value from the data warehouse and include it in a query result.
GPS, RFID and other technologies enable organizations to acquire a location as a key attribute. The location can be static, as in a
store or a customer's home, or dynamic, such as where a car currently is on the road. Teradata 13.0 adds "Spatial" as a built-in data
type so location and movement are part of the analytic queries served by the data warehouse.
Q What new performance can we expect from Teradata 13.0?
A Performance is enhanced in several areas. A number of automatic query optimizations improve performance of specific classes
of queries. Statistics collection performance is improved, and sample statistics are more consistent. Larger memory on 64-bit nodes
will improve algorithms and relieve limits. New functionality will help extract, load and transform (ELT) processes and BI tool performance.
Q What is new in query optimization?
A Join indexes (JIs) have become a key technology for optimizing BI and OLAP access to the data warehouse. A number of query
optimizations were added in Teradata 13.0 to ensure that the JIs are used wherever possible to deliver maximum performance value, and
improvements were made in the performance of processes that update the JI. Dynamic Partition Elimination was also enhanced to get the
most out of partitioned and multi-level partitioned tables.
DISTINCT and GROUP BY have long been considered areas where queries needed to be tuned. Now the Optimizer will automatically choose
the algorithm that gives the best query performance. COUNT(*) will scan the file system structures rather than the table, resulting
in a fast count of the total rows in a table.
The limit of 64 tables per query was increased to 256 tables to enable wide enterprise analytics on an enterprise data model. The
Satisfiability and Transitive Closure algorithms were enhanced to optimize application and tool-written SQL that is less tightly
coded than handwritten SQL.
Q How is statistics collection improved?
A Collecting statistics on a fairly unique column in a large table is a costly process, as it affects system resources—primarily CPU
and I/O—and takes extra time. A new algorithm was introduced to optimize collection on unique primary index (UPI), non-unique primary
index (NUPI) and other fairly unique columns.
Sample statistics are efficient to collect but do not result in good statistical information if the columns are very non-unique. The
sampling and extrapolation algorithms were enhanced to make sample statistics much more generally applicable.
Q How does larger memory help performance?
A The disk cache algorithms were enhanced to take full advantage of the new memory sizes available on 64-bit Teradata platforms.
Algorithms, such as aggregation and cache-based operations, will improve performance by taking advantage of more memory as well. By
keeping the needed data in memory, I/O to disk can be eliminated and overall performance improved.
The number of AMP worker tasks (AWTs) was also increased to take advantage of a 64-bit system. The additional AWTs enable more work
to execute concurrently. For example, more AWTs can be made available for time-sensitive tactical work or for continuous updates. Of
course, this makes workload management policies and configuration that much more important.
Q What is new for ELT and BI tool performance?
A Teradata 13.0 introduces a new type of table for temporary data capture and use. This table breaks the usual Teradata Database
rules in that it is neither hash distributed nor hash ordered. It is a local table—the data stays on the AMP on which it was received or
created—and it is an append-only table with no ordering. This layout is more efficient for acquiring the data into work tables for ELT
processing and for the temporary tables that are generated by SAS, BI tools and applications. It eliminates skew that may otherwise
occur if a poor choice of primary index (PI) is made or if no good choice is available. If no PI is specified, this table form will be the
default. Teradata Parallel Transporter, Teradata FastLoad, Teradata TPump, Array Insert and INSERT SELECT put data into these tables. The
Optimizer automatically recognizes these tables when they are referenced and appropriately redistributes the data for the required
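The skew argument can be illustrated with a toy comparison. Everything here is an assumption for illustration: the four-AMP system, the CRC-based hash standing in for the real hashing function, and the round-robin arrival of rows at AMPs. The point is only that hashing a poorly chosen PI piles rows onto a few AMPs, while leaving rows where they arrive does not.

```python
import zlib
from collections import Counter

NUM_AMPS = 4  # hypothetical system size


def hash_distribute(rows, pi_column):
    """Place each row on the AMP chosen by hashing its primary index value."""
    load = Counter()
    for row in rows:
        # crc32 is a stand-in for the database's real hashing function.
        amp = zlib.crc32(str(row[pi_column]).encode()) % NUM_AMPS
        load[amp] += 1
    return [load[a] for a in range(NUM_AMPS)]


def append_local(rows):
    """Leave each row on the AMP that received it.

    Assumption: the loader hands rows to AMPs in round-robin order.
    """
    load = Counter()
    for i, _ in enumerate(rows):
        load[i % NUM_AMPS] += 1
    return [load[a] for a in range(NUM_AMPS)]


# 1,000 staged rows whose only candidate PI column holds a single value.
rows = [{"country": "US", "amount": i} for i in range(1000)]
print(hash_distribute(rows, "country"))  # every row lands on one AMP
print(append_local(rows))                # rows are spread evenly
```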
Enhancing the active ecosystem
Virtualization, management, ultra-high availability, functionality and performance are all offered through Teradata 13.0. Together,
the changes provide the technology to meet the service levels required for Active Enterprise Intelligence implementations while maintaining
the high performance that is valued in the Teradata system. With the installation of Teradata 13.0, organizations can further optimize
their data warehouse investment.
Todd Walter, CTO of Research and Development, has been with Teradata since 1987.
Teradata Magazine-December 2008