Register | Log in


Subscribe Now>>
Home News Tech2Tech Features Viewpoints Facts & Fun Teradata.com
Ask The Experts
Download PDF|Send to Colleague

Teradata 12.0: Active performance

New release adds more value to your enterprise data warehouse.

by Todd Walter with contributions from Alan Greenspan

Active data warehousing is the process of mixing strategic and operational intelligence on the same platform by utilizing up-to-date enterprise data. With active data warehousing, very high requirements are placed on the underlying technology.

For one, the active data warehouse must perform considerably well on all parts of the workload. Not only must it manage and track a widely mixed workload under all conditions, but it must also be capable of managing the flow of changing data. Second, it must be possible to implement new applications quickly and easily.

Teradata 12.0 is the next major step toward making it easy for anyone to build a high-performance active data warehouse.

Benefits of multi-level partitioned primary index
> Increases performance of multi-dimensional queries where the set of constrained dimensions changes from query to query
> Enhances performance of queries with constraints (including range constraints) on multiple columns without the overhead of secondary indexes
> Decreases the amount of I/O needed to execute a query
> Provides an access method that allows multiple ways to access the data efficiently
> Allows DBAs more partitioning expressions flexibility when there are multiple choices; the DBA can define multiple partitioning expressions and obtain the benefits from each of them and in combination
—A.G.

Q What changes deliver new levels of performance?

A Partitioned primary index (PPI) is extended to multiple levels, statistics are enhanced to deliver better estimates, many query estimating and planning improvements have been added, and the explain plans are extended with information that previously had to be assumed.

Q How does multi-level PPI (ML-PPI) work?

A The current PPI capability allows for one function to determine the horizontal partitioning of the table. With ML-PPI, those partitioning capabilities are extended to multiple levels. Separate functions within ML-PPI define the partitioning for each level, which can then be utilized together or independently for much more granular access to the desired data in the table.

For example, an insurance company may partition first by state, then month. Questions may be asked by state, month or both, allowing the quantity of data accessed to better match the requested analysis. Reducing the data accessed reduces the I/O load on the system and improves overall system throughput.

Q How is statistics-based estimation improved?

A More and varied users can submit a greater assortment of queries, making accurate estimation for query planning extremely important. Statistics are crucial for guiding the cost-based optimization of queries.

The number of intervals in the statistics structure has been doubled, significantly increasing the detail, accuracy and understanding of skew. Demographics of multiple column statistics are improved with a better picture of how nulls appear in the data.

Q What happens if my statistics are not current?

A Stale statistics can lead to invalid estimations that, in turn, can negatively affect query planning. Balancing the overhead of re-collecting statistics against the potential of inaccurate statistics often demands trade-offs. For example, with certain columns (especially dates), it has been necessary to collect statistics frequently to get reasonable plans for queries at or near the end of the data range.

Teradata 12.0 significantly reduces this overhead by extrapolating appropriate values when the collected statistics are not current, thus avoiding daily collection and reducing overall system utilization. To continue with the above scenario, when statistics in a date column have not been recently re-collected, the optimizer will determine how much additional data has been added to the table and will extend the statistics to account for the new date ranges. Query planning will improve even when the statistics are not collected every day. (See figure, below.)

Q What is new in query estimation and planning?

A Query complexity continues to rise with many nested views, subqueries, derived tables and complex joins. The Teradata optimizer has been enhanced to capture additional information about each nested level in the query and then to carry that information throughout the planning process. Improvements have been made to the estimation of the cost and results of complex joins. New query rewrite rules have been added to address many customer-provided optimization opportunities, and the rewrite engine can now execute in multiple passes, giving further opportunities to improve the plan. More accurate and consistent query plans will be produced as a result.

Figure: Extrapolated statistics reduce collection requirements
enlarge
Teradata 12.0 can extrapolate appropriate statistics values when collected statistics are not current. As this example shows, when a query references a date beyond the range with collected statistics, the database will automatically extrapolate the collected statistics to create an accurate plan. This process allows the reduction of frequent statistics collection while getting accurate cost-based optimizations.

Q What changes have been made in the explain plans?

A Size of spools, join and group key columns, and qualified referenced object names in the query, along with additional information, are included in the explain plan output to provide more information to the users reading the explain.

Q How does Teradata improve the ability to manage the ever-increasing active workload?

A All of the aforementioned query estimation accuracy improvements pull double-duty by helping the accuracy of the workload management rules. More information can be provided by the user or application about each query. Workload exceptions have been extended and a traffic cop has been added to automate changes to workload rules and settings based on user-defined or system events. An entirely new way of accessing information is being developed to provide additional management information to more data warehouse users.

Q What is a traffic cop?

A New workload management profiles can be implemented when system conditions change or when a user-defined event such as the end of the load window is reached. For example, if a system component has failed or the system is exceptionally busy, the traffic cop can switch rule sets to appropriately adjust the workflow based on the current system conditions.

Q How do users and applications provide more information about the work on the system?

A Like banding a bird to track its flight path, a query band can now be provided with each query. The band can contain any number of attributes and values that provide detailed information about the query and may be referenced by workload management rules during the query classification phase. The information is then captured in the query log, where it can be used for future analysis of the work flowing through the system.

Query banding is especially valuable for applications that send work through pooled sessions, such as traditional session pool applications, analytic applications using pooled sessions, business intelligence (BI) tools and new Web service applications. Information useful for workload management, or for tracking the use of the data warehouse (such as the application, work unit, requesting user, etc.,) can be acquired through the query banding. After the application is adjusted to provide the information, all of the tracking and linking to workload management is handled automatically.

Q What new workload exceptions are available?

A System-level workload exceptions have been added in Teradata 12.0. Now a single rule can be used to raise exceptions, regardless of the query's workload group. Multiple rules can be assigned to a single workload group to allow multiple levels of control for requests that are not behaving as expected.

Q How will systems management information be provided in the future?

A Teradata 12.0 includes a new application programming interface (API) to make the system management information available via standard SQL interfaces. With this capability, data can be retrieved and control settings altered. This makes the information available through the standard ODBC and JDBC interfaces.

On top of those APIs, a new system management user interface has been implemented. Using Web services, portal delivered displays and a fully extensible architecture, a fundamentally new form of management interface is provided to administrative and end users.

Q How will the increasing flow of change to the data warehouse be handled?

A The continuous flow of data needs to get into the database efficiently and then must be backed up without affecting the flow. If availability and disaster recovery requirements lead to a dual system implementation, the data must be synchronized with a system in another data center.

Q What is new for the data acquisition and integration process?

A Many implementations use extract, load and transform (ELT) processes, which perform the transformation and apply steps within the database, taking full advantage of the parallel data engine to do the work. Teradata 12.0 adds the ability to perform bulk merge (upsert), working from a table of change data and doing all the work within the database.

Insert and merge operations have been enhanced with an option to log errors to an error table rather than initiating an abort. This means that less effort needs to be spent on getting the data perfect before applying it. Together, these two functions significantly improve the ELT capabilities for the warehouse.

ELT using bulk SQL operations avoids constraints of the load utilities, allowing greater use of physical tuning techniques for active queries (unique secondary and join indexes) and allowing more use of active features (triggers, referential integrity).

Q How do we perform backup without affecting the continuous loading processes?

A Online Archive now allows backups to be performed without stopping the load processes. It will automatically initiate a checkpoint, save a log of changes and back up the log as part of the backup. On a restore, the log will automatically be restored and rolled back to the checkpoint ensuring that a consistent restore has been done.

Q How will the increasing volume of data changes be synchronized?

A Teradata Replication Services has been upgraded to significantly increase the bandwidth per replication group. Greater load volumes and more tables can be handled by a single replication group.

Q What is new for building applications?

A Teradata's extensibility strategy takes several steps forward in Teradata 12.0, allowing for more applications to be implemented closer to the data in the data warehouse. The extensibility functions will be used to deliver Teradata functionality as well; XML support and spatial data support will be the first examples.

Q Are we able to write our applications in Java?

A Stored procedures (SPs) can now be implemented in Java. An implementer can choose the ANSI Stored Procedure Language, C/C++ or Java as the implementation language for procedures to run in the database. Application developers who wish to implement their entire application in Java can write the database portion in Java as well.

Q When can we return result sets from Teradata stored procedures?

A Teradata 12.0 includes the ability to return sets of result rows from a SQL stored procedure. This will allow SPs to perform a wider range of application functions.

Q How can we write C/C++ procedures that easily utilize data from the warehouse?

A With a newly defined interface, it is easier to submit SQL to Teradata from within a C/C++ stored procedure. This makes it simpler to utilize data from the warehouse while also accessing external interfaces or libraries such as connecting to the message bus or linking to another database engine.

Q What if we want to return more general result sets from a table function?

A This release includes a new method for defining the result row from a table function that allows the result parameters and data types at the invocation time of the function.

A general-purpose function can now be written to operate upon a more general source of data and return data in the form desired by each request.

Activate your enterprise
Teradata has made major continuous progress in delivering the supporting technology for building an active data warehouse. Each step makes it easier and more automatic to deliver the endemic access to integrated data that every enterprise is demanding. Teradata 12.0 is the next major step in that evolution, making major progress in query planning, statistics collection, performance, workload management, system operations and extensibility.

Installing and taking advantage of this technology will position your Teradata platform for the easiest possible active data warehouse implementation or as an extension of your current implementation to provide more users the data they need to deliver the best possible value to your organization every day. T

Teradata 12.0: The 12th major database release

The next release of the core Teradata product set has a new name in addition to a broad collection of exciting and innovative capabilities: Teradata 12.0. Now all of the software products at the core of the data warehouse infrastructure will have a consistent, unified numbering sequence. Beginning with this release, the set of products that are designed, developed and tested together will have a single release number.

Teradata selected the number 12 to start the unified numbering sequence because it is the 12th major database release in Teradata's history. This current release was preceded by five major database releases of Version 1 ("TOS"-based, Teradata Operating System), and six major database releases of Version 2 ("Open" OS-based, UNIX, Linux and Windows).

Effective with the new release, the numbering system will be:
> Teradata 12.0
> Teradata Database 12.0
> Teradata Tools and Utilities (TTU) 12.0

Teradata Database 12.0 and Teradata Tools and Utilities 12.0 are included in the Teradata 12.0 release. The individual client Tools and Utilities products will also adjust to the 12.0 numbering sequence.

The term "Teradata Warehouse," initially developed for use within the company, was adopted externally to package releases and indicate system-level certification of the various software products with different release numbers. This certification will continue as an important step in the development cycle, but the unified numbering scheme will replace the need for a separate warehouse package designation.

Minor releases will continue to be numbered in the current method. For example, the next major release will be Teradata 13.0 and the minor release following it will be Teradata 13.1.

—A.G.

Todd Walter, CTO, Teradata Development Division, oversees R&D efforts for Teradata Database software and systems. He is also responsible for the future vision and development of the active data warehouse.

Alan Greenspan is the product marketing manager for Teradata Database and load utilities. He has been with Teradata for 15 years.

Teradata Magazine-September 2007

Related Link

Reference Library

Get complete access to Teradata articles and white papers specific to your area of interest by selecting a category below. Reference Library
Search our library:


Protegrity

Teradata.com | About Us | Contact Us | Media Kit | Subscribe | Privacy/Legal | RSS
Copyright © 2008 Teradata Corporation. All rights reserved.