Teradata expands market reach with a new data warehouse appliance.
by Chris Twogood and Ed White
Data warehousing has gone through a structural and functional evolution over the last few decades. Fortune magazine named the Teradata DBC/1012 product of the
year in 1986. It was a reliable, simple, cost-effective solution for basic decision support and reporting.
As the years passed, data warehousing became more sophisticated and focused on real-time, pervasive business intelligence (BI). To facilitate this evolution,
Teradata developed the active data warehouse to handle continuous data loads as well as complex decision support queries and event processing while injecting
intelligence into thousands of operational applications—all from one integrated system. Tremendous business value and competitive advantage are associated with
this type of data warehousing.
Some companies are just getting started with data warehousing and need only a basic data warehouse without extra functionality. The Teradata Data
Warehouse Appliance 2550 has been introduced as an integrated solution for organizations that are just starting with BI and need traditional decision support and
reporting tools.
Designed with the latest in technology and scalability, the Teradata 2550 can also be used by companies that are experienced in BI and want to get more out of
their enterprise data warehouse (EDW). For instance, the platform can be used as an analytical sandbox to develop and test applications before integrating them
into the EDW. It can also be implemented as a data mart outside the EDW in unique situations, such as when an organization has a department with short-term data
needs that, because of compliance or geographic requirements, fall outside the scope of IT's capabilities.
Simple, powerful, cost-effective
The Teradata 2550 is built on a shared-nothing massively parallel processing (MPP) architecture, with 144 access module processors (AMPs) mapped to 144 disks in a
single cabinet. With an AMP-to-disk ratio of 1:1, full-system utilization can be achieved with very few tasks. Because resources can be fully utilized with only
minimal concurrency, AMPs get proportional performance resources, file scans are faster and decision support performance is increased. System parallelism
generates enough work to keep all disk drives busy with just one or a few active queries. (See figure, below.)
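The effect of the 1:1 ratio can be illustrated with a toy model. The sketch below assumes, purely for illustration, that each active task keeps at most one disk per AMP busy at any moment; the function name `busy_disks` and the 36-AMP comparison configuration are hypothetical and do not describe any actual Teradata system.

```python
def busy_disks(amps, disks, concurrent_tasks):
    """Toy model: number of disks kept busy at one moment.

    Assumption (not from the article): every task runs on all AMPs,
    and each task drives one disk per AMP at a time.
    """
    disks_per_amp = disks // amps
    return amps * min(disks_per_amp, concurrent_tasks)


# 1:1 ratio, as described for the Teradata 2550: one query is enough.
one_to_one = busy_disks(amps=144, disks=144, concurrent_tasks=1)

# Hypothetical 1:4 ratio: one query leaves most disks idle,
# and four concurrent tasks are needed to reach all 144 disks.
one_to_four_single = busy_disks(amps=36, disks=144, concurrent_tasks=1)
one_to_four_loaded = busy_disks(amps=36, disks=144, concurrent_tasks=4)

print(one_to_one, one_to_four_single, one_to_four_loaded)
```

Under these assumptions a single query reaches every disk in the 1:1 case, while the hypothetical 1:4 configuration needs four concurrent tasks to do the same.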
This implementation is optimized for decision support workloads, which typically have fewer concurrent users who run longer, complex
queries. It does not, however, support Teradata Active System Management, because the Teradata 2550 is not targeted at the active mixed workload environment.
The MPP architecture enables parallel queries across the CPU, memory and storage for optimized performance, and automatically manages data placement across disks.
Parallelism enables table scan operations on all database disks at raw disk transfer rates and drives key performance for fast file scans and decision support
workloads. Because the Teradata 2550 uses a unique disk subsystem, the SQL query Optimizer uses a different set of cost coefficients to evaluate and compare
query plans. As a result, when processing a SQL request the Optimizer can select a different query plan than it would for the same request on
an active data warehouse from Teradata. Consequently, the amount of time database administrators (DBAs) spend tuning the system is reduced.
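As a rough illustration of automatic data placement in a shared-nothing system, the sketch below spreads rows across 144 AMPs by hashing a key. MD5 is only a stand-in chosen for determinism, not the database's actual hashing algorithm, and the names `amp_for`, `distribute` and `parallel_scan_seconds` are hypothetical.

```python
import hashlib
from collections import Counter

NUM_AMPS = 144  # AMP count given in the article for a single cabinet


def amp_for(key, num_amps=NUM_AMPS):
    """Map a row key to an AMP. MD5 is a stand-in hash for illustration."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_amps


def distribute(keys, num_amps=NUM_AMPS):
    """Count how many rows land on each AMP."""
    return Counter(amp_for(k, num_amps) for k in keys)


def parallel_scan_seconds(rows_per_amp, rows_per_second):
    """A parallel scan finishes when the busiest AMP finishes."""
    return max(rows_per_amp.values()) / rows_per_second


rows_per_amp = distribute(range(144_000))
# Hypothetical scan rate of 10,000 rows per second per AMP.
elapsed = parallel_scan_seconds(rows_per_amp, rows_per_second=10_000)
serial = sum(rows_per_amp.values()) / 10_000
print(f"parallel: {elapsed:.3f}s, single AMP: {serial:.3f}s")
```

Because every AMP scans only its own share of the rows, elapsed time depends on the most heavily loaded AMP, which is why even distribution matters for full table scans.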
Other benefits include:
Best-in-class workload management features. Users can be assigned to new groups, and the pre-configured workload settings can be set to rush,
high, medium and low. If the system is not fully utilized, the CPU is automatically available to lower-priority groups, regardless of the base
setting. The system also supports threshold exception actions, which enable automatic management of long-running queries.
Scalable up to 140TB. A total of 12.6TB of user data (with 30% compression) can be contained in a single cabinet, and the system can grow to
11 expansion cabinets with up to 140TB of user data. Expansion cabinets are identical to the system cabinet, with the exception of the BYNET Ethernet
switches.
Enterprise-class components. The system is composed of proven Intel nodes with redundant power supplies, UPS power, redundant system
management and a redundant BYNET with fault tolerance, heartbeat monitoring and load balancing.
System availability. Power failure protection is ensured with dual AC inputs. RAID-1 guards against disk failure, and cliques protect against
node failure.
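The priority behavior described under workload management can be sketched as a weighted share calculation. The weights, the 600-second limit and the demotion action below are hypothetical: the article names the four settings and mentions threshold exception actions but gives no values or specific actions.

```python
# Hypothetical relative weights for the four pre-configured settings.
WEIGHTS = {"rush": 8, "high": 4, "medium": 2, "low": 1}


def cpu_shares(active_groups):
    """Split CPU among groups that currently have work to run.

    Idle groups receive nothing, so their share flows to whoever is
    active, including lower-priority groups.
    """
    active = set(active_groups)
    if not active:
        return {}
    total = sum(WEIGHTS[g] for g in active)
    return {g: WEIGHTS[g] / total for g in active}


def apply_threshold(group, cpu_seconds, limit_seconds=600, demote_to="low"):
    """Assumed exception action: demote a query that passes the limit."""
    return demote_to if cpu_seconds > limit_seconds else group


print(cpu_shares(["rush", "low"]))   # low gets a small share next to rush
print(cpu_shares(["low"]))           # low gets the whole CPU when alone
print(apply_threshold("high", cpu_seconds=900))
```

In this sketch a low-priority group running alone receives the full CPU regardless of its base setting, which mirrors the behavior the article describes for an underutilized system.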
Teradata Database 12.0 compatibility
Simple and easy to use, the Teradata 2550 is delivered ready to run and can be live in a few hours. The cost-effective, fully bundled solution is pre-installed
in a single power-efficient cabinet and can be connected to other applications and systems.
The platform has integrated Intel servers, enterprise-class storage and an open SUSE Linux 64-bit operating system. Optimized for traditional decision support
analysis, the platform includes system management and data load tools such as the Teradata Utility Pack, Teradata Manager and Teradata Parallel Transporter Load
and Export Operators.
Teradata Database 12.0 is the foundation for the entire Teradata platform family. (See bottom bar: "An extended family," below.) While the database configurations
have been tuned for decision support workloads, the core underlying functions remain the same. This allows companies with expanding workloads to easily
migrate to more powerful systems without costly changes to load programs, data models or underlying structures.
To optimize the Teradata 2550 for traditional decision support and analytical workloads, Teradata Database 12.0 comes preset with:
Cache Threshold modified to 100% to optimize throughput for workloads and environments with partitioned primary index (PPI) tables, fewer
secondary indexes and low to moderate concurrency. Because spools produced by the Teradata 2550 are expected to be larger than those produced by an
active data warehouse, a fully cached threshold keeps more generated spool space in memory and reduces I/O.
File Segment Cache, set by default to 80%, which contains the most recently used database segments. When the system reads a database block,
it checks the cache first. If the block is cached, the system avoids the overhead of rereading the block from disk.
Cylinder Read to enable high-performance disk transfers that are efficient for table scans and decision support workloads. The smaller the
actual block size of the table, the greater the I/O benefit. In the Teradata 2550 the default number of cylinder slots per AMP is set to eight.
Perm Database Size of 254 sectors for times when Cylinder Read does not apply. In full table scan scenarios when increased throughput capacity of
disk drives is desired, large data blocks are beneficial.
ReadAhead feature that is available when sequential file access workloads are running. This function provides improved read performance and
faster access times by enabling the data to be pre-read, based on header lookups. With this option turned on, when the file system tries to read a
data block from the file, it issues a read-ahead I/O and the next data block is brought into memory.
Pre-Fetch feature to improve read performance and query response times by pre-reading the data in cache. When the Teradata Database accesses
the cylinder index, the disk controller firmware uses pre-fetching to move the entire cylinder to the controller cache. Load balancing, another enhancement
made to the disk controller firmware, determines the availability of the drives and directs I/O requests to the less-busy drive in the RAID-1 pair. This
results in higher throughput for sequential scan-based workloads.
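The cache-first lookup and read-ahead behavior described in the settings above can be sketched with a small least-recently-used cache. This is a toy model under stated assumptions, not the file system's implementation: the class `BlockCache`, its capacity, the one-block read-ahead depth and the in-memory list standing in for a disk are all hypothetical.

```python
from collections import OrderedDict


class BlockCache:
    """Toy LRU cache of disk blocks with optional one-block read-ahead."""

    def __init__(self, capacity, read_ahead=True):
        self.capacity = capacity
        self.read_ahead = read_ahead
        self.blocks = OrderedDict()  # block number -> data, oldest first
        self.disk_reads = 0          # reads the caller had to wait for
        self.prefetches = 0          # read-ahead I/Os issued in advance

    def _store(self, block_no, data):
        self.blocks[block_no] = data
        self.blocks.move_to_end(block_no)
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used

    def read(self, disk, block_no):
        if block_no in self.blocks:          # check the cache first
            self.blocks.move_to_end(block_no)
            data = self.blocks[block_no]
        else:                                # miss: go to disk
            self.disk_reads += 1
            data = disk[block_no]
            self._store(block_no, data)
        nxt = block_no + 1
        if self.read_ahead and nxt < len(disk) and nxt not in self.blocks:
            self.prefetches += 1             # bring the next block into memory
            self._store(nxt, disk[nxt])
        return data


disk = [f"block-{i}" for i in range(10)]     # stand-in for a table on disk
with_ra = BlockCache(capacity=4, read_ahead=True)
without_ra = BlockCache(capacity=4, read_ahead=False)
for i in range(len(disk)):
    with_ra.read(disk, i)
    without_ra.read(disk, i)
print(with_ra.disk_reads, with_ra.prefetches, without_ra.disk_reads)
```

In a sequential scan the read-ahead variant waits on only the first block, because each later block is already in memory when requested. That is the kind of benefit the article attributes to ReadAhead for sequential file access workloads.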
Basic needs
Different organizations have different BI needs. Some companies are just starting out with a data warehouse and require only the basic system and tools. Other
companies want a second, smaller data warehouse to complement their existing EDW environment. The Teradata 2550 can serve both purposes.
Businesses today that want to extend their analytical reach throughout the enterprise have more data warehousing choices than were available decades ago. With the
Teradata 2550, companies can be assured of a system that implements modern-day technology but with the simplicity reminiscent of the first data warehouse
environment of years past.
An extended family
Teradata's powerful platform family addresses customers' business and technical needs. These platforms are powered by the Teradata
Database 12.0 engine and can accommodate data of all levels of complexity and size, from departmental to active data warehousing.
—C.T. and E.W.
Chris Twogood is the product marketing manager for the Teradata platforms.
Ed White, director of Teradata Product Marketing, manages the global marketing of Teradata platforms, products and programs.
Teradata Magazine-September 2008