Register | Log in

Subscribe Now>>
Home Tech2Tech Features Viewpoints Facts & Fun
Applied Solutions
Download PDF|Send to Colleague

Cubes by design

ROLAP and HOLAP solutions using SAS and Teradata.

by Michelle Wilkie and Arlene Zaima

One of the best-kept secrets in SAS®'s business intelligence (BI) offering is its relational implementation of online analytical processing (OLAP). SAS is well known for its powerful analytics that typically requires data to be housed in a separate SAS data set or, in the case of OLAP, inside a cube. SAS also provides the option to implement a relational OLAP (ROLAP), where the data stays in the data warehouse. The benefits can be viewed by examining the three most common OLAP techniques: multi-dimensional OLAP (MOLAP), hybrid OLAP (HOLAP) and ROLAP. (See figure 1.)

In the first method, MOLAP, data is extracted from the data warehouse and aggregated into a data structure, commonly referred to as a cube, for analysis. Since the data is pre-aggregated, the response is quickly returned to the end users. The cost of this technique relates to the overhead of the tasks the BI administrators must perform. First, the data resides in both the warehouse and the cube. This means that the data must be updated and maintained in two locations. Second, since the data must go through an aggregation process when a MOLAP cube is built or updated, additional overhead is incurred. As businesses expand their analysis to include more dimensions or deeper levels of analysis, the cost and overhead of moving and replicating data into an external cube becomes a challenge for the BI administrator and IT.

The next technique, HOLAP, addresses some of the challenges of the MOLAP implementation. HOLAP is a hybrid approach in which higher-level aggregations that are commonly accessed are stored on a server and the more granular information is stored in the data warehouse. This technique was developed to enable larger definitions of cubes without affecting the cube build time. A cube designer can add details or dimensionality into the cube without increasing the overhead cost of the MOLAP cube. This provides the BI administrator the flexibility to establish the location of the multi-dimensional data depending on access frequency, administration and processing overheads.

In the third OLAP option, ROLAP, the data stays in the data warehouse and only the metadata is stored outside the database. Each request is converted to SQL and sent to the data warehouse, where the results are retrieved and returned to the analyst. The ROLAP solution resolves the overhead problem of maintaining data in multiple locations, as well as the additional processing involved in building or updating the cube; however, if the ROLAP is not designed and implemented properly, the response time may be slow.

A SAS ROLAP and HOLAP implementation can be optimally built with a Teradata Database to maximize performance. OLAP query processes are optimized through aggregate join indexes (AJIs), a Teradata Database feature. An AJI is a join index that specifies SUM or COUNTS aggregate operations across one or more tables. AJIs require no user or BI administrator maintenance and are used automatically by the Teradata Optimizer to improve ROLAP requests.

SAS cubes
MOLAP, ROLAP and HOLAP cubes are all supported in SAS 9.1.3. A cube designer can define a cube using SAS OLAP Cube Studio (an easy-to-use user interface) or using code (PROC OLAP) that is then built by the SAS Workspace Server.

A SAS cube comprises three parts: metadata, navigation files and the physical data or aggregation tables. (See figure 2.) The first two components do not differ with the different OLAP techniques.

The metadata for a cube created in the SAS Enterprise Intelligence Platform defines information such as location of data, cube structure, cube-based security permissions and calculated measure definitions. The navigation files are used to help understand how information of the input data translates to the structure of the cube; for example, how members relate to each other and formats, member properties and captions for each member.

The physical data is dependent on which OLAP technique the cube designer specifies when building the cube structure:
MOLAP. Relevant SAS proprietary highly indexed aggregation tables are created and stored within the physical cube.
ROLAP. All data resides in the relational database management system (RDBMS) where relational tables are optimized for low-level dimensional requests, and aggregate indexes are created for higher-level OLAP requests. The Teradata system will determine the optimal database structure to use.
HOLAP. A mix of the SAS proprietary aggregation table and relational tables will be used. This is typically dependent on the granularity and cardinalities that are present within the cubes.

The SAS OLAP Server has a dual role:
Security validation
  • Authentication of the user against the SAS Metadata Server
  • Authorization and validation of what the user is allowed to see
Query engine
  • Handles the multi-dimensional expressions (MDXs) passed from SAS BI clients
  • Retrieves the relevant data that answers the MDX query
  • Sends that data back to the clients

Which OLAP technique the cube is based on will determine how the MDX query is handled and translated into the appropriate query that will be passed either to an underlying database or internally. In the MOLAP-based cube, the SAS OLAP Server spawns multiple threads internally to retrieve the queries from the relevant cube aggregation tables. For a ROLAP-based cube, MDX is translated into SQL queries, which are passed down to the RDBMS to handle and optimize.

For the most optimized performance at query time, a SAS cube requires aggregation tables that best meet the query result set. SAS provides application response measurement (ARM) logs that help cube designers or administrators tune a cube based on end-user interactions or queries that have been submitted against that cube.

Teradata Database considerations
SAS OLAP cubes support three types of input data: star schema, detail data or summarized tables. Star schema input data sources will typically give SAS OLAP the best build performance; however, the physical database design of any data warehouse should reflect the customer's business, independent of any tool or application requirements.

Teradata recommends an application-agnostic data model such as third normal form, adhering to the best practices and methodology that provide an enterprise view of the business. To implement SAS ROLAP on a normalized data model or snowflake schema, a semantic layer must be built on top of the table to represent a star.

The optimal solution is building aggregates on top of a normalized model or snowflake schema and using view semantic layer to represent a star. It is crucial that the normalized model or snowflake schema be "cleansed," meaning there are no NULLs, data transformations are complete and data is ready for reporting. If not, then it may be necessary to build a physical semantic layer, as the Teradata aggregate approach described above will not work on "uncleansed" normalized data.

If a physical semantic layer is required, it is recommended to implement a snowflake schema that is populated by INSERT/SELECTs from the normalized model in the Teradata Database. The INSERT/SELECTs would be defined so that they perform the data-cleansing tasks to result in a snowflake schema that is ready for reporting. Views would then be created to present the star schema for SAS. Teradata AJIs can then be built on top of the snowflake schema to increase OLAP performance.

A ROLAP solution
The advantages of a ROLAP solution include:
Only metadata and navigation files are created, resulting in fast build times.
Data management remains within the RDBMS, not within the cube.

A ROLAP-based cube lets the RDBMS handle the SQL and optimization, which is dependent on implementing an AJI feature. This is the preferred method.

When defining the SAS cube structure, the cube designer needs to be aware of the second-to-last window in the SAS Cube Designer Wizard where the box Do not create an NWAY must be checked. This fully summarized table, composed of all crossing of the levels defined in the cube, is equivalent to the PROC OLAP option NO_NWAY.

Better, faster analysis
SAS and Teradata naturally complement each other with a powerful and flexible solution for BI administrators. When organizations consider an OLAP technique, the choices are extended with an accelerated SAS ROLAP solution with the Teradata Database.

This combined solution enables businesses to analyze data at the "speed of thought" with the breadth and depth of analysis only provided by an integrated solution. Now you can make the ideal choice to best meet your expanding business needs. T

Michelle Wilkie, a product manager with SAS Institute Inc., supports research and development teams developing OLAP products.

Arlene Zaima, a strategic intelligence program manager at Teradata, has more than 10 years of experience in advanced analytics.

Teradata Magazine-September 2008

More Applied Solutions

Related Links

Reference Library

Get complete access to Teradata articles and white papers specific to your area of interest by selecting a category below. Reference Library
Search our library:

Protegrity | About Us | Contact Us | Media Kit | Subscribe | Privacy/Legal | RSS
Copyright © 2008 Teradata Corporation. All rights reserved.