ROLAP and HOLAP solutions using SAS and Teradata.
by Michelle Wilkie and Arlene Zaima
One of the best-kept secrets in SAS®'s business intelligence (BI) offering is its relational implementation of online analytical processing
(OLAP). SAS is well known for its powerful analytics that typically requires data to be housed in a separate SAS data set or, in the case of
OLAP, inside a cube. SAS also provides the option to implement a relational OLAP (ROLAP), where the data stays in the data warehouse. The
benefits can be viewed by examining the three most common OLAP techniques: multi-dimensional OLAP (MOLAP), hybrid OLAP (HOLAP) and ROLAP.
(See figure 1.)
In the first method, MOLAP, data is extracted from the data warehouse and aggregated into a data structure, commonly referred to as a cube,
for analysis. Since the data is pre-aggregated, the response is quickly returned to the end users. The cost of this technique relates to the
overhead of the tasks the BI administrators must perform. First, the data resides in both the warehouse and the cube. This means that the data
must be updated and maintained in two locations. Second, since the data must go through an aggregation process when a MOLAP cube is built or
updated, additional overhead is incurred. As businesses expand their analysis to include more dimensions or deeper levels of analysis, the
cost and overhead of moving and replicating data into an external cube becomes a challenge for the BI administrator and IT.
The next technique, HOLAP, addresses some of the challenges of the MOLAP implementation. HOLAP is a hybrid approach in which higher-level
aggregations that are commonly accessed are stored on a server and the more granular information is stored in the data warehouse. This
technique was developed to enable larger definitions of cubes without affecting the cube build time. A cube designer can add details or
dimensionality into the cube without increasing the overhead cost of the MOLAP cube. This provides the BI administrator the flexibility to
establish the location of the multi-dimensional data depending on access frequency, administration and processing overheads.
In the third OLAP option, ROLAP, the data stays in the data warehouse and only the metadata is stored outside the database. Each request is
converted to SQL and sent to the data warehouse, where the results are retrieved and returned to the analyst. The ROLAP solution resolves the
overhead problem of maintaining data in multiple locations, as well as the additional processing involved in building or updating the cube;
however, if the ROLAP is not designed and implemented properly, the response time may be slow.
A SAS ROLAP and HOLAP implementation can be optimally built with a Teradata Database to maximize performance. OLAP query processes are
optimized through aggregate join indexes (AJIs), a Teradata Database feature. An AJI is a join index that specifies SUM or COUNTS aggregate
operations across one or more tables. AJIs require no user or BI administrator maintenance and are used automatically by the Teradata
Optimizer to improve ROLAP requests.
MOLAP, ROLAP and HOLAP cubes are all supported in SAS 9.1.3. A cube designer can define a cube using SAS OLAP Cube Studio (an easy-to-use user
interface) or using code (PROC OLAP) that is then built by the SAS Workspace Server.
A SAS cube comprises three parts: metadata, navigation files and the physical data or aggregation tables. (See figure 2.) The first two
components do not differ with the different OLAP techniques.
The metadata for a cube created in the SAS Enterprise Intelligence Platform defines information such as location of data, cube structure,
cube-based security permissions and calculated measure definitions. The navigation files are used to help understand how information of the
input data translates to the structure of the cube; for example, how members relate to each other and formats, member properties and captions
for each member.
The physical data is dependent on which OLAP technique the cube designer specifies when building the cube structure:
MOLAP. Relevant SAS proprietary highly indexed aggregation tables are created and stored within the physical cube.
ROLAP. All data resides in the relational database management system (RDBMS) where relational tables are optimized for
low-level dimensional requests, and aggregate indexes are created for higher-level OLAP requests. The Teradata system will
determine the optimal database structure to use.
HOLAP. A mix of the SAS proprietary aggregation table and relational tables will be used. This is typically dependent on
the granularity and cardinalities that are present within the cubes.
SAS OLAP Server
The SAS OLAP Server has a dual role:
- Authentication of the user against the SAS Metadata Server
- Authorization and validation of what the user is allowed to see
- Handles the multi-dimensional expressions (MDXs) passed from SAS BI clients
- Retrieves the relevant data that answers the MDX query
- Sends that data back to the clients
Which OLAP technique the cube is based on will determine how the MDX query is handled and translated into the appropriate query that will be
passed either to an underlying database or internally. In the MOLAP-based cube, the SAS OLAP Server spawns multiple threads internally to
retrieve the queries from the relevant cube aggregation tables. For a ROLAP-based cube, MDX is translated into SQL queries, which are passed
down to the RDBMS to handle and optimize.
For the most optimized performance at query time, a SAS cube requires aggregation tables that best meet the query result set. SAS provides application response
measurement (ARM) logs that help cube designers or administrators tune a cube based on end-user interactions or queries that have been submitted against that
Teradata Database considerations
SAS OLAP cubes support three types of input data: star schema, detail data or summarized tables. Star schema input data sources will typically
give SAS OLAP the best build performance; however, the physical database design of any data warehouse should reflect the customer's business,
independent of any tool or application requirements.
Teradata recommends an application-agnostic data model such as third normal form, adhering to the best practices and methodology that provide
an enterprise view of the business. To implement SAS ROLAP on a normalized data model or snowflake schema, a semantic layer must be built on
top of the table to represent a star.
The optimal solution is building aggregates on top of a normalized model or snowflake schema and using view semantic layer to represent a star.
It is crucial that the normalized model or snowflake schema be "cleansed," meaning there are no NULLs, data transformations are complete and
data is ready for reporting. If not, then it may be necessary to build a physical semantic layer, as the Teradata aggregate approach described
above will not work on "uncleansed" normalized data.
If a physical semantic layer is required, it is recommended to implement a snowflake schema that is populated by INSERT/SELECTs from the
normalized model in the Teradata Database. The INSERT/SELECTs would be defined so that they perform the data-cleansing tasks to result in a
snowflake schema that is ready for reporting. Views would then be created to present the star schema for SAS. Teradata AJIs can then be built
on top of the snowflake schema to increase OLAP performance.
A ROLAP solution
The advantages of a ROLAP solution include:
Only metadata and navigation files are created, resulting in fast build times.
Data management remains within the RDBMS, not within the cube.
A ROLAP-based cube lets the RDBMS handle the SQL and optimization, which is dependent on implementing an AJI feature. This is the preferred
When defining the SAS cube structure, the cube designer needs to be aware of the second-to-last window in the SAS Cube Designer Wizard where
the box Do not create an NWAY must be checked. This fully summarized table, composed of all crossing of the levels defined in the cube,
is equivalent to the PROC OLAP option NO_NWAY.
Better, faster analysis
SAS and Teradata naturally complement each other with a powerful and flexible solution for BI administrators. When organizations consider an
OLAP technique, the choices are extended with an accelerated SAS ROLAP solution with the Teradata Database.
This combined solution enables businesses to analyze data at the "speed of thought" with the breadth and depth of analysis only provided by an
integrated solution. Now you can make the ideal choice to best meet your expanding business needs. T
Michelle Wilkie, a product manager with SAS Institute Inc., supports research and development teams developing OLAP products.
Arlene Zaima, a strategic intelligence program manager at Teradata, has more than 10 years of experience in advanced analytics.
Teradata Magazine-September 2008