Fresh Perspectives

Think again

Do your data warehousing efforts help or hinder creativity?

by Rob Armstrong, director of data warehouse support at Teradata

During a recent panel discussion among data warehouse practitioners, we debated the pros and cons of centralized versus distributed architectures. One question that caught my attention was whether centralization spurs or spurns innovation.

I argued for centralization, contending that the data warehouse increases a company's ability to innovate. Another panelist argued for a distributed architecture, contending that the data warehouse inhibits creativity and exploration. In the end, we agreed that both viewpoints were correct.

In data warehousing, the time lag between when an idea originates, when it is proved or disproved through analytics, and when it is turned into an action item determines whether the innovation succeeds. If gathering data is cumbersome, innovation stagnates. If combining new data in new ways is easy, innovation is supported.

Centralized argument
The main role of the enterprise data warehouse (EDW) is to link critical data from multiple subject areas to provide a consistent and cohesive view of the business. To preserve the data relationships, the data is extracted from source systems and transformed into a normalized model. As more subject areas are added to the data warehouse, more relationships can be explored.

Innovation progresses when the data is integrated across the enterprise or subject areas. Then, as new ideas emerge, users do not need to reconcile data inconsistencies during queries and analyses. Rather, the data is available in a neutral data model and can be accessed directly. This eliminates the need to index, denormalize, build cubes and conduct other time-consuming tasks when new sets of analytics are run.

In this sense, the data warehouse accelerates "enterprise innovation." This is evident in cross-functional opportunities when data integration from multiple subject areas is critical to getting consistent and meaningful answers.

Departmental argument
With a data mart, the business community can use data without worrying about enterprise data quality or prioritization, since the data can be loaded into a separate database more quickly and easily than into the EDW, where integration is required. Because the data loads faster and departmental users can customize the data model to their own needs, analytics can run without delays from enterprise oversight and without contention from other queries and workloads. From this perspective, data marts clearly enhance "functional innovation."

Freedom with structure
The need for departmental data sets that still preserve enterprise consistency and integration led Teradata to introduce the Teradata Data Warehouse Appliance 2550 and to improve priority scheduling in Teradata Active System Management.

These solutions give the departmental users two options:
Use the smaller platform as an adjunct to the EDW, maintaining enterprise consistency and governance at a lower cost and with less rigor.
Use Teradata Active System Management to create logical sandboxes within the EDW, thereby preserving both enterprise and functional innovation. In this model, sandboxes are dedicated data areas that fall outside IT's usual rigor but remain part of the overall EDW architecture.

Both solutions leverage the same SQL, load tools, connectivity and database management. These consistencies minimize IT's efforts to fully integrate the new data or applications into the EDW when it makes sense. Hence, time is saved and all types of innovation are supported.

Play in the sand
To be innovative, companies must be agile at the departmental level while providing integration at the enterprise level. Sandboxes can help. Because they can be housed either on the EDW or on other platforms that are tightly linked to the EDW, sandboxes enable individual departments to quickly add new data elements and exploratory data sets to the company's total data warehouse environment without creating a series of unrelated data fiefdoms. Let's explore the two housing options:

Building sandboxes within the EDW
One option is to build a sandbox inside the existing EDW, since much of the data needed for any new analysis is most likely already in the warehouse. Giving users a portion of disk space allows them to load new data, create exploratory extracts or add data of personal interest, increasing their ability to segment, correlate or perform new analytics. This sandbox data is not of production quality and is administered by the users.

This option takes advantage of the parallelism of the production database. Data movement is minimized, and the same tools, utilities, modeling and SQL used in production are available. This commonality eases the eventual migration of the data from the sandbox to the production environment. It also helps IT understand the complexity and integration issues that must be addressed when the data is moved into the EDW, so IT can perform a more accurate cost-benefit analysis when deciding on upcoming projects.

Another benefit of this approach is that sandboxes need not be implemented only physically, as described above; they can also be implemented virtually using views, or through a combination of the two. Any data already on the system can be accessed and manipulated through views, further reducing data duplication and movement.
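To make the idea of a virtual sandbox concrete, here is a minimal sketch that uses Python's built-in SQLite as a stand-in for the EDW (an assumption for illustration only; Teradata syntax and administration differ). The table names `prod_sales`, `sandbox_segments` and the view `sandbox_sales_by_segment` are hypothetical. The user loads only a small set of their own data and defines a view that joins it to production data in place, so no production rows are copied.

```python
import sqlite3


def build_virtual_sandbox(conn: sqlite3.Connection) -> None:
    """Create a hypothetical production table, a small user-owned sandbox
    table and a view that combines them without copying production rows."""
    cur = conn.cursor()
    # Stand-in for an integrated EDW table (hypothetical schema).
    cur.execute("CREATE TABLE prod_sales (customer_id INTEGER, amount REAL)")
    cur.executemany("INSERT INTO prod_sales VALUES (?, ?)",
                    [(1, 120.0), (2, 75.5), (3, 300.0)])
    # Small data set loaded by the user into their sandbox (hypothetical).
    cur.execute("CREATE TABLE sandbox_segments (customer_id INTEGER, segment TEXT)")
    cur.executemany("INSERT INTO sandbox_segments VALUES (?, ?)",
                    [(1, "trial"), (3, "loyal")])
    # The view reads production data where it lives; nothing is duplicated.
    cur.execute("""
        CREATE VIEW sandbox_sales_by_segment AS
        SELECT s.segment, SUM(p.amount) AS total
        FROM prod_sales AS p
        JOIN sandbox_segments AS s ON s.customer_id = p.customer_id
        GROUP BY s.segment
    """)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_virtual_sandbox(conn)
    query = "SELECT segment, total FROM sandbox_sales_by_segment ORDER BY segment"
    for row in conn.execute(query):
        print(row)
```

Because the view is evaluated against the live production table, any new rows loaded into `prod_sales` show up in the sandbox analysis automatically, which is the "less duplication and movement" benefit the paragraph describes.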

This centralized approach, however, places the new sandbox in competition with production workloads, even with Teradata Active System Management in place. On the other hand, because the data and corresponding applications are ultimately headed from the sandbox into the production system, running them there helps IT better understand the overall performance, impact and concurrency levels that must be addressed in the workload management profiles.
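Why does a sandbox inside the EDW still compete with production even under workload management? A toy proportional-share model shows the general idea: priority settings decide how capacity is split among workloads that are active at the same time, but they do not create extra capacity. This is a sketch under stated assumptions; the weights and the proportional-share rule are hypothetical and are not a description of how Teradata Active System Management actually allocates resources.

```python
from typing import Dict, Set


def workload_shares(weights: Dict[str, int], active: Set[str]) -> Dict[str, float]:
    """Toy proportional-share model (an assumption, not TASM's algorithm):
    capacity is divided among workloads that currently have work queued,
    in proportion to their weights. Idle workloads receive nothing."""
    total = sum(w for name, w in weights.items() if name in active)
    if total == 0:
        # Nothing active (or only zero-weight workloads): no capacity is handed out.
        return {name: 0.0 for name in weights}
    return {name: (weights[name] / total if name in active else 0.0)
            for name in weights}


if __name__ == "__main__":
    weights = {"production": 80, "sandbox": 20}  # hypothetical priority weights
    # While production is busy, the sandbox gets only its weighted slice.
    print(workload_shares(weights, {"production", "sandbox"}))
    # When production is idle, the sandbox can use the whole system.
    print(workload_shares(weights, {"sandbox"}))
```

In this model the sandbox's share depends entirely on what production is doing at the moment, which is the contention the paragraph warns about; observing those interactions is also what gives IT the concurrency data it needs for workload management profiles.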

Building sandboxes outside the EDW
A common method of building sandboxes is to use a secondary platform. A platform outside the data warehouse eases the burden on IT while responding to the immediate challenges users face, and it lets users take charge of their own environment. But a secondary platform must be implemented carefully, lest anarchy take hold and the overall data warehouse effort suffer. Care must be taken that the exploratory sandbox environment does not become a "shadow production system" and usurp the role of the data warehouse.

Achieving this balance between user freedom and enterprise consistency requires governance and cooperation among the user, IT and executive communities. The business community must agree to reuse data from the production system whenever possible. IT must agree to help the business ensure that data models, load processes and tool sets are aligned with corporate strategy and the centralized data warehouse. And the executive community must ensure that the sandbox is funded, that its use is limited to exploration and innovation and, finally, that the resulting data and corresponding applications will later be integrated into the EDW.

With this secondary-platform approach, users can quickly and easily add specialized data to current data sets, create analytical data sets from production pulls and play "what if" games. They can also combine various data sets to test new hypotheses without having to compete with prioritized work, as they would with a sandbox built inside the EDW.

A secondary-platform sandbox can provide users with insight and spur innovation, and also help them calculate business benefits. The users can try new ideas and gauge their effectiveness before lobbying the steering committee to include the data and applications in the EDW. This allows them to be innovative by quickly testing new ideas and incorporating only the best ones into the enterprise processes.

A sandbox is not concrete
It is important that neither sandbox approach become a replacement for a company's enterprise goals. The sandbox is a good playground for users, but that is how it should remain. Data in the sandbox should be managed by the users, not by IT. The sandbox is not covered by disaster recovery strategies, no archives are taken, and no long-term storage is created.

One common situation to watch for is users liking the sandbox results so much that they demand the analytics and reporting developed as a test immediately become part of production and be managed by IT. This is where the EDW vision starts to fall apart.

Certainly, users can have their sandbox, but governance requires that any application or analytics derived from the sandbox go through a proper cutover methodology so that all production applications run against the main EDW.

Limiting the sandbox to a short-term environment encourages users to leverage it effectively for individual innovation. Data can and should be loaded and experimented with, but once it has proven valuable, it must be prioritized for inclusion in the EDW to enable further enterprise innovation.
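One way to enforce the short-term rule is a periodic review that sorts sandbox data sets into "promote," "drop" and "keep." The sketch below assumes a hypothetical `SandboxDataset` record, a hypothetical `proven_valuable` flag set by a review process, and an illustrative 90-day retention window; none of these are Teradata features or defaults.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Tuple


@dataclass
class SandboxDataset:
    name: str
    created: date
    # Set when a review judges the results worth integrating (hypothetical flag).
    proven_valuable: bool


def review_sandbox(datasets: List[SandboxDataset], today: date,
                   retention_days: int = 90) -> Tuple[List[str], List[str], List[str]]:
    """Sort sandbox data sets into (promote, drop, keep) lists of names.

    retention_days is an assumed policy value, not a product default."""
    cutoff = today - timedelta(days=retention_days)
    promote: List[str] = []
    drop: List[str] = []
    keep: List[str] = []
    for ds in datasets:
        if ds.proven_valuable:
            # Proven data is queued for prioritized integration into the EDW.
            promote.append(ds.name)
        elif ds.created < cutoff:
            # Past the retention window: remove it (sandboxes are not archived or backed up).
            drop.append(ds.name)
        else:
            # Still within the exploration window.
            keep.append(ds.name)
    return promote, drop, keep


if __name__ == "__main__":
    sandbox = [
        SandboxDataset("churn_trial", date(2008, 11, 20), False),
        SandboxDataset("old_extract", date(2008, 6, 1), False),
        SandboxDataset("segment_model", date(2008, 5, 1), True),
    ]
    promote, drop, keep = review_sandbox(sandbox, today=date(2008, 12, 1))
    print("promote:", promote)
    print("drop:", drop)
    print("keep:", keep)
```

Checking `proven_valuable` before age means a valuable data set is routed to the EDW integration process rather than silently expiring, which matches the intent that proven sandbox work becomes enterprise work.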

Teradata Magazine, December 2008

Copyright © 2008 Teradata Corporation. All rights reserved.