Deciphering the data mart mystery

A flexible decision-support architecture provides insight.

by Debbie Smith

Data is the lifeblood of an enterprise. Investigating and analyzing data provides a corporation with insight on where it has been and how the business got to where it is, while studying data trends provides an understanding of where the organization is going and what might be needed for the next step or to change direction.

Because an organization's information is locked in its data, users are passionate about its availability and accessibility, and they may go to creative lengths to get what they need through data marts, operational data stores (ODSs) or sandboxes. All may have a role to play, but these analytic methods have the potential to become time-consuming data management challenges if not kept under control. IT must stay ahead of the users by providing a methodology and infrastructure that will ensure the data is acceptable, accessible, flexible and timely.

Decision-support architecture
The best approach to establishing a reliable decision-support infrastructure is through an enterprise data warehouse (EDW). Relying on the EDW alone may not always be feasible if, for instance, the data has not yet been integrated into it. The decision to create a data mart, ODS or sandbox should be based on business reasons, such as:
  • Internal proof of concept (POC) for potential applications
  • Testing and exploring new business ideas in a sandbox environment
  • Storing data of minimal value that does not merit integration into the data warehouse
  • Compliance regulations
  • Geographical needs
  • Temporary placeholder for data pending an application implementation
  • Specialized short-term data requests, such as an analytical point application
  • Data issues and data profiling investigations

The figure shown illustrates a best practices architecture for a high-value decisioning environment. As a centralized repository, an EDW provides business users direct access to the integrated data they need, when they need it.

In situations where a data mart might be necessary—when testing a new application, for instance, or validating that data gained through a merger is compatible with the current environment—a data mart can be implemented virtually, inside the EDW with combinations of logical views and physical structures for specific use, performance or workload requirements. Users will have access to the organization's data without physically duplicating it into a separate data mart (or platform).
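
As a rough illustration of the virtual approach described above, the sketch below defines a "data mart" as a logical view over a shared table, so no rows are copied. It uses Python's built-in sqlite3 module purely as a stand-in for the EDW; the table, column and view names (edw_sales, east_sales_mart) are hypothetical and not taken from any Teradata system.

```python
import sqlite3


def build_virtual_mart():
    """Create an in-memory stand-in for the EDW plus one view-based data mart."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE edw_sales (
            sale_id INTEGER PRIMARY KEY,
            region  TEXT NOT NULL,
            product TEXT NOT NULL,
            amount  REAL NOT NULL
        );
        INSERT INTO edw_sales (region, product, amount) VALUES
            ('East', 'Widget', 100.0),
            ('East', 'Gadget', 250.0),
            ('West', 'Widget', 75.0);

        -- The virtual data mart: a logical view over the shared table.
        -- Nothing is duplicated; the view reads edw_sales on every query.
        CREATE VIEW east_sales_mart AS
            SELECT product, SUM(amount) AS total_amount
            FROM edw_sales
            WHERE region = 'East'
            GROUP BY product;
    """)
    return conn


def query_mart(conn):
    """Return (product, total_amount) rows as seen through the view."""
    cur = conn.execute(
        "SELECT product, total_amount FROM east_sales_mart ORDER BY product"
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = build_virtual_mart()
    print(query_mart(conn))
    # A row added to the base table is visible through the view right away.
    conn.execute(
        "INSERT INTO edw_sales (region, product, amount) "
        "VALUES ('East', 'Widget', 50.0)"
    )
    print(query_mart(conn))
```

Because the view holds no data of its own, users of the mart see changes to the underlying table without any reload step, which is the property the virtual approach relies on.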

When business needs are not fulfilled with a virtual data mart, one of two types of external data marts might be needed. The first and most favored choice is a dependent data mart. Since this is sourced from the EDW, data issues are minimized. The second type, the independent data mart, is typically only used when needed data is not in the EDW.

Issues with going outside the EDW
Data marts are seldom the answer for long-term success. Like ODSs and sandboxes, data marts that are external to an EDW can encounter challenges when capturing and storing data.

The spiral factor
Consider this scenario from an IT perspective: Business users request a method that will enable them to readily interrogate their data. After a few discussions with the users in which various alternatives are suggested, IT realizes that the data is not available in a useful format. To quickly satisfy the users' needs, IT pulls the data together and makes it accessible to them in a data mart.

Here is the concern: By creating a data mart, a precedent and a process are established. Business users will soon want additional data elements, similar data with slightly different data elements or at different aggregate levels, or even another data mart.

The number of data marts soon spirals out of control. What started out as a simple request results in drastic and time-consuming consequences. Management challenges become uncontrollable. Data becomes inconsistent as it moves across the data marts, so queries and generated reports no longer agree with one another. Concerns arise about the data's validity, and IT must devote extra time to either confirm the data is accurate or research and explain its inconsistencies.

Architecture challenges
When researching the data validity issues in this scenario, some in IT suggest a hub-and-spoke architecture as a feasible solution. This type of architecture consists of a centralized data store (the hub) with the data (spokes) radiating out to the business users in the form of a data mart.

But this architecture brings new challenges. Updating the data in the hub, packaging it and sending it down the spokes takes time. Because of IT's processing requirements, the data is not updated or accessible to the users during critical periods.

In addition, once the IT networking group finds that the data movement negatively affects LAN bandwidth, it limits when data is sent down the spokes. So the window to move the data narrows and users have even less access to updated data.

Data mart costs
Meanwhile, the growing number of data marts is affecting the organization's bottom line. Taking into account prices for tools and licensing fees, reduced number of processes, data reusability, maintenance and infrastructure support, the total cost of ownership (TCO) for an EDW is equivalent to or less than the estimated cost of five data marts.

Add in the costs of time-to-market for new applications, sourcing the data and developing the processes, and the savings are even greater for each additional data mart. More critical is the business users' level of confidence, which erodes when the data they need is unavailable or inaccessible and when its accuracy and validity are questioned. These factors support the EDW's best practice appeal.

Benefits of going inside the EDW
These are just some of the reasons that a centralized EDW is the optimal solution. By working toward integrating all of the enterprise's data into a centralized data warehouse and allowing users direct access to that platform, issues of timeliness and propagation can be eliminated.

In addition, an EDW's value and use, as well as its benefit to the organization, will increase exponentially as data elements are integrated into it. The more data subjects in an EDW, the greater the number of logical combinations among the data elements. Queries limited to one data subject produce fewer and less intricate results than queries with a greater range of data subjects.
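
One way to read the growth claim above is as a simple counting argument: with n data subjects in one place, the number of ways to combine two or more of them in a single query is 2^n - n - 1. The short sketch below computes that figure. It is only an illustrative upper bound under that assumption, since not every combination of subjects will yield a meaningful business question, and the function name is hypothetical.

```python
from math import comb


def subject_combinations(n):
    """Count the ways to combine two or more of n data subjects."""
    return sum(comb(n, k) for k in range(2, n + 1))


if __name__ == "__main__":
    for n in (2, 3, 5, 10):
        print(n, subject_combinations(n))
```

Under this model, two subjects allow one combination, five allow 26 and ten allow 1,013, while the same subjects held in separate data marts allow none.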

As a simple example, if we create an Orders data mart and a separate Inventory data mart, the questions we can ask are specific to each data subject, such as:
Orders
  • What product orders were placed?
  • What products are on back order?
Inventory
  • What is the quantity of the products in inventory?
  • What are their expiration dates?

If we integrate these data marts, we could ask additional questions that reach beyond these narrow subject areas and expand to the subject areas that intersect and connect them:
Combined Orders and Inventory
  • Which orders can be filled with existing inventory?
  • How many days of inventory are required for each location?
  • How will a large order affect current inventory levels?
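
The first combined question can be sketched as a join that is only possible once both subjects sit in the same database. The example below uses Python's built-in sqlite3 module as a stand-in for the EDW; the table and column names are hypothetical, and the logic is deliberately simplified: each order is checked against the stock on hand independently, without reserving inventory for earlier orders.

```python
import sqlite3


def fillable_orders(orders, inventory):
    """Return ids of orders whose quantity is covered by stock on hand.

    orders:    iterable of (order_id, product, quantity)
    inventory: iterable of (product, on_hand)
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, product TEXT, quantity INTEGER)"
    )
    conn.execute("CREATE TABLE inventory (product TEXT, on_hand INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    conn.executemany("INSERT INTO inventory VALUES (?, ?)", inventory)
    # The join spans both subject areas, so it cannot be answered
    # from an Orders mart or an Inventory mart alone.
    rows = conn.execute("""
        SELECT o.order_id
        FROM orders AS o
        JOIN inventory AS i ON i.product = o.product
        WHERE o.quantity <= i.on_hand
        ORDER BY o.order_id
    """).fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    sample_orders = [(1, "Widget", 5), (2, "Gadget", 20), (3, "Widget", 12)]
    sample_inventory = [("Widget", 10), ("Gadget", 25)]
    print(fillable_orders(sample_orders, sample_inventory))
```

With separate data marts, answering the same question would first require copying one subject's data next to the other, which is the duplication the integrated approach avoids.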

As the data warehouse environment grows and expands, the time it takes to develop and deliver new applications will be dramatically shortened. Established processes and procedures can be leveraged, and the data that the EDW already holds for existing applications can be reused as new applications are implemented.

Then, as additional data subjects are integrated into the EDW, query possibilities are increased and application delivery schedules are shortened. All of this leads to savings in time and money by building the application in the EDW rather than creating a data mart from the ground up.

Plan accordingly
Even valid reasons for creating external data marts do not diminish the long-term challenges. The goal, then, is to develop the data mart infrastructure in such a way that maximum benefit is gained while management challenges are minimized. Organizations will want to ensure their data mart platforms can leverage existing tools, training and practical experience. When possible, data should be loaded from the EDW so that it reflects the organization as a whole. Finally, when the data is either integrated into the centralized EDW or simply eliminated, these data mart platforms can be redeployed.

Overall, businesses want an architecture that promotes vast growth and provides data, the lifeblood of an organization, in an accessible, flexible and timely format.

Why the Teradata platform family for extended decision-support opportunities?

Teradata supports a best practices architecture that can satisfy an organization's end-to-end decision-support needs. Each platform in the Teradata family is designed to serve different data warehousing and business intelligence (BI) paradigms that allow for virtual or external data marts:
  • Teradata Active Data Warehouse 5550 is an active data warehouse platform that provides up to twice the performance of legacy systems and supports all types of virtual data marts, operational data stores and analytical sandboxes.
  • Teradata Data Warehouse Appliance 2550 is an entry-level data warehouse used either by companies just starting out in data warehousing or by companies with an established enterprise data warehouse (EDW) that require a complementary external analytical sandbox.
  • Teradata Data Mart Appliance 550 is a departmental data warehouse. This single-application system can be used as a small data mart for temporary or quick data analytics or to test and develop new applications, data types or tools.

The platforms provide complete decision-support life cycle management. While the Teradata 550 is a small-scale platform, the Teradata 2550 can be upgraded to the Teradata 5550 (the EDW). In addition, they all support Teradata Database Views, which enable the creation of virtual data marts.

The family's standardized architecture enables organizations to leverage existing resources and tools. Applications used in the Teradata 550 and Teradata 2550 can easily migrate into the EDW. This flexibility enables the same data, data models, table structures, views, load jobs and queries to meet short- and long-term data integration requirements.

Furthermore, database and system administrators and application developers can support multiple systems across the organization with standardized BI and extract, transform and load tools without additional training.

Whatever your organization's needs may be, the Teradata platform family provides for them—all with the benefits of the Teradata Database.

—D.S.

Debbie Smith, a senior data warehouse consultant, was a Teradata retail customer for 14 years before joining Teradata in 2000.

Teradata Magazine-September 2008

Copyright © 2008 Teradata Corporation. All rights reserved.