Publication Date: July 29, 1996
Building Success Into Your Data Warehouse Project
By Anita J. Freed
A data warehouse can be a powerful business tool,
putting mountains of corporate information at
workers fingertips. But fail to build your data
warehouse on the foundation of sound management
principles and it will fail to live up to its
promise, says Larry English, president and principal
of Information Impact International, Inc.
"This is not just another project. It is
actually taking the corporate information and pulling
it together in such a way that it can be analyzed and
used as an enterprise knowledge resource. We have to
apply good organizational principals in
planning; organizing and staffing; directing; and
controlling," says English, an internationally
recognized speaker and consultant on information
management topics.
What that means is that upper managements
support and involvement is crucial to any data
warehouse project. Other critical success factors, as
identified by English, include the following:
- The data warehouse must solve a real business
problem.
- It must be an enterprise solution.
- The data must be complete, accurate, and
consistently defined.
- The project must use the minimal set of
technology components.
- Data must be managed at the detail level.
- An information directory must be available to
all knowledge workers.
- Definitions established for the data
warehouse must carry over into operational
data.
These factors can be summarized in three
categories: Make It Useful, Keep It Clean and Keep It
Simple.
Make It Useful
The data warehouse must be designed as an
enterprise solution not a departmental
solution to a clearly defined business
problem. Such problems tend to fall into broad
categories: customer relations; product or service
quality; productivity issues; or time-to-market
issues. The goal, English says, is to provide
information "horizontally, across departments.
In a vertical structure, you exacerbate the problem
of being unable to integrate the information
resources company-wide." Plus its costly
in time and monetary terms to later
integrate separately developed departmental systems.
This is not to say a data warehouse cannot be used to
solve the problems of a single business unit; rather,
that the data architecture must take into account the
broader uses of the information, so the warehouse
will have broader value. For example, suppose you are
a retailer whose company also has wholesale and
direct marketing divisions. When the company develops
the "customer subject area" of its data
warehouse, English says, it needs to identify what is
in common among customers in the three divisions.
When you use the warehouse to solve a particular
problem say, how to increase the lifetime
value of the retail customer you may extract
only retail data, but it won't change the
architecture of the database.
This underscores the need for the database to have
consistent, consensus definitions of all areas of
interest for all parties who may someday want that
information. It also points to the need to identify
the authoritative sources for the warehouse data: A
company with multiple customer groups, for example,
is likely to have a certain redundancy, and
inconsistency, among its records. Choosing an
authoritative information source, and clearly
identifying which attributes define the data,
eliminates those inconsistencies. "The data
warehouse will become the primary source of
information about a given topic customer,
product, order history if it's done
right," English says.
Keep It Clean
"The majority of the effort that goes in a data
warehousing project is loading clean data,"
English says. And no wonder. Ultimately, the success
of a data warehouse rests on the accuracy and
completeness of the information in it. Bad data
generates two results: Workers will discover that the
information they extract is wrong, and they will stop
using the warehouse; or they will extract the data,
believe that it is correct, and make bad decisions
because of it. In either case, the warehouse is a
failure.
A number of tools exist for transforming data to
warehouse requirements, but some data just cant
be cleaned: Records are lost; information may be
irretrievable. In those cases, its important to
identify the accuracy level for that part of the
warehouse, so workers know ahead of time the quality
of the data they are extracting.
Furthermore, English says, "not a single data
cleansing product can guarantee the accuracy of the
data." A tool to check gender codes, for
example, will recognize that a "G" is not
an "M" or "F," but it wont
be able to tell you whether "Pat" is male
or female. Therefore, management must foster what
English calls "information stewardship."
"You must make people believe: 'I'm accountable
for what I do because other people count on
it.'" Promoting that attitude could entail
everything from better employee training programs to
rewarding data entry personnel for accuracy rather
than just speed. "It's a cultural change and
it's another reason for senior management's
involvement because the data warehousing team
can't do it," English says.
Keep It Simple
English advocates using a "minimalist set of
technology components," or making the warehouse
architecture as simple and intuitive as possible.
Among his suggestions: Use a single directory for
metadata; standardize the database management system;
and match terms used in the data warehouse to
standard business terms, so its easy to relate
different stores of information. The easier it is to
use the system, the more likely it will continue to
be used and supported.
He also recommends maintaining the base data from
which summary calculations are derived, as part of a
backup system. (Not doing so can spell disaster if
your summary data is lost and you need to regenerate
the calculations.) Also, be sure to make an
informational directory available to all workers so
they know the meaning and source of data in the
warehouse.
There are two schools of thought on whether to return
cleaned-up data to a companys operational
database. While some believe the operational data
should not be changed, "I disagree with
that," English says. "I believe it is a
critical success factor that once you clean data,
that you send that data back into the operational
system if that data is still being used," as a
way of minimizing possible discrepancies between the
two systems.
In the same vein, English recommends applying the
consensus definitions used for the data warehouse to
the operational data. "When any application
comes up for re-employment, we have already defined
these attributes. [Over time] this eliminates the
need for a transform tool. Its one of those
side benefits," says English. "Capture
once, use many times. Its the principle of
value."
Anita J.
Freed is an Internet project manager at
DCI.
Larry English is a featured speaker at DCI's Data Warehouse World.