Publication Date: July 29, 1996
The Metadata Challenge
By Anita J. Freed
You're driving at night in an unfamiliar area
with no headlights on. Having a tough time staying on
the road?
"Operating a data warehouse without making use of
metadata is analogous to driving in the dark," says
Sanders Partee, a director of product development for
PLATINUM technology, Inc. "It can be done, but only with great
difficulty."
Simply put, metadata is data about data, or data
about the systems that operate around your data. A
repository is a system for collecting, maintaining
and communicating data. Neither term applies strictly
to data warehousing, but a metadata repository can be
an important tool in managing warehouse data and
converting it into useful business information.
Among other things, a well-constructed metadata
repository can tell you where your data came from,
what transformation rules were applied to it, and
what naming and grouping conventions were used on it.
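To make the idea concrete, here is a minimal sketch of a single repository entry that records the three things listed above: where a column came from, which transformation rules were applied to it, and which naming and grouping conventions it follows. The class, its field names and the sample values (`ColumnMetadata`, `ORDERS_DB`, `CUST_REGION_CD`) are hypothetical illustrations, not the layout of any particular vendor's repository.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnMetadata:
    """One hypothetical repository entry describing a single warehouse column."""
    warehouse_name: str                 # name used in the warehouse (naming convention applied)
    source_system: str                  # where the data came from
    source_field: str                   # original field name in the source system
    transformations: list[str] = field(default_factory=list)  # rules applied, in order
    subject_area: str = "unassigned"    # grouping convention


def describe(entry: ColumnMetadata) -> str:
    """Answer 'where did this come from and what was done to it?' in one line."""
    steps = "; ".join(entry.transformations) or "none"
    return (f"{entry.warehouse_name} <- {entry.source_system}.{entry.source_field} "
            f"[rules: {steps}] (area: {entry.subject_area})")


entry = ColumnMetadata(
    warehouse_name="CUST_REGION_CD",
    source_system="ORDERS_DB",
    source_field="rgn",
    transformations=["trim whitespace", "map legacy codes to 2-letter region"],
    subject_area="Customer",
)
print(describe(entry))
```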
"It's not really the data itself. It's not the
sales information or the order information or any of
the operational data. It's really 'value-added'
metadata. It's more than just descriptions of columns
and tables. It's organizing information,"
says Partee.
No company's computer system is void of metadata.
It's created whenever you define ways to move data,
for example, and many vendors include metadata stores
or catalogs with their products. But the need to
manage that metadata "can be glossed over unless you
have gone through the pain of building a data
warehouse without getting a handle on the
metadata," says Jack Sweeney, president and CEO
of Intellidex,
based in Winthrop, Mass. Mapping out a plan to
capture and use that information from the start can
save you headaches down the road.
Creating a Metadata Repository
Partee divides the strategies for creating and
maintaining a metadata repository into two broad
categories: population and maintenance tools, and
editor tools.
1. Population and maintenance tools hunt for and
capture information from other applications, and they
are used in conjunction with software designed to
parse this information. As might be expected,
population tools capture metadata for the initial
development of the repository, whereas maintenance
tools periodically update the repository as changes
occur. "Populating one time is not enough;
you must maintain the repository for it to be
useful," Partee says.
2. Editor tools make it easier for workers to build
topic areas, or categories, of metadata, by helping
those workers annotate and qualify the information
that is put in the repository. Those topic areas, in
turn, make it easier to access the information.
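The difference in item 1 between a one-time population pass and ongoing maintenance can be sketched in a few lines. This is a minimal illustration under simplifying assumptions: the hypothetical `populate` and `maintain` functions treat both the source catalog and the repository as plain dictionaries mapping table names to column lists, whereas real population tools parse metadata out of other applications.

```python
def populate(source_catalog: dict[str, list[str]]) -> dict[str, list[str]]:
    """Initial population: copy every table definition into a new repository."""
    return {table: list(cols) for table, cols in source_catalog.items()}


def maintain(repository: dict[str, list[str]],
             source_catalog: dict[str, list[str]]) -> list[str]:
    """Periodic maintenance: bring the repository in line with the source and report changes."""
    changes = []
    for table, cols in source_catalog.items():
        if table not in repository:
            changes.append(f"added {table}")
        elif repository[table] != cols:
            changes.append(f"changed {table}")
        repository[table] = list(cols)  # store a copy, not the source's own list
    for table in list(repository):      # iterate over a snapshot so deletion is safe
        if table not in source_catalog:
            del repository[table]
            changes.append(f"removed {table}")
    return changes


catalog = {"orders": ["order_id", "amount"], "customers": ["cust_id", "name"]}
repo = populate(catalog)

# The source schema changes after the initial load.
catalog["orders"].append("ship_date")
catalog["products"] = ["sku"]
del catalog["customers"]

changes = maintain(repo, catalog)
print(changes)
```

Because `maintain` both updates the repository and reports what changed, running it on a schedule keeps the repository from drifting away from the source systems after the initial load.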
Putting Metadata To Use
"Getting data in is only half of the
equation," notes Sweeney. To put your repository
to use, you need a method for accessing that
information. Three strategies for doing that are as
follows.
1. Metadata browsers, be they desktop
applications or Internet/intranet-related tools,
provide "overt" access to the
information. When you are using the data warehouse,
you take advantage of the metadata in some direct
way, such as when you make an ad hoc query.
2. "Implicit" tools are query tools that
use knowledge of the metadata as part of their
inherent structure.
3. Or, you can export the metadata to the native
format of your query tools. This method can let you
take advantage of queries made earlier.
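The third approach, exporting metadata into a query tool's own format, amounts to translating repository entries into whatever import layout that tool expects. The sketch below assumes a hypothetical tool that reads a CSV file with `column`, `business_name` and `description` headers; an actual product would define its own format, and `export_for_query_tool` is an illustrative name only.

```python
import csv
import io


def export_for_query_tool(entries: list[tuple[str, str, str]]) -> str:
    """Write (column, business name, description) entries as CSV in a hypothetical import layout."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")  # default quoting wraps fields containing commas
    writer.writerow(["column", "business_name", "description"])
    for column, business_name, description in entries:
        writer.writerow([column, business_name, description])
    return buf.getvalue()


entries = [
    ("CUST_REGION_CD", "Customer Region", "Sales region, 2-letter code"),
    ("ORDER_AMT", "Order Amount", "Order total in U.S. dollars"),
]
out = export_for_query_tool(entries)
print(out, end="")
```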
Success Factors and Pitfalls
If the metadata issue is in the spotlight now, it's
because more companies are exploring the potential of
data warehousing, and more companies are learning
from the mistakes of warehousing pioneers. "As
more enterprises build warehouses, they've come to
realize that the warehouses become cumbersome, and
that they have to get a handle on the metadata,"
says Sweeney, formerly director of information
resources at Bank of Boston and an early user of data
warehousing.
But even with that growing body of experience to tap
into, companies face plenty of hurdles in
establishing effective data warehouses, with
infrastructure issues chief among them.
"You have to make a commitment to build a data
warehouse. You have to have an extract mechanism. You
have to have a data warehouse itself, a
relational database that the data sits on top of. You
have to have query tools.
There is a
commitment of time and money and people, even if it's
an elementary infrastructure," says Sweeney.
"It doesn't have to be intergalactic, but
it can't be done in a matter of three weeks
either."
Says Partee: "You have to have a strategy and
the capability to execute that strategy, to populate,
maintain and use a repository.
If you have
great metadata and it's in a book on someone's shelf,
or it's in Joe's head, or it's with Sue who has just
left the company, then it's not very useful."
Partee estimates that companies embracing repository
solutions spend 80 percent of their energy dealing
with how they are going to put data into a
repository, and only 20 percent thinking about how
they are going to maintain and use it. He counsels
against that. "A forward-thinking and successful
shop is going to do it a bit more evenly.
The
people who are most successful with repositories pay
attention to, and spend more of their energy on, how
they are going to use it."
Market Changes
From the vendor perspective, "the market is
ready to go from the early adopter phase to a more
mainstream application," says Partee, whose
division at PLATINUM specializes in repository
solutions. "The recognition of metadata as
valuable [to business] has become very widespread
over the last three to four years. On the other hand,
instead of everyone moving to a repository as a
solution, lots of vendors are offering metadata
stores within their own tools. This leads to
disparate metadata stores and not one large
repository."
For a company that uses only one vendor's products,
this may not be a problem, Partee says, because the
various metadata stores should work together. But
when a company uses products from multiple vendors,
integrating information becomes more difficult.
That's when a repository is highly valuable.
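Partee's point about disparate stores can be illustrated with a small merge routine. In this sketch, which assumes each vendor's store is simply a dictionary of column names to definitions (the function, vendor and column names are hypothetical), the first definition seen for a column is kept and any column that vendors define differently is flagged for someone to resolve by hand.

```python
def merge_stores(
    stores: dict[str, dict[str, str]],
) -> tuple[dict[str, tuple[str, str]], list[str]]:
    """Combine per-vendor metadata stores into one repository and list conflicting columns."""
    merged: dict[str, tuple[str, str]] = {}
    conflicts: set[str] = set()
    for vendor, store in stores.items():
        for column, definition in store.items():
            if column not in merged:
                merged[column] = (vendor, definition)  # first vendor to define a column wins
            elif merged[column][1] != definition:
                conflicts.add(column)                  # definitions disagree; needs a human decision
    return merged, sorted(conflicts)


stores = {
    "extract_tool": {"ORDER_AMT": "order total, U.S. dollars", "CUST_ID": "customer key"},
    "query_tool": {"ORDER_AMT": "order total incl. tax", "CUST_ID": "customer key"},
}
merged, conflicts = merge_stores(stores)
print(conflicts)
```

With a single vendor the conflict list stays empty, which mirrors the observation that one vendor's stores should work together; each additional vendor adds chances for definitions to diverge.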
"There is a lot of hype out there," says
Sweeney, whose company, Intellidex, markets a
metadata catalog product. "The benefits of data
warehousing are real, but the business folks want
something right away. And the expectation that gets
set, the quick and dirty solution, can
lead to disappointment. But if you can get your hands
on the corporate information, the benefits are
tremendous."
Anita J.
Freed is an Internet project manager at
DCI.
DCI's
Data Warehouse World features presentations on
metadata topics by Jack Sweeney and PLATINUM technology
representatives.