DCI: Anita Freed - The Metadata Challenge
 
 

Publication Date: July 29, 1996

The Metadata Challenge

By Anita J. Freed

You’re driving at night in an unfamiliar area with no headlights on. Having a tough time staying on the road?

Operating a data warehouse without making use of metadata is analogous to driving in the dark, says Sanders Partee, a director of product development for PLATINUM technology, Inc.: It can be done, but only with great difficulty.

Simply put, metadata is data about data, or data about the systems that operate around your data. A repository is a system for collecting, maintaining and communicating data. Neither term applies strictly to data warehousing, but a metadata repository can be an important tool in managing warehouse data and converting it into useful business information.

Among other things, a well-constructed metadata repository can tell you where your data came from, what transformation rules were applied to it, and what naming and grouping conventions were used on it. "It's not really the data itself. It's not the sales information or the order information or any of the operational data. It's really 'value-added' metadata. It's more than just descriptions of columns and tables – it's organizing information," says Partee.
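To make this concrete, here is a minimal sketch of what one repository entry might hold. The field names, the sample column and the `lineage` helper are all hypothetical, chosen only to mirror the items listed above (source, transformation rule, naming and grouping conventions); they are not taken from any vendor's product.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnMetadata:
    """One repository entry describing a warehouse column (hypothetical layout)."""
    warehouse_name: str        # name used inside the warehouse (naming convention)
    source_system: str         # where the data came from
    source_field: str          # original field name in the source system
    transformation_rule: str   # rule applied on the way into the warehouse
    topic_area: str            # grouping convention
    notes: list[str] = field(default_factory=list)  # free-form annotations


def lineage(entry: ColumnMetadata) -> str:
    """Summarize where a column came from and how it was changed."""
    return (f"{entry.warehouse_name} <- {entry.source_system}.{entry.source_field} "
            f"via '{entry.transformation_rule}'")


# Illustrative values only; none of these names appear in the article.
entry = ColumnMetadata(
    warehouse_name="net_sales_usd",
    source_system="order_entry",
    source_field="ORD_AMT",
    transformation_rule="convert to USD, subtract returns",
    topic_area="Sales",
)
print(lineage(entry))
```

Note that the entry holds no sales figures at all, only information about the column that contains them.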

No company's computer system is devoid of metadata. It's created whenever you define ways to move data, for example, and many vendors include metadata stores or catalogs with their products. But the need to manage that metadata "can be glossed over unless you have gone through the pain of building a data warehouse without getting a handle on the metadata," says Jack Sweeney, president and CEO of Intellidex, based in Winthrop, Mass. Mapping out a plan to capture and use that information from the start can save you headaches down the road.

Creating a Metadata Repository

Partee divides the strategies for creating and maintaining a metadata repository into two broad categories: population and maintenance tools, and editor tools.

1. Population and maintenance tools hunt for and capture information from other applications, and they are used in conjunction with software designed to parse this information. As might be expected, population tools capture metadata for the initial development of the repository, whereas maintenance tools periodically update the repository as changes occur. "Populating one time is not enough – you must maintain the repository for it to be useful," Partee says.

2. Editor tools make it easier for workers to build topic areas, or categories, of metadata, by helping those workers annotate and qualify the information that is put in the repository. Those topic areas, in turn, make it easier to access the information.
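The division of labor between the two categories can be sketched roughly as follows. This is an illustration under simple assumptions, not a description of any actual tool: the repository is a plain dictionary, the source "catalog" is a mapping of column names to descriptions, and the function names `populate`, `maintain` and `annotate` are invented for the example.

```python
def populate(repository: dict, catalog: dict) -> None:
    """Population step: load every entry from a source catalog once."""
    for name, description in catalog.items():
        repository[name] = {"description": description, "topic_area": None}


def maintain(repository: dict, catalog: dict) -> list:
    """Maintenance step: apply additions, changes and removals.

    Returns the sorted names that were added, changed or removed.
    """
    touched = []
    for name, description in catalog.items():
        if name not in repository:
            repository[name] = {"description": description, "topic_area": None}
            touched.append(name)
        elif repository[name]["description"] != description:
            # Update the description but keep any human annotation.
            repository[name]["description"] = description
            touched.append(name)
    for name in list(repository):
        if name not in catalog:
            del repository[name]
            touched.append(name)
    return sorted(touched)


def annotate(repository: dict, name: str, topic_area: str) -> None:
    """Editor step: a person files an entry under a topic area."""
    repository[name]["topic_area"] = topic_area


# Hypothetical catalog contents, before and after a change in the source.
repo = {}
populate(repo, {"orders.amt": "order amount", "orders.dt": "order date"})
annotate(repo, "orders.amt", "Sales")
changed = maintain(repo, {"orders.amt": "order amount in USD",
                          "orders.region": "sales region"})
print(changed)
```

The design choice worth noticing is that the refresh overwrites only what the source owns (the description) and leaves the topic area, which a person supplied, untouched.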

Putting Metadata To Use

"Getting data in is only half of the equation," notes Sweeney. To put your repository to use, you need a method for accessing that information. Three strategies for doing so follow.

1. Metadata browsers – be they desktop applications or Internet/intranet-related tools – provide "overt" access to the information. While using the data warehouse, you consult the metadata directly, for example when composing an ad hoc query.

2. "Implicit" tools are query tools that use knowledge of the metadata as part of their inherent structure.

3. Or you can export the metadata to the native format of your query tools. This method can let you take advantage of queries made earlier.
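The three access strategies above can be contrasted in a small sketch. Everything here is assumed for illustration: the repository contents, the function names, and the use of JSON as a stand-in for a query tool's native format. The query builder also assumes that all requested items live in a single table.

```python
import json

# Hypothetical repository: business names mapped to physical locations.
REPOSITORY = {
    "Net Sales": {"table": "sales_fact", "column": "net_sales_usd",
                  "description": "Sales after returns, in US dollars"},
    "Order Date": {"table": "sales_fact", "column": "order_dt",
                   "description": "Date the order was placed"},
}


def browse(keyword: str) -> list:
    """Overt access: a person searches the descriptions by keyword."""
    return sorted(name for name, meta in REPOSITORY.items()
                  if keyword.lower() in meta["description"].lower())


def build_query(business_names: list) -> str:
    """Implicit access: a query tool maps business names to physical columns.

    Assumes every requested item lives in the same table.
    """
    columns = [REPOSITORY[name]["column"] for name in business_names]
    tables = sorted({REPOSITORY[name]["table"] for name in business_names})
    return f"SELECT {', '.join(columns)} FROM {', '.join(tables)}"


def export_for_tool() -> str:
    """Export: serialize the metadata; JSON stands in for a tool's native format."""
    return json.dumps(REPOSITORY, sort_keys=True)


print(browse("order"))
print(build_query(["Net Sales", "Order Date"]))
```

In the first case the user sees the metadata; in the second the user sees only business names and the tool does the lookup; in the third the metadata leaves the repository altogether and lives inside the query tool.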

Success Factors and Pitfalls

If the metadata issue is in the spotlight now, it's because more companies are exploring the potential of data warehousing, and more companies are learning from the mistakes of warehousing pioneers. "As more enterprises build warehouses, they've come to realize that the warehouses become cumbersome, and that they have to get a handle on the metadata," says Sweeney, formerly director of information resources at Bank of Boston and an early user of data warehousing.

But even with that growing body of experience to tap into, companies face plenty of hurdles in establishing effective data warehouses – infrastructure issues being chief among them. "You have to make a commitment to build a data warehouse. You have to have an extract mechanism. You have to have a data warehouse itself – a relational database that the data sits on top of. You have to have query tools. … There is a commitment of time and money and people, even if it's an elementary infrastructure," says Sweeney. "It doesn’t have to be intergalactic, but it can't be done in a matter of three weeks either."

Says Partee: "You have to have a strategy and the capability to execute that strategy, to populate, maintain and use a repository. … If you have great metadata and it's on a book on someone's shelf, or it's in Joe's head, or it's with Sue who has just left the company, then it's not very useful."

Partee estimates that companies embracing repository solutions spend 80 percent of their energy dealing with how they are going to put data into a repository, and only 20 percent thinking about how they are going to maintain and use it. He counsels against that. "A forward-thinking and successful shop is going to do it a bit more evenly. … The people who are most successful with repository pay attention to, and spend more of their energy on, how they are going to use it."

Market Changes

From the vendor perspective, "the market is ready to go from the early adopter phase to a more mainstream application," says Partee, whose division at PLATINUM specializes in repository solutions. "The recognition of metadata as valuable [to business] has become very widespread over the last three to four years. On the other hand, instead of everyone moving to repository as a solution, lots of vendors are offering metadata stores within their own tools. This leads to disparate metadata stores and not one large repository."

For a company that uses only one vendor's products, this may not be a problem, Partee says, because the various metadata stores should work together. But when a company uses products from multiple vendors, integrating information becomes more difficult. That's when a repository is highly valuable.
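A rough sketch of why multiple vendors complicate matters: when each tool keeps its own metadata store, the same column may be described differently in each, and something has to bring the descriptions together and notice the disagreement. The tool names, column names and the `consolidate` function below are hypothetical.

```python
def consolidate(stores: dict) -> dict:
    """Merge per-tool metadata stores into one repository keyed by column name.

    `stores` maps a tool name to {column_name: description}.
    """
    repository = {}
    for tool, entries in stores.items():
        for column, description in entries.items():
            record = repository.setdefault(
                column, {"descriptions": {}, "conflict": False})
            record["descriptions"][tool] = description
            # Flag the column when tools disagree about what it means.
            record["conflict"] = len(set(record["descriptions"].values())) > 1
    return repository


# Two hypothetical tools that each keep their own metadata store.
stores = {
    "extract_tool": {"cust_id": "customer identifier", "amt": "order amount"},
    "query_tool": {"cust_id": "customer identifier", "amt": "amount in USD"},
}
merged = consolidate(stores)
conflicts = sorted(c for c, r in merged.items() if r["conflict"])
print(conflicts)
```

Here the two tools agree on `cust_id` but not on `amt`, which is the kind of discrepancy a single repository makes visible.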

"There is a lot of hype out there," says Sweeney, whose company, Intellidex, markets a metadata catalog product. "The benefits of data warehousing are real but the business folks want something right away. And the expectation that gets set – the quick and dirty solution – can lead to disappointment. But if you can get your hands on the corporate information, the benefits are tremendous."

Anita J. Freed is an Internet project manager at DCI.


DCI’s Data Warehouse World features presentations on metadata topics by Jack Sweeney and PLATINUM technology representatives.


 
©Copyright 1997 by Digital Consulting, Inc. (508) 470-3880
All event names are trademarks of DCI or its clients.