Publication Date: November 8, 1996
Applying a Filter to the Information Stream
By Sue Mellen
It has been said that the Internet is a huge
parking lot where someone has dumped all of the books
from all of the libraries in the world. How do you
find the information you need in the middle of those
stacks without squandering time? Some companies are
turning to news and information filtering products
that turn all that data into neat little packages of
information.
It only makes sense that, in this age of
"info-glut," information filtering has
become big business, with the leaders in the industry
relying on a combination of database technology and
human expertise. Two companies that have built their
reputations in the area are OneSource Information
Services of Cambridge, Mass., and Individual, Inc.
of Burlington, Mass.
From the Lotus Position
An independent company since 1993, OneSource
claims more than 1,000 corporate customers, each with
multiple users. The company first saw light as Lotus
One Source, born out of Lotus Development
Corp.'s 1987 acquisitions of two early
innovators in CD-ROM information storage: Datatext
Inc., which was dedicated to gathering and
compressing text-related information, and Isis Corp.,
where the focus was on harvesting and organizing
numeric data.
Dan Schimmel, OneSource president and CEO, says
his company's family tree provides a competitive
edge in the number-crunching '90s. Thanks to the
marriage of two such disparate entities as Isis and
Datatext (a highly unusual move at the time), the
company has long experience in
integrating text and numerical data from many
different sources, he says.
"A OneSource user has the numbers along with
text-related analysis, often from an entirely
different source, to help him understand what the
figures mean," Schimmel says.
Information reaches the company through a series
of licensing agreements with "brand-name
providers," according to vice president Jimmy
Becker. Data sources include Moody's Investors
Service; Dow Jones & Co., Inc.; Newsline; and
Prompt, a database of news abstracts on new
products, acquisitions, users, technology and
business ventures.
OneSource uses a filtering system it calls Master
Entity Vocabulary (MEV) that has at its core an
Oracle database of 150,000-plus company names. When
information comes into the system, it is first
converted to a common format using a program called a
"loader," then matched against the database
of corporate entities to determine relevancy. At this
point, the company's three database drivers come
into play. These consist of a specialized numeric
engine capable of pulling figures from articles and
reports; a full-text engine that indexes text; and a
real-time news processor that takes constant news
feeds from providers and matches them with
appropriate corporate entities.
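The load-then-match step can be pictured with a short Python sketch. It is an illustration only, not OneSource's code: the record layout, the `load` and `match_entities` helpers, and the tiny in-memory table standing in for the Oracle database are all assumptions made for the example.

```python
import re

# Hypothetical stand-in for the company-name database (normalized key -> official name).
KNOWN_ENTITIES = {
    "onesource information services": "OneSource Information Services",
    "lotus development": "Lotus Development Corp.",
    "individual": "Individual, Inc.",
}

# Common corporate suffixes that providers may or may not include.
SUFFIXES = re.compile(r"\b(inc|corp|co|ltd)\b\.?", re.IGNORECASE)


def normalize(name: str) -> str:
    """Drop corporate suffixes and punctuation, lowercase, collapse spaces."""
    name = SUFFIXES.sub(" ", name)
    name = re.sub(r"[^a-z0-9& ]", " ", name.lower())
    return " ".join(name.split())


def load(record: dict) -> dict:
    """Toy 'loader': map a provider-specific record onto one common shape."""
    return {
        "companies": [normalize(c) for c in record.get("companies", [])],
        "text": record.get("body", ""),
    }


def match_entities(item: dict) -> tuple[list[str], list[str]]:
    """Split an item's company names into recognized and unrecognized ones."""
    matched, unmatched = [], []
    for name in item["companies"]:
        if name in KNOWN_ENTITIES:
            matched.append(KNOWN_ENTITIES[name])
        else:
            unmatched.append(name)
    return matched, unmatched


record = {"companies": ["Lotus Development Corp.", "Acme Widgets Inc."], "body": "..."}
matched, unmatched = match_entities(load(record))
print(matched, unmatched)
```

Normalizing names before the lookup is one simple way to cope with providers that spell the same company differently; a production system would need far more than suffix stripping.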
Human intervention also plays a part in the MEV
process, with five full-time editors and a number of
consultants bolstering the technical component of the
system. Editors check incoming data for new companies
or changes in the way data providers identify
companies, then match them against the MEV to be sure
the system recognizes the revised data. In the case
of new information sources, human editors wait to see
how many entities their computerized colleague will
recognize, manually matching any orphans left in the
data.
The biggest news at OneSource is its addition of a
Web-based delivery system. On Oct. 15 the company
announced deployment of OneSource.com, a
line of products employing Internet tools to function
on corporate intranets. Initially, the company is
offering two Web-ready commodities: Account Manager
for sales professionals and Business Browser directed
toward corporate researchers. Two other
products, Insurance Analyst for the data-hungry
insurance industry and UK Business Browser, are
scheduled for launch by the end of the year.
The company continues to offer its products in
other formats, including CD-ROM (hearkening back
to its roots as a composite of the two CD-ROM
pioneers) and Lotus Notes. But Becker says the
Web format offers significant advantages, including
timeliness and portability. "CD-ROMs just
don't work very well when you're out on the road
making sales calls," he says. But he adds that
the CD-ROM format is still the format of choice for
some users. "We're absolutely committed to
the CD-ROM platform. Some people need the in-depth,
custom reporting CDs allow."
A SMART Use of Technology
Individual, Inc., founded in 1989, claims 280,000
readers worldwide. The company already has a
significant presence on the Web in NewsPage, an
online information service boasting more than 25,000
pages of news related to various topics and
industries. The service gets more than four million
hits a week and feeds information to more than
200,000 users. Two other Individual products, First!
and First! Alert, send to a user's system
packages of breaking news on pre-selected topics or
companies. First! subscribers get information by 8
every morning, with First! Alert customers getting
bulletins throughout the day.
Individual employs a proprietary sorting system it
calls SMART (System for Manipulation and Retrieval of
Text) technology, developed by the late Dr. Gerard
Salton of Cornell University. The system has three
key components used to filter incoming data: a
thesaurus, the core SMART engine, and the Post
Processor, with a staff of 30-plus editorial managers
or "domain experts" overseeing the entire
process.
"We've hired experts in
telecommunications, information technology,
aerospace, health care, energy, finance, and
automotive, just to name a few key industries,"
says Richard C. Vancil, Individual's vice
president of marketing. He explains that the domain
experts manage customer profiles, with an expert in
health care, for example, making sure that hospitals
and physicians' practices have the right recipe of
news and information.
Individual gets daily information feeds via leased
telephone lines, satellite dish reception and dial-up
modem, then formats each story into the universal
format required by the core SMART filtering engine.
After a built-in Story Editor eliminates long-winded
or error-filled articles, information goes on to the
system's thesaurus, which adds semantic
equivalents of important words. It is also designed
to recognize critical words that may be used
infrequently in a story.
"If, for instance, a user is interested in
local area networks, he'll get information about
LANs. Using traditional Boolean search methods, the
word or phrase in the query would actually have to
appear in the article," says Vancil.
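Vancil's LAN example can be made concrete with a few lines of Python. This is a minimal sketch, assuming a hand-built synonym table; the `SYNONYM_GROUPS` data and the helper names are hypothetical and say nothing about how Individual's thesaurus is actually stored.

```python
import re

# Hypothetical thesaurus: each group lists terms treated as equivalent.
SYNONYM_GROUPS = [
    {"local area network", "local area networks", "lan", "lans"},
]


def contains(text: str, term: str) -> bool:
    """True if term appears in text as a whole word or phrase."""
    return re.search(r"\b" + re.escape(term) + r"\b", text) is not None


def expand_story(text: str) -> set:
    """Return the equivalents the thesaurus would attach to a story."""
    lowered = text.lower()
    added = set()
    for group in SYNONYM_GROUPS:
        if any(contains(lowered, term) for term in group):
            added |= group
    return added


def literal_match(query: str, text: str) -> bool:
    """Traditional matching: the query phrase itself must appear."""
    return contains(text.lower(), query.lower())


def thesaurus_match(query: str, text: str) -> bool:
    """Match on the literal text or on the equivalents added to the story."""
    return literal_match(query, text) or query.lower() in expand_story(text)


story = "Vendors ship faster LANs this quarter."
print(literal_match("local area networks", story))    # False
print(thesaurus_match("local area networks", story))  # True
```

The whole-word check matters: without it, a story about a business "plan" would be tagged as a LAN story.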
The core SMART technology basically assigns values
to text, using algorithms based on the frequency,
placement and relative importance of terms. The
system uses a similar process to assign values to a
user's query, and selects stories whose values fall
within a pre-determined distance of those in the
user's query.
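For readers who want to see the idea in miniature, the Python sketch below scores stories and a query as weighted term vectors and keeps the stories that fall within a cutoff. The article does not give Individual's formulas, so the specifics here are assumptions: term counts for frequency, a headline boost for placement, cosine distance as the measure, and a cutoff of 0.7. Weighting terms by their relative importance across the whole collection is left out for brevity.

```python
import math
from collections import Counter


def weigh(headline: str, body: str, headline_boost: float = 2.0) -> Counter:
    """Assign each term a value from its frequency and its placement."""
    weights = Counter()
    for word in body.lower().split():
        weights[word] += 1.0             # frequency in the body
    for word in headline.lower().split():
        weights[word] += headline_boost  # placement: headline terms count more
    return weights


def cosine_distance(a: Counter, b: Counter) -> float:
    """0.0 means same direction; 1.0 means no terms in common."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 if norm == 0 else 1.0 - dot / norm


def select(stories, query: str, max_distance: float = 0.7):
    """Return headlines of stories within max_distance of the query."""
    q = weigh("", query)
    return [h for h, b in stories if cosine_distance(weigh(h, b), q) <= max_distance]


stories = [
    ("Network vendors cut prices", "network hardware prices fell as vendors compete"),
    ("Hospital merger approved", "regulators approved the hospital merger"),
]
print(select(stories, "network prices"))
```

Raising `max_distance` lets more loosely related stories through, and lowering it tightens the selection.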
Whenever a story falls within the specified
distance, it is picked out and passed on to the Post
Processor to make sure it really fits the customer's
area of interest. At that point, an editor can
intervene to either shrink or increase the distance
based on a customer profile, so that more or fewer
stories are passed on for final filtering. This
closing process employs pseudo-Boolean (the company
calls it "fuzzy Boolean") technology to
cull any mis-hits.
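The article does not say how the "fuzzy Boolean" step works internally. One common reading of the term, assumed here purely for illustration, replaces true/false with degrees between 0 and 1 and combines them with min (AND), max (OR) and 1 - x (NOT). The profile, the term weights and the 0.5 cutoff below are all hypothetical.

```python
def membership(term: str, weights: dict) -> float:
    """Degree (0..1) to which a story is about a term, relative to its heaviest term."""
    top = max(weights.values(), default=0.0)
    return weights.get(term, 0.0) / top if top else 0.0


def fuzzy_and(*degrees: float) -> float:
    return min(degrees)


def fuzzy_or(*degrees: float) -> float:
    return max(degrees)


def fuzzy_not(degree: float) -> float:
    return 1.0 - degree


def passes_profile(weights: dict, cutoff: float = 0.5) -> bool:
    """Hypothetical profile: (network OR lan) AND NOT sports."""
    score = fuzzy_and(
        fuzzy_or(membership("network", weights), membership("lan", weights)),
        fuzzy_not(membership("sports", weights)),
    )
    return score >= cutoff


# A networking story, and a TV-network sports story that slipped through on "network".
lan_story = {"network": 3.0, "prices": 3.0, "cut": 2.0}
mis_hit = {"network": 1.0, "sports": 4.0, "broadcast": 4.0}
print(passes_profile(lan_story))  # True
print(passes_profile(mis_hit))    # False
```

Because the result is a score rather than a yes or no, the cutoff can be moved per customer, much as the distance in the earlier step can be widened or narrowed.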
Finally, the information is delivered in a variety
of formats including fax, e-mail, intranet or
enterprise-wide feed for groupware platforms such as
Lotus Notes. An acquisition in June 1996 added FreeLoader, Inc.
to Individual's mix. FreeLoader offers an off-line
browsing application that enables users to retrieve
and store Web pages on their hard disk for later
viewing (see
related story).
Growing Market
The business world's continuing demand for
information is certain to promote growth in companies
like OneSource and Individual. According to the
research company SIMBA Information, Inc., business
information earnings reached $25.4 billion in 1994.
And according to IDC/Link Resources, news filtering
services represent a growing segment of that market,
with earnings expected to hit $85 million in 1996 and
$185 million by 1999.
Sue Mellen writes from Tyngsboro, Mass.