Managing a distributed database is significantly more complicated than running a monolithic, single-location database.
The distributed d.b.a. faces all of the design and implementation issues of a single location, plus the added complexity of
distribution, network latency, time shifts and remote administration.
The distributed d.b.a. (d.d.b.a.) is a new job function in addition to local d.b.a.s. The following are examples of work the
d.d.b.a. will perform:
Some ideas to remember as you consider implementing a replicated database environment are:
Today, there are no standards that apply to replication across diverse products, nor are any standards bodies working
on this issue. Functions such as utilities and recovery are handled quite differently in different vendors' products.
All of the major DBMS vendors are moving toward opening up their replication capabilities to foreign DBMS. Digital, Oracle,
Sybase and IBM are focusing their attention on links to each other and to other relational DBMS products. IBM, CA and
Sybase have published their 2-phase commit protocols, allowing their users to participate in heterogeneous distributed
database approaches with products from other vendors.
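To make the 2-phase commit idea concrete, here is a minimal Python sketch of the coordinator logic: every participating DBMS first votes in a prepare phase, and the transaction commits everywhere only if all votes are yes. The `Participant` class and `two_phase_commit` function are hypothetical illustrations, not any vendor's published protocol; a real implementation would also write a durable log, handle timeouts and recover in-doubt transactions, all of which are omitted here.

```python
from dataclasses import dataclass
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


@dataclass
class Participant:
    """A hypothetical resource manager: one DBMS taking part in the transaction."""
    name: str
    can_commit: bool = True
    state: str = "active"

    def prepare(self) -> Vote:
        # Phase 1: a real DBMS would make its work recoverable before voting YES;
        # here we only record the state. A NO voter aborts immediately.
        if self.can_commit:
            self.state = "prepared"
            return Vote.YES
        self.state = "aborted"
        return Vote.NO

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: list[Participant]) -> bool:
    """Coordinator: commit everywhere only if every participant votes YES."""
    # Phase 1 (prepare): collect one vote from each participant.
    votes = [p.prepare() for p in participants]
    # Phase 2 (decide): unanimous YES means commit; anything else means abort.
    decision = all(v is Vote.YES for v in votes)
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return decision


if __name__ == "__main__":
    sites = [Participant("db2"), Participant("sybase"), Participant("ingres", can_commit=False)]
    print(two_phase_commit(sites))   # False
    print([p.state for p in sites])  # ['aborted', 'aborted', 'aborted']
```

The all-or-nothing decision is what lets products from different vendors share one transaction: no site commits unless every site has promised it can.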
Both Sybase and CA-Ingres can use non-relational DBMS as replication targets. Normally, if the vendor supports a gateway to a
DBMS, that DBMS can serve as a target for replication. For both of these vendors, that includes IMS, RMS, VSAM and other
environments. The gateways to non-relational DBMS don't require special coding (such as RPCs)
and are valuable in allowing new distributed systems to be integrated with older applications.
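One way to picture why a gateway lets a non-relational store act as a replication target without special coding is a common target interface: the replicator hands each change to whatever target it is configured with, and the gateway translates the change into the store's native record format. The sketch below assumes a hypothetical `RowChange` record and two in-memory stand-ins; it illustrates the idea only and is not how the Sybase or CA-Ingres gateways are actually built.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class RowChange:
    """One change captured at the primary site (hypothetical format)."""
    table: str
    key: str
    values: dict


class ReplicationTarget(Protocol):
    def apply(self, change: RowChange) -> None: ...


class RelationalTarget:
    """Stand-in for a relational replicate: rows grouped by table."""

    def __init__(self) -> None:
        self.tables: dict[str, dict[str, dict]] = {}

    def apply(self, change: RowChange) -> None:
        self.tables.setdefault(change.table, {})[change.key] = dict(change.values)


class KeyedFileGateway:
    """Stand-in for a gateway to a non-relational, keyed-record store.

    The gateway translates each row change into the store's native
    record layout, so the replicator needs no target-specific code.
    """

    def __init__(self) -> None:
        self.records: dict[str, str] = {}

    def apply(self, change: RowChange) -> None:
        # Hypothetical native layout: key "TABLE:KEY", fields joined in name order.
        record = "|".join(str(change.values[f]) for f in sorted(change.values))
        self.records[f"{change.table}:{change.key}"] = record


def replicate(changes: list[RowChange], targets: list[ReplicationTarget]) -> None:
    # The replicator only knows the common interface, not the target type.
    for change in changes:
        for target in targets:
            target.apply(change)
```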
As a general rule, replication from a foreign DBMS into a replication environment such as CA-Ingres or Sybase is currently
available only if the user is willing to program that functionality. One important exception is an IBM offering that allows
replication from IMS into the DB2/DRDA world.
Anyone contemplating the acquisition of replication technology should understand how the vendor will assist in migrating to a
heterogeneous DBMS environment. Almost no organization today uses one DBMS exclusively, and heterogeneity in database
and file management approaches is likely to increase in the future. Gateway solutions, of course, are not the same as a
replication and 2-phase commit process that operates transparently over multiple DBMS. The real world is multi-vendor,
multi-department and multi-network, and DBMS users will want replication technology that operates well across
heterogeneous DBMS.
1. A replication server can make more efficient use of a company's computers and network. By shifting
data to the local site where it's needed, companies can ensure that important applications are available at all times, and
response time from local data access can be significantly better than when data must be accessed from a
distance. Replication is also more fault tolerant than a distributed DBMS, because each site can keep working with its local
copy when another site or a network link fails. That fault tolerance results in more consistent transaction processing, so the
overall database is up and responsive more often than the equivalent configuration would be if it were a distributed DBMS.
2. Replication can provide a backup architecture that enhances system reliability in a local (and/or WAN) environment. Replication enhanced with hot-standby software monitors the health of the primary server while transactions are backed up on the replication server. When the primary processor fails, the backup is immediately available: the system automatically switches to the backup and designates another machine as the new backup replicate.
3. Individual workgroups can now have their own replicated databases, which means never having to say "sorry" for network
propagation delays. Replication can enhance performance and provide load balancing locally or over a WAN. For example,
with two replicate servers, queries could be channeled to one machine while updates and production work are
channeled to the other. The query server's information will be either exactly current or somewhat dated, depending
on the replication speed chosen by the user. With DSS-R approaches, the database copies can be enhanced for decision
support. Data can also be replicated from legacy applications and made available to new styles of processing across the
network.
Decision support applications are natural candidates for replication because, when they're distributed, replication can greatly
reduce WAN traffic.
4. As companies migrate to decentralized operations, they naturally want their computing support to follow the same form. As
the workload is distributed, it is split among multiple servers, and there are significant cost savings in using multiple
smaller machines to process work. Replication, done intelligently, can reduce network traffic and let the user benefit
from what would otherwise be unused CPU cycles. Another way to look at this is that replication allows easy local
data access at remote sites. This, in turn, allows:
a. A decrease in response times
b. A reduction in wide area network traffic
c. Local autonomy that can take over in case of network or server failure. A key to achieving this
advantage is to use a peer-to-peer replication service, so that when recovery occurs, the updates completed locally
can be properly propagated to the other locations holding the same data.
5. Replication is an important technique for increasing the availability, or uptime, of network-based computing. Redundancy is
the fundamental engineering approach for increasing reliability, and replication can be used for exactly this purpose.
Imagine a retail operation where sales offices are widely distributed and inventory is kept at a few major warehouse locations.
If the warehouse information is replicated at the sales offices, a sales office can accept tentative orders
even if the network link to its warehouse is broken. The sales office can do all of the processing necessary for a
sale, except the final confirmation, without access to the central inventory data.
This kind of capability provides a higher level of customer service than a system operating off a single central database
with communication links to the distributed sales offices could provide. For a distributed operation, then, replication of
both TP-R and DSS-R types allows higher system availability than a monolithic model.
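As a rough illustration of why replication can be more available than a distributed DBMS (items 1 and 5 above), here is a simplified calculation. It assumes, hypothetically, that every site is up independently with the same probability p, that a synchronous distributed update needs all n sites up at once, and that a site working from its own replica needs only itself to be up. Real failure patterns are rarely this tidy.

```python
def synchronous_availability(p: float, n: int) -> float:
    """Chance that all n sites are up together, as a synchronous update across n sites requires."""
    return p ** n


def local_replica_availability(p: float) -> float:
    """Chance that one site can work from its own replica: only that site must be up."""
    return p


def any_copy_availability(p: float, n: int) -> float:
    """Chance that at least one of n independent copies of the data is up (pure redundancy)."""
    return 1 - (1 - p) ** n


if __name__ == "__main__":
    p, n = 0.99, 5
    print("synchronous update across all sites:", round(synchronous_availability(p, n), 4))
    print("work from the local replica:", local_replica_availability(p))
    print("at least one copy up:", any_copy_availability(p, n))
```

With p = 0.99 and five sites, the synchronous case drops to roughly 0.951 while each replicated site stays at 0.99; the more sites a transaction must touch synchronously, the wider that gap becomes.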
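The hot-standby arrangement in item 2 can be sketched as a monitor that watches the primary's heartbeat, promotes the backup after several consecutive misses and then designates a spare machine as the new backup replicate. `Server`, `HotStandby` and the `max_missed` threshold are hypothetical names for illustration; bringing the new backup's copy of the data up to date is not shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Server:
    name: str
    healthy: bool = True

    def heartbeat(self) -> bool:
        # Stand-in for a real health probe (hypothetical).
        return self.healthy


class HotStandby:
    """Promote the backup after the primary misses `max_missed` heartbeats in a row."""

    def __init__(self, primary: Server, backup: Server,
                 spares: list[Server], max_missed: int = 3) -> None:
        self.primary = primary
        self.backup: Optional[Server] = backup
        self.spares = spares
        self.max_missed = max_missed
        self.missed = 0

    def tick(self) -> None:
        """Run one monitoring interval."""
        if self.primary.heartbeat():
            self.missed = 0
            return
        self.missed += 1
        if self.missed >= self.max_missed:
            self.fail_over()

    def fail_over(self) -> None:
        if self.backup is None:
            raise RuntimeError("primary failed and no backup is available")
        # The backup has been receiving the replicated transactions, so it takes over.
        self.primary = self.backup
        # Designate another machine, if any, as the new backup replicate.
        self.backup = self.spares.pop(0) if self.spares else None
        self.missed = 0
```

Requiring several consecutive misses, rather than one, keeps a single lost heartbeat on a busy network from triggering an unnecessary switch.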
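The query/update split in item 3 can be sketched as a small router that sends read-only queries to the query replica unless the replica has fallen further behind than the user is willing to accept. The `Router` class, the server names and `max_lag_seconds` are hypothetical; classifying statements by their first keyword is deliberately crude (it would misroute, for example, a `SELECT ... FOR UPDATE`).

```python
from dataclasses import dataclass


@dataclass
class Router:
    """Route read-only queries to the query replica unless it is too far behind.

    In a real system the replica's lag would come from the replication software.
    """
    update_server: str
    query_server: str
    max_lag_seconds: float

    def route(self, sql: str, replica_lag_seconds: float) -> str:
        words = sql.split()
        is_query = bool(words) and words[0].upper() == "SELECT"
        if is_query and replica_lag_seconds <= self.max_lag_seconds:
            return self.query_server
        # Updates, and queries needing fresher data than the replica holds,
        # go to the update (production) server.
        return self.update_server
```

The lag tolerance is the knob that corresponds to "exactly current or somewhat dated": a decision-support user might accept hours, while an operational query might accept seconds.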
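The sales-office scenario in item 5, together with the local autonomy described in item 4c, can be sketched as a replica that accepts tentative orders while the link is down and propagates them to the warehouse for final confirmation when the link returns. `Warehouse`, `SalesOffice` and their methods are hypothetical, and refreshing the whole replica on reconnect is a simplification.

```python
from dataclasses import dataclass, field


@dataclass
class Warehouse:
    stock: dict[str, int]

    def confirm(self, item: str, qty: int) -> bool:
        # Only the central source can give the final confirmation.
        if self.stock.get(item, 0) >= qty:
            self.stock[item] -= qty
            return True
        return False


@dataclass
class SalesOffice:
    replica: dict[str, int]  # local, possibly dated copy of warehouse inventory
    pending: list[tuple[str, int]] = field(default_factory=list)

    def take_order(self, item: str, qty: int) -> str:
        # Checked against the local replica, so no network link is needed.
        if self.replica.get(item, 0) < qty:
            return "rejected"
        self.replica[item] -= qty
        self.pending.append((item, qty))
        return "tentative"

    def reconnect(self, warehouse: Warehouse) -> list[tuple[str, int, bool]]:
        """When the link is restored, propagate queued orders for final confirmation."""
        results = [(item, qty, warehouse.confirm(item, qty)) for item, qty in self.pending]
        self.pending.clear()
        # Refresh the local replica from the central source.
        self.replica = dict(warehouse.stock)
        return results
```

A tentative order can still fail at confirmation if other offices drew down the same stock in the meantime; that is the trade-off for staying open while the network is down.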
The benefits of a properly implemented replication scheme can be very substantial. However, the complexity of a distributed
environment, in both the managerial and technical sense, is much greater than that of a local monolithic environment. This is
especially true for TP-R environments. Data collisions may occur with peer-to-peer approaches, and the recovery process they
imply requires both excellent software and competent administration.
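One common way to detect such collisions, swapped in here for illustration since the text does not name a mechanism, is version-number checking: each update carries the row version its originating site started from, and a receiving peer that finds a different current version flags a collision instead of silently overwriting. `PeerReplica` and `Update` are hypothetical names, and resolution is left to the administrator rather than automated.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Update:
    key: str
    value: str
    base_version: int  # row version the originating site had before its change
    origin: str


@dataclass
class PeerReplica:
    name: str
    # key -> (value, version); a missing row counts as version 0
    rows: dict[str, tuple[Optional[str], int]] = field(default_factory=dict)
    collisions: list[tuple[Update, tuple[Optional[str], int]]] = field(default_factory=list)

    def local_update(self, key: str, value: str) -> Update:
        _, version = self.rows.get(key, (None, 0))
        self.rows[key] = (value, version + 1)
        return Update(key, value, version, self.name)

    def apply_remote(self, upd: Update) -> bool:
        current = self.rows.get(upd.key, (None, 0))
        if current[1] != upd.base_version:
            # The row changed here since the origin read it: a data collision.
            # Record it for the d.b.a. instead of overwriting either update.
            self.collisions.append((upd, current))
            return False
        self.rows[upd.key] = (upd.value, upd.base_version + 1)
        return True
```

For example, if two peers both change the same customer row before either update propagates, each peer records one collision when the other's update arrives, and someone must decide which value is correct.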
It's wise to invest the resources needed to ensure that the combination of local and global d.b.a. resources is adequate
for your environment. Your d.b.a. will have to create a database design that is correct for replication and tested in the
distributed environment. Operationally, it's important not to shortchange the time your d.b.a. needs to become an
expert in diagnosing and resolving problems in this environment. You should seriously consider consultant assistance, probably
from your DBMS vendor, as part of the first project.
Make sure that you understand the architectural, currency, data integrity and performance implications of DSS-R and TP-R
based approaches. Because approaches differ within any one vendor's product line and between vendors, the different
technologies have very different cost, performance and integrity results. You should have a DBMS that supports the
different requirements of your application environment.
Managing distributed data through replication and copy approaches is non-trivial and will require competent technical
management. Even evaluating the different currently available technologies will require an analyst of top caliber.
Because implementing distributed systems offers so many combinations of technology and benefit, you'll need to do some
careful management analysis to understand how these approaches can support your business requirements. Those business
benefits should be weighed against the costs of the necessary software and management.
It's wise to begin implementing a distributed database with a single vendor. However, if you have a heterogeneous DBMS
environment, be sure to understand how your vendor can support a multiple DBMS approach.
George Schussel has been a CIO, consultant, industry analyst, writer and lecturer on computer topics for 30 years. He
lectures to more than 20,000 professionals a year. He is the founder and Chairman of Digital Consulting, Inc.
(DCI) in Andover, Massachusetts, and Chairman of the Database & Client/Server World trade show. He has published over
50 technical and analytical articles, and his latest book, Rightsizing Information Systems, co-authored with Steve Guengerich,
was published by the SAMS Publishing Division of Macmillan.
Reach him at 74407.2472@compuserve.com or http://www.dciexpo.com/
©Copyright 1996 by Digital Consulting, Inc.
All Event names are trademarks of DCI or their clients.