Managing a distributed database is significantly more complicated than managing a monolithic, single-location database. The distributed d.b.a. faces all of the design and implementation issues of a single location plus the added complexity of distribution, network latency, time shifts and remote administration.
The distributed d.b.a. (d.d.b.a.) is a new job function that exists alongside the local d.b.a. role. The following are examples of work the d.d.b.a. will perform:
- Designing and planning the replication system, including how and when data is shared amongst users. It's only after this work has been done that the local d.b.a. can input the necessary information to set up the replication system.
- Coordinating installation and system configuration across the various sites.
- Monitoring the operation, performance and recovery of the system from an enterprise, rather than a local, perspective.
Some ideas to remember as you consider implementing a replicated database environment are:
- I. Set up a plan and understand the rules for distribution of data before the implementation begins. Replicated databases are not a technology amenable to "let's try it and push it around a bit" approaches. You need a good plan in hand before you begin, or you will get lost in the middle of building the replicated environment. If your plan is good, however, the implementation can proceed incrementally.
- II. Make sure that your d.d.b.a. has good forms-based or graphical utilities to assist in the database configuration and in the management of the ongoing network. For example, CA-Ingres comes with forms-based management utilities, and IBM and Sybase have GUI-based management utilities. These facilities should be able to manage all aspects of a replication environment from a single desktop that is movable and can be anywhere on the network. Some points to consider carefully:
- A. How do you specify enhancements to the data? Do you have to learn a new language for this function?
- B. How is the replication setup handled? How much automated support is provided to the d.d.b.a.?
- C. What is the support provided for failure management? How much recovery is automatically handled and how much d.b.a. intervention is required?
- III. Your utilities should be able to answer questions like:
- What tables are at what nodes?
- What columns are at what locations?
- What rows are at what locations?
- Where are transactions routed to?
- IV. You should be able to change the database configuration on the fly without bringing the database or replication operation to a standstill.
- V. There should be a mail-based error notification system. This allows management of the distributed enterprise from any node on the network.
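The questions in item III can be answered from a catalog that records which piece of which table lives at which node. The following is a minimal sketch of that idea, not a description of any vendor's utility; the names `Fragment`, `ReplicationCatalog` and the free-text `row_filter` field are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Fragment:
    """One replicated piece of a table held at one node (hypothetical model)."""
    table: str
    node: str
    columns: tuple[str, ...]
    row_filter: str = "ALL"  # plain-text description of which rows are held


@dataclass
class ReplicationCatalog:
    """In-memory record of what is replicated where (illustrative only)."""
    fragments: list[Fragment] = field(default_factory=list)

    def register(self, fragment: Fragment) -> None:
        self.fragments.append(fragment)

    def tables_at(self, node: str) -> list[str]:
        # "What tables are at what nodes?"
        return sorted({f.table for f in self.fragments if f.node == node})

    def nodes_holding(self, table: str) -> list[str]:
        # Every node that holds at least part of the table.
        return sorted({f.node for f in self.fragments if f.table == table})

    def columns_at(self, table: str, node: str) -> list[str]:
        # "What columns are at what locations?"
        cols: set[str] = set()
        for f in self.fragments:
            if f.table == table and f.node == node:
                cols.update(f.columns)
        return sorted(cols)

    def rows_at(self, table: str, node: str) -> list[str]:
        # "What rows are at what locations?" (as recorded row filters)
        return [f.row_filter for f in self.fragments
                if f.table == table and f.node == node]
```

A d.d.b.a. tool built on such a catalog would call, for example, `catalog.tables_at("boston")` to list the tables held at a node. A real product would keep this information in its own system tables rather than in memory.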
Today, there are no standards that apply to replication across diverse products, and there are no standards bodies working on this issue. Issues like utilities and recovery are handled quite differently in different vendors' products.
All of the major DBMS vendors are moving toward opening up their replication capabilities to foreign DBMS. Digital, Oracle, Sybase and IBM are focusing their attention on links to each other and to other relational DBMS products. IBM, CA, and Sybase have published their 2-phase commit protocols, which allow their users to participate in heterogeneous distributed database approaches with products from other vendors.
Both Sybase and CA-Ingres can replicate to non-relational DBMS targets. Normally, if the vendor supports a gateway to that DBMS, then it can serve as a target for replication. That includes IMS, RMS, VSAM and other environments for both of these vendors. The gateways to non-relational DBMS don't require special coding (such as RPCs) and are valuable in allowing the integration of new distributed systems with older applications.
As a general rule, replication from a foreign DBMS into a replication environment such as CA-Ingres or Sybase is only available now if the user is willing to program that functionality. One important exception is an IBM offering which allows replication from IMS into the DB2/DRDA world.
Anyone contemplating the acquisition of replication technology should understand how their vendor will assist in migrating to a heterogeneous DBMS environment. Almost no organization today uses one DBMS exclusively, and heterogeneity in database and file management approaches is likely to increase in the future. Gateway solutions, of course, are not the same as a replication and 2-phase commit process that transparently operates over multiple DBMS. The real world is multi-vendor, multi-department and multi-network. Replication technology that can operate well across heterogeneous DBMS is something that DBMS users will want.
1. A replication server can be instrumental in allowing more efficient usage of a company's computers and network. By shifting data to the local site where it's needed, companies can ensure that important applications are available at all times. Response time from local data access can be significantly better than response time that depends on access from a distance. Also, replication is more fault tolerant than a distributed DBMS. That fault tolerance results in more consistent processing of transactions, so the overall database is up and responsive more of the time than an equivalent distributed DBMS configuration would be.
2. Replication can provide the architecture for backup that can enhance your system reliability in a local (and/or WAN) environment. Replication, enhanced with hot-standby software, operates by monitoring the performance health of your primary server, while transactions are backed up on the replication server. When there's a failure on the primary processor the backup is immediately available. The system automatically switches to the backup and designates another machine as the new backup replicate.
3. Individual workgroups can now have their own replicated databases. This means not ever having to say "sorry" for network propagation delays. Replication can enhance performance and provide load balancing locally or over a WAN. As an example of this, two replicate servers could allow queries to be channeled to one machine while updates and production work are channeled to the other. The query server will have accurate information that is exactly current or somewhat dated, depending on the speed of replication chosen by the user. With DSS-R approaches the database copies can be enhanced for decision support. Data can also be replicated from legacy applications and made available now to new styles of processing across the network.
Decision support types of applications are natural replication candidates, because if they're distributed, replication can greatly reduce WAN traffic.
4. As companies migrate to decentralized operations, they naturally want their computing support to follow the same form. As the workload is distributed, it is split among multiple servers. There are significant cost savings attached to using multiple smaller machines to process work. Replication, done intelligently, can reduce network traffic and allow the user to derive benefit from what would otherwise be unused CPU cycles. Another way to look at this is that replication allows easy local data access at remote sites. This, then, allows:
a. A decrease in response times
b. A reduction in wide area network traffic
c. The establishment of local autonomy which can take over in case of network or server failure. A key to achieving this advantage is to use a peer to peer type of replication service. This is so that when recovery occurs the completed local updates can be properly propagated to other locations of the same data.
5. Replication is an important technique for increasing the availability or uptime of network based computing. Redundancy is the fundamental engineering approach for increasing reliability and replication can be used exactly for this purpose.
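The load-balancing arrangement in item 3, where queries go to one replicate server and updates go to the other, can be sketched as a small routing function. This is an illustrative assumption about how an application layer might do it; the server names `replica-query` and `replica-update` are hypothetical, and classifying a statement by its first keyword is a deliberate simplification.

```python
READ_ONLY_VERBS = {"SELECT"}  # assumption: only SELECT counts as a query


def route(statement: str,
          query_server: str = "replica-query",
          update_server: str = "replica-update") -> str:
    """Return the name of the server that should run the statement."""
    words = statement.split()
    if not words:
        raise ValueError("empty statement")
    verb = words[0].upper()
    # Queries go to the query replicate; everything else is treated as
    # update or production work and goes to the other server.
    return query_server if verb in READ_ONLY_VERBS else update_server
```

How current the query server's answers are still depends on the replication speed chosen, as noted in item 3; the router only decides where a statement runs.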
Imagine a retail operation where sales offices are widely distributed and inventory is kept at a few major warehouse locations. If the warehouse information is replicated at the sales offices, then it's possible for the sales office to accept tentative orders even if the network link to the local warehouse is broken. The sales office can accomplish all of the processing necessary for a sale except for a final confirmation without access to the central source inventory data.
This kind of capability provides for a higher level of customer service than what could be provided by a system operating off a single central database with communication links to the distributed sales offices. For a distributed operation, then, replication of both TP-R and DSS-R types allows for higher system availability than a monolithic model.
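The retail scenario above can be sketched as follows: the sales office checks orders against its replicated copy of the warehouse inventory while the link is down, then asks for final confirmation once the warehouse is reachable again. This is a minimal sketch under stated assumptions; the `SalesOffice` class, its method names and the dictionary-based inventory are hypothetical and stand in for real replicated tables.

```python
from dataclasses import dataclass, field


@dataclass
class SalesOffice:
    # Replicated (possibly stale) copy of warehouse stock: item -> quantity.
    local_inventory: dict[str, int]
    # Tentative orders waiting for final confirmation.
    pending: list[tuple[str, int]] = field(default_factory=list)

    def accept_tentative(self, item: str, qty: int) -> bool:
        """Accept an order against the local replica, without the warehouse."""
        reserved = sum(q for i, q in self.pending if i == item)
        available = self.local_inventory.get(item, 0) - reserved
        if qty <= 0 or qty > available:
            return False
        self.pending.append((item, qty))
        return True

    def confirm_all(self, warehouse_stock: dict[str, int]):
        """Once the link is back, confirm pending orders against the source."""
        confirmed, rejected = [], []
        for item, qty in self.pending:
            if warehouse_stock.get(item, 0) >= qty:
                warehouse_stock[item] -= qty
                confirmed.append((item, qty))
            else:
                rejected.append((item, qty))
        self.pending.clear()
        # Refresh the local replica from the source data.
        self.local_inventory = dict(warehouse_stock)
        return confirmed, rejected
```

A tentative order can still be rejected at confirmation time if the warehouse stock changed while the link was down, which is why the text describes these orders as tentative.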
The benefits of a properly implemented replication scheme can be very substantial. The complexity of a distributed environment, however, in both a managerial and technical sense, is much greater than that of a local monolithic environment. This is especially true for TP-R environments. Data collisions may occur with peer-to-peer approaches; recovering from them requires the cooperation of excellent software and competent administration.
It's wise to invest the necessary resources to make sure that the combination of local and global d.b.a. resources is adequate for your environment. Your d.b.a. will have to create a database design that is correct for replication and tested in the distributed environment. In an operational sense, it's important not to shortchange the time it takes for your d.b.a. to become an expert in diagnosing and resolving problems in this environment. You should seriously consider consultant assistance, probably from your DBMS vendor, as part of the first project.
Make sure that you understand the architectural, currency, data integrity, and performance implications of a DSS-R or TP-R based approach. Approaches differ within any one vendor's product line and between vendors, and the different technologies have very different cost, performance and integrity results. You should have a DBMS that supports the different requirements of your application environment.
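To make the idea of a data collision concrete, here is a minimal sketch of one common detection technique, version checking. It is an assumption chosen for illustration, not the method of any product named in this article, and the names `Row`, `Update` and `apply_update` are hypothetical. Each update records the row version it was based on; if the local row has moved on since then, two sites changed the same data and the update is set aside for resolution.

```python
from dataclasses import dataclass


@dataclass
class Row:
    value: str
    version: int = 0


@dataclass(frozen=True)
class Update:
    key: str
    new_value: str
    base_version: int  # row version the originating site saw when it updated
    origin: str        # name of the site that made the change


def apply_update(store: dict[str, Row], update: Update,
                 collisions: list[Update]) -> bool:
    """Apply a replicated update, or record it as a collision."""
    row = store.setdefault(update.key, Row(value="", version=0))
    if row.version != update.base_version:
        # Another site already changed this row: leave it for the
        # software or the d.b.a. to resolve.
        collisions.append(update)
        return False
    row.value = update.new_value
    row.version += 1
    return True
```

Detection is only half of the problem. What happens to the entries in `collisions` (automatic rules, priority by site, or manual repair) is where the quality of the software and the administration matters.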
Managing distributed data through replication and copy approaches is non-trivial and will require competent technical management. Even evaluating the different currently available technologies will require an analyst of top caliber.
Because implementing distributed systems offers so many combinations of technology and benefit, you'll need to do some careful management analysis to understand how these approaches can support your business requirements. Those business benefits should be measured against the costs of the software and management necessary.
It's wise to begin implementing a distributed database with a single vendor. However, if you have a heterogeneous DBMS environment, be sure to understand how your vendor can support a multiple DBMS approach.
George Schussel
George Schussel has been a CIO, consultant, industry analyst, writer and lecturer on computer topics for 30 years. His lectures are held before more than 20,000 professionals a year. He is the founder and Chairman of Digital Consulting, Inc. (DCI) in Andover, Massachusetts and Chairman of the Database & Client/Server World trade show. He has published over 50 technical and analytical articles and his latest book, Rightsizing Information Systems, co-authored with Steve Guengerich, was published by the SAMS Publishing Division of MacMillan.
Reach him at
74407.2472@compuserve.com or http://www.dciexpo.com/
©Copyright 1996 by
Digital Consulting, Inc.