Rules for Distributed System
About Info|Ed ] TiAC ] What's new at Info|Ed and TiAC ] TiAC White Paper Repository ] Information Architecure Resources ] White Paper Request ]

 

Rules for Distributed Systems

By Jon Blunt

© 1996

We have a long history of developing robust systems for managing resources of limited availability, such as inventories. The classic model is transaction processing where many workstations can originate transactions that are processed on a single machine against a unique database.

New approaches to application design, such as client server, can make use of the distributed power of workstations and workgroup servers to process in parallel. This greatly raises the ability to handle complex and rich data types without creating excessive network and channel traffic. However, the need to ensure accuracy in the inventory file creates a processing bottleneck. This is even more complex where the availability of several resources have to be checked and updated for the transaction to be processed. For example, in placing an order a system may have to check the availability of the items, availability of space on the truck, and the customer's credit position. If the order is accepted, all of these pieces of information have to be updated.

When a system is distributed there are more problems to deal with. For example, any global bank has to manage the problem of setting trading limits in each of its major foreign exchange dealing rooms. The bank's risk management policy will set limits for exposures by currency, by industry, by customer, by counterparty, and by contract length. Without sophisticated systems banks have to allocate these resources in advance. The result is that the bank turns down business, even though there are unused credit limits in other buckets.

The solution adopted by the largest banks is to build a single global limits system that all dealing rooms use. These systems eliminate the misallocation of resources, but they are very expensive to build. They have in effect forced a set of distributed entities to be absolutely synchronized through a centralized system.

Short of this approach, we have limited tools to manage the complexity of distributed systems. Two-phase commit ensures that a transaction will be processed against a number of distributed databases or rolled back on all of them. However, as the number of databases increases the impact on throughput rises rapidly. Two-phase commit only appears to be valid for a small number of databases. Database replicators handle distributing copies of a database from a single master copy to distributed servers. However, they do not solve the update consistency problem, and there is a window between the master being updated and all the servers having the same copy.

The problems with asynchronous updates can be summarized as follows.

1) Lost transactions

Process A reads data to update it. Before the update is made Process B takes a copy of the data. A now updates it and B overwrites it. Process A has been lost.

Airline reservation systems have this property. They regularly lose transactions. A traveler can arrive at an airport with a valid ticket for which there is no record in the system. In most systems this error is considered very serious and prevented through such mechanisms as record locking.

2) Promises broken

Process A looks at a record to check availability of part X. Based on the value, Process A makes a commitment to a delivery date. When that is accepted an order is cut to initiate delivery. In the meantime Process B has looked at that same record and has also committed the part. That commitment is also accepted and another order raised. Both orders cannot be fulfilled, as one will be processed and the other will fail. The rules for resolving this may be first processed, by date stamp of the order transactions, or by intervention by an individual.

This type of error is also common in airline reservation systems. Usually one airline holds details of all the other carrier flights, including a flag saying whether seats are available for sale at each fare. If the available for sale flag is set the selling airline can sell up to an agreed number of seats, usually five, and forward the order to the other airline. If several orders happen simultaneously the flight can be over sold before the carrying airline can broadcast a change of status to the other carriers. [This is different than the practice of deliberately overbooking the plane because some people holding reservations will not turn up.] To avoid this problem airlines are now building the capability of shipping reservation transactions between systems so all requests will be processed against the home database.

3) Opportunities for fraud

If it takes time for a system to respond to an event it is possible to exploit that information gap. The movie, The Sting, demonstrates this when the con men pretend to intercept the race results in the Western Union office before they get to the betting shop.

All these problems are serious. A direct result of losing transactions is to make the system unauditable. The airline reservation systems achieve a high throughput but sacrifice all controls. Bank of America spent a small fortune in the 1980's trying to build banking systems on these platforms but it proved impossible to match the bank's need for controls with the system architecture.

Where a promise has been made and not kept there are many costs, but often these can be estimated and capped. While the airlines could rebuild their systems to eliminate all accidental overbooking, the cost would be high and it is easier to buy off the aggrieved passengers with cash refunds and free tickets.

Given the prevalence of store and forward and mail based systems we need a theory of distributed systems that minimizes these risks and exploits the strengths of decoupled systems. Indeed it is reasonable to say that the real system that is being modeled often has these characteristics. The following is a first step in trying to set some principles for the design of these systems.

Assumptions

  1. All distributed systems are naturally asynchronous. There is always a cost to forcing synchronization.
  2. Events occur at many locations. They can either be processed using local data and resources or an access can be made to a central system. If the central system is not available fallback procedures have to decide whether to cease processing or to accept the transaction and post it later. Similarly, if distributed files are used updates can be made to replicated copies synchronously or asynchronously. If they are made asynchronously then some copies may not be in sync when a future transaction occurs. In all cases, imposing synchronicity has a cost both in system design and complexity and in operational inflexibility.
  3. The real world is represented by a set of events that are not linearizable.
  4. Operating systems use techniques such as semaphores and critical code to ensure that only one process is updating a non-sharable resource. These locks impose delays on other processes and reduce processing to a linear stream of actions which can be reconstructed from a journal tape.
  5. The real world is not so clean. In practice, conflicting events can occur with very little coordination and often the parties are not aware of the problem. When two travel agents are trying to book clients on the same flight, who gets the last seat? Is it the one who came into the shop first, or the one who checked availability first? There is no rule that places any priority or sequence on this, and usually all an individual knows is that there was no seat available.
  6. This is a general problem that is inherent in any real time system with race conditions. By imposing techniques equivalent to semaphores, e.g., two-phase commit, it is possible to force linearization within the system.
  7. The system is an imperfect representation of the real world that may be different from observer to observer.

In the general distributed system two users may make the same inquiry and receive different answers. Suppose every event produces one or more time stamped transactions and that these transactions are propagated throughout the system. Someone inquiring against a set of resources should receive an accurate statement based upon the transactions received. However, if the propagation takes any time there is a lag, and if the delay varies transactions may be received out of sequence. In either case the inquirer will be presented with data that does not reflect information available within the system.

Rules for well designed distributed systems

1. The differences between observers follow a known probability distribution.

An element of the system design is a model of this information difference. For example, in a data warehouse application some data is moved into the data warehouse with daily snapshots , other data is mirrored transactions with a mean delay of three seconds. Each data item is identified with a target availability distribution that the system is engineered to deliver.

2. The system is a consistent estimator of the actual status.

3. The degree of slack, or tolerance of differences, is a parameter available to management.

The frequency of updates and other characteristics are parameters that can be adjusted to tune the system. Every technology has some inherent limitations, e.g., batch system cannot be tuned to perform real time monitoring. However, systems should be implemented so that unnecessary rigidity is avoided.

4. Any sustained exploitation of biases in the system is revealed.

Where there is an information gap there is the opportunity to exploit this. In general, to prevent fraud it is not necessary to have perfect information, just better information than the perpetrators. When messages depended on horses and ships, the Rothschilds made a fortune using carrier pigeons. An example of a problem that might arise is that a large order is placed for a commodity that will change the market price. If the order is not executed at once insiders can buy in advance.

Given that there are information lags it will be very difficult to detect isolated incidences. However, the system should be design to raise the risk of any continuing scheme. Basic precautions are to flush large transactions or significant accumulations of small transactions through the system. Other strategies would include analyzing the transaction record for any significant patterns or deviations from normal.

5. The system is able to recover and return to a consistent state that makes best use of available information.

This is an extension of the previous rule. Suppose a node becomes disconnected from the network and goes into a fallback mode where transactions are accepted in some form. When the system is restored it is important to bring the two pieces of the total system back into alignment. Any updates to master files must be copied down to local slaves and the stored transactions fed into the total system.

In general this could take some time. The simplest strategy is to update in FIFO, first-in, first-out, sequence. However, it would be better to sequence updates by the information they contain. Where items are of low value or abundant supply the exact status is less important than getting an accurate position for high value and short order items.

6. It is possible to construct a pseudo-snapshot of the system state for any particular time.

This is the basic requirement for developing a set of accounts and other basic business reports. Even though we are assuming continuous operation and delays in posting transactions, we are requiring a mechanism for logging and time stamping transactions so that ex-post a snapshot can be recreated.

Additional to time stamping it is also necessary that each system element can snapshot its status as of a particular moment for cross checking. These mechanisms are common in disciplines such as network management, but are not so well developed for application and systems management. Where they exist they are specific to an application and are not usually provided as a general mechanism available to all installed systems.

7. The system is auditable.

The rules above provide the basic mechanisms to ensure this is true, despite allowing considerable information slack. Until now it has been assumed, but not stated, that all transactions can be authenticated. With the amount of information flowing across the network the possibility of eavesdropping and tampering have to be assumed. Encryption and digital signature techniques must be used where appropriate. This distinguishes application management, as discussed here, from network management, which is usually neither encrypted nor authenticated.

Summary

Most discussion of distributed systems has focused on technical issues and mechanisms for synchronization. The problem of designing the business system and how it should behave has received much less consideration. This paper has set out some basic ideas for identifying the characteristics of well designed distributed systems.