Distributed Operating Systems

Karan Bhatia



Summary

During the 1980s there was considerable interest in distributed operating systems: systems that tie together clusters of workstations over a communication network. Two well-known, long-running projects were Sprite, developed at UC Berkeley under John Ousterhout, and Amoeba, developed at the Vrije Universiteit in the Netherlands under Andrew Tanenbaum.

Both projects are essentially complete, and little further research is being done on them. At their peak, however, both were close to production systems, supporting a large number of users as well as developers.

These days there seems to be renewed interest in building resource-sharing systems (Globus, PVM, CONDOR, GLUnix, etc.). The goal of these systems, transparent sharing of resources, is the same as that of the distributed operating systems of the 1980s, except that the systems being built now are built on top of operating systems instead of inside them. What lessons did we learn from the 1980s? And why are those systems inadequate for today's needs? This summary does not answer these questions, but it does try to give a detailed overview of how Sprite and Amoeba work. The hope is that seeing what they did will help us understand why they are not suitable for today's resource-sharing needs.

In the Sprite model, the operating system runs on a cluster of workstations and tries to make the cluster look like a single timeshared machine. A user logs into one of the workstations, which becomes that user's "home" machine, and runs jobs as on any Unix workstation. Sprite is compatible with 4.3 BSD and adds some extra features: it can fork jobs onto idle machines, it migrates processes automatically (including forwarding when necessary), and it includes a single, completely transparent file system.
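The home-machine idea can be illustrated with a small sketch. This is a toy model and not Sprite code: the `Host` and `Cluster` classes, the `run` and `owner_returns` methods, and the policy of sending foreign jobs back home when a workstation's owner returns are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Host:
    name: str
    idle: bool = True  # True when no user is active at the console
    jobs: List[str] = field(default_factory=list)


class Cluster:
    """Toy model of a workstation cluster in which every job has a home machine."""

    def __init__(self, hosts: List[Host]) -> None:
        self.hosts: Dict[str, Host] = {h.name: h for h in hosts}
        self.home_of: Dict[str, str] = {}  # job name -> home host name

    def _find_idle(self, exclude: str) -> Optional[Host]:
        # Return the first idle host other than the home machine, if any.
        for host in self.hosts.values():
            if host.idle and host.name != exclude:
                return host
        return None

    def run(self, job: str, home: str) -> str:
        """Start a job on an idle host if one exists, otherwise on its home machine."""
        self.home_of[job] = home
        target = self._find_idle(exclude=home) or self.hosts[home]
        target.jobs.append(job)
        return target.name

    def owner_returns(self, name: str) -> List[str]:
        """Mark a host as busy and move its foreign jobs back to their home machines."""
        host = self.hosts[name]
        host.idle = False
        foreign = [j for j in host.jobs if self.home_of[j] != name]
        for job in foreign:
            host.jobs.remove(job)
            self.hosts[self.home_of[job]].jobs.append(job)
        return foreign
```

For example, with hosts `a` (the busy home machine), `b`, and `c`, calling `run("make", home="a")` places the job on `b`; a later `owner_returns("b")` moves it back to `a`. The user only ever names the job and never the machine it runs on.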

Amoeba is more radical in design: it is not 100% Unix compatible, and in addition to file transparency it has process transparency. The model for Amoeba is a pool of CPUs connected to dedicated servers (such as file servers) and to X terminals for the users. There is no "home" machine. Every process in the system is assigned to a processor and can be migrated if needed. The whole system is object based, and clients use capabilities to communicate with the object servers.
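To make the capability idea concrete, here is a minimal sketch of a capability-checking object server. It is not Amoeba's actual capability format or protection scheme: the `Capability` fields, the `READ`/`WRITE` rights bits, the `ObjectServer` class, and the use of an HMAC as the check value are all assumptions chosen for illustration.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass
from typing import Dict

READ, WRITE = 0x1, 0x2  # hypothetical rights bits


@dataclass(frozen=True)
class Capability:
    server: str   # name of the object server that manages the object
    obj: int      # object number within that server
    rights: int   # bitmask of permitted operations
    check: bytes  # value that lets the server detect tampering


class ObjectServer:
    def __init__(self, name: str) -> None:
        self.name = name
        self._key = secrets.token_bytes(16)  # secret known only to this server
        self._objects: Dict[int, bytes] = {}

    def _check(self, obj: int, rights: int) -> bytes:
        # Bind the object number and rights together under the server's secret.
        msg = f"{obj}:{rights}".encode()
        return hmac.new(self._key, msg, hashlib.sha256).digest()

    def _validate(self, cap: Capability, needed: int) -> None:
        expected = self._check(cap.obj, cap.rights)
        if cap.server != self.name or not hmac.compare_digest(cap.check, expected):
            raise PermissionError("invalid capability")
        if cap.rights & needed != needed:
            raise PermissionError("insufficient rights")

    def create(self, data: bytes, rights: int = READ | WRITE) -> Capability:
        obj = len(self._objects)
        self._objects[obj] = data
        return Capability(self.name, obj, rights, self._check(obj, rights))

    def restrict(self, cap: Capability, rights: int) -> Capability:
        """Issue a weaker capability for the same object."""
        self._validate(cap, 0)
        reduced = cap.rights & rights
        return Capability(self.name, cap.obj, reduced, self._check(cap.obj, reduced))

    def read(self, cap: Capability) -> bytes:
        self._validate(cap, READ)
        return self._objects[cap.obj]

    def write(self, cap: Capability, data: bytes) -> None:
        self._validate(cap, WRITE)
        self._objects[cap.obj] = data
```

A client that holds a capability presents it with every request. Because the check value depends on the rights field, a client that edits the rights bits of a read-only capability is rejected by the server, while the server itself can hand out restricted copies through `restrict`.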

The goals of these systems were somewhat different from those of today's resource management efforts (but not too different). They focused on distributing applications, that is, on using remote resources for essentially serial processes. In Sprite, although the kernels communicate via RPC, the RPC mechanism is hidden from applications, which have to communicate through the file system. The Sprite designers really focused on offloading serial, non-communicating jobs and on a unified file system (this was done before NFS made the idea widespread). Amoeba was set up more for communicating processes, and its object model bears a strong resemblance to the Common Object Request Broker Architecture (CORBA) standard. This model supports client-server processes, but is probably not general enough for arbitrary parallel computation.

