Paper Evaluation: 04-20-2000

Flavio P JUNQUEIRA (flavio@cs.ucsd.edu)
Thu, 20 Apr 2000 02:15:15 -0700 (PDT)

Title: "StarOS, a Multiprocessor Operating System for the Support of
Task Forces"

StarOS is an object-oriented operating system developed for the
Cm* system. Cm* is composed of a collection of autonomous
processors divided into clusters. Because several processors are
available, the operating system was designed to support efficient
distributed computations. Each distributed computation is called a
task force, and is defined as a collection of small processes that
run concurrently.

In order to accomplish a distributed computation, mechanisms for
communication and synchronization are necessary. For communication
between processes, the system provides mailboxes. The mailboxes are
used to exchange messages between processes asynchronously. To
synchronize processes, the system provides an event mechanism, which
wakes up registered processes blocked on an event.
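
The two mechanisms can be illustrated with a minimal sketch. It does
not reproduce the StarOS interface: the Mailbox class and run_demo
function are hypothetical names, Python threads stand in for the
processes of a task force, and threading.Event stands in for a StarOS
event.

```python
import threading
from collections import deque


class Mailbox:
    """Hypothetical asynchronous mailbox: send never waits for a receiver."""

    def __init__(self):
        self._messages = deque()
        self._lock = threading.Lock()

    def send(self, message):
        with self._lock:
            self._messages.append(message)

    def receive(self):
        """Return the oldest message, or None if the mailbox is empty."""
        with self._lock:
            return self._messages.popleft() if self._messages else None


def run_demo():
    mailbox = Mailbox()
    data_ready = threading.Event()    # stands in for a StarOS event
    results = []

    def consumer():
        data_ready.wait()             # block until the event is signaled
        results.append(mailbox.receive())

    workers = [threading.Thread(target=consumer) for _ in range(2)]
    for worker in workers:
        worker.start()
    mailbox.send("part-1")            # the sender does not block
    mailbox.send("part-2")
    data_ready.set()                  # wake every thread blocked on the event
    for worker in workers:
        worker.join()
    return sorted(results)
```

In this sketch the sender deposits both messages without waiting, and
a single signal wakes both blocked consumers, each of which then takes
one message from the mailbox.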

Each process can access some set of objects in the system. Access
to those objects is granted through capabilities, and the list of
capabilities is kept in the capability name space of each
process. One interesting aspect of StarOS is that the nucleus cannot
directly access every object in the system; it must first possess a
capability for the object. This approach constrains nucleus execution
and avoids accidental interference with user data.
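
A minimal sketch of capability-checked access follows. The class and
function names, the slot numbering, and the "read"/"write" rights are
assumptions made for illustration, not the StarOS design. The point is
only that the nucleus is checked like any other holder of a name
space.

```python
class CapabilityError(Exception):
    """Raised when a caller lacks a capability or a required right."""


class Capability:
    """Hypothetical capability: a reference to an object plus a set of rights."""

    def __init__(self, target, rights):
        self.target = target
        self.rights = frozenset(rights)


class Process:
    """A process (or the nucleus) with its own capability name space."""

    def __init__(self, name):
        self.name = name
        self.name_space = {}          # slot number -> Capability

    def grant(self, slot, capability):
        self.name_space[slot] = capability

    def access(self, slot, right):
        """Return the object behind a slot if the capability carries `right`."""
        capability = self.name_space.get(slot)
        if capability is None:
            raise CapabilityError(f"{self.name}: no capability in slot {slot}")
        if right not in capability.rights:
            raise CapabilityError(f"{self.name}: right '{right}' not held")
        return capability.target


def can_access(process, slot, right):
    """Report whether the access would succeed, without raising."""
    try:
        process.access(slot, right)
        return True
    except CapabilityError:
        return False


user_data = {"balance": 10}
user = Process("user")
nucleus = Process("nucleus")
user.grant(0, Capability(user_data, {"read", "write"}))
```

Here can_access(nucleus, 0, "read") is false until the nucleus is
itself granted a capability for user_data, and a read-only capability
still does not allow it to write.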

Because the address space is shared among different computer modules,
the code of an invoked function may not be present on the current
processor. In that case, the instructions are transferred to the
current processor as they are fetched. This approach turns out to be
inefficient, because a nonlocal memory reference takes more time than
a local one. Furthermore, although the authors point out the
possibility of dynamic reconfiguration, it was not implemented in the
system. Dynamic reconfiguration is important for providing fault
tolerance and for adapting to environmental changes.

--------------------------------------------------------------------------

Title: "Medusa: An Experiment in Distributed Operating System Structure"

The Medusa operating system was also developed for the Cm* system.
Medusa, however, is more concerned with problems related to
modularity, robustness and performance. The unit of management is the
task force, which is a collection of activities. The functions that
Medusa provides are called utilities, and they too are divided into
activities.

Activities communicate with one another by exchanging messages
through pipes. One interesting aspect of the pipe implementation is
the use of a pause time. If an activity is blocked because of a "pipe
full" or a "pipe empty" condition, instead of switching context
immediately, the activity's processor waits for an interval of time
equal to the pause time. This approach often avoids unnecessary and
expensive context switches.
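
The trade-off behind the pause time can be made concrete with a
simplified cost model. This is a sketch under stated assumptions, not
Medusa's implementation: the function name wait_cost is hypothetical,
time is in arbitrary units, and a blocked activity is assumed to pay
one context switch to leave the processor and one to return.

```python
def wait_cost(arrival_delay, pause_time, switch_cost):
    """Processor time lost while an activity waits on a pipe.

    arrival_delay: time until the "pipe full"/"pipe empty" condition clears
    pause_time:    how long the processor waits before switching context
    switch_cost:   assumed cost of one context switch

    Returns (time_lost, switched).
    """
    if arrival_delay <= pause_time:
        # The condition cleared during the pause: no context switch needed.
        return arrival_delay, False
    # The pause expired: the full pause is lost, and the activity is
    # switched out and later switched back in.
    return pause_time + 2 * switch_cost, True
```

With a switch cost of 20 units and a pause time of 5, a message that
arrives after 3 units costs 3 units instead of the 40 that an
immediate switch (pause time 0) would cost. A message that arrives
after 50 units costs 45, slightly more than without the pause, so the
pause pays off only when short waits are common.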

A very important improvement over the StarOS approach is that an
invoked function executes on the processor where its code
resides. Hence, instead of transferring the code from the computer
module that currently stores it, execution is transferred to that
computer module. This improves the performance of the system, since
instruction fetches become local references.
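
A back-of-the-envelope comparison shows why this helps. The two
functions below are hypothetical, and the reference and handoff costs
are free parameters rather than measurements from Cm*.

```python
def fetch_remotely_cost(instructions, remote_ref):
    """Caller's processor fetches every instruction from the remote module."""
    return instructions * remote_ref


def run_at_code_cost(instructions, local_ref, handoff):
    """Execution moves to the module holding the code; fetches become local."""
    return handoff + instructions * local_ref
```

Assuming, for illustration only, that a nonlocal reference costs 9
units, a local one costs 3, and handing off the invocation costs 500,
a function of 1000 instruction fetches costs 9000 units when fetched
remotely and 3500 when execution moves to the code. The handoff is
paid once, while the nonlocal penalty is paid on every fetch.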

In terms of protection, the system provides a descriptor scheme to
control access to objects. Each task force has a descriptor list
shared among its activities, and each activity also has its own
private list. To permit utility activities to access protected types,
those activities execute with a status bit set. As mentioned in the
paper, this protection mechanism is not as fine-grained as the
capability-based one used in StarOS.

Since robustness is one of the main concerns of the system, several
aspects of its design address it. For every utility task force, there
are always at least two activities running on different
processors. Note that new activities can be instantiated if the load
increases. Furthermore, independent services are distributed among
utilities in order to avoid deadlocks in the operating
system. Finally, an exception handling mechanism is provided, so that
errors in the execution of a function can be handled by activities
that are co-owners of the object or by a buddy activity.

In my opinion, the Medusa paper explains the mechanisms of its system
better than the StarOS paper does, and it also states the design
decisions clearly.

--Flavio Junqueira

------------------------------------------------------------
Flavio Paiva Junqueira, PhD Student
Department of Computer Science and Engineering
University of California, San Diego

Office location: AP&M 4349 e-mail: flavio@cs.ucsd.edu
------------------------------------------------------------