paper evaluations 04/20

Alejandro Hevia (ahevia@cs.ucsd.edu)
Thu, 20 Apr 2000 01:18:33 -0700 (PDT)


Evaluation of paper
"StarOS, a Multiprocessor Operating System for the Support of Task Forces"
by A.K. Jones et al.

The paper describes an operating system for the Cm* multiprocessor, based
on collections of small cooperating processes, called "task forces", that
implement OS services. These processes are distributed over the same
processors that also schedule user processes.
Processes communicate and share data (even capabilities) using asynchronous
messages (a "mailbox" abstraction is provided). In addition, the system
provides memory, partitioned into clusters, that can be shared as well.
A key point of the design is the abstraction for (possibly parallel)
procedure calls, called modules, built on strongly typed objects with
capability-based authentication.
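To make the combination of asynchronous mailboxes and capability checks
concrete, here is a minimal Python sketch. It is not StarOS's interface:
`Right`, `Capability`, `Mailbox`, `send` and `receive` are hypothetical
names, and the two rights bits are assumptions. It only shows that a send
never blocks, that every operation is checked against the type and rights
carried by the capability, and that a capability can itself travel inside
a message.

```python
from collections import deque
from dataclasses import dataclass
from enum import Flag, auto


class Right(Flag):
    # Hypothetical rights bits; the paper's actual rights encoding is not modeled.
    SEND = auto()
    RECEIVE = auto()


class Mailbox:
    """Asynchronous mailbox: send never blocks; messages are kept in FIFO order."""

    def __init__(self):
        self.queue = deque()


@dataclass(frozen=True)
class Capability:
    target: object  # the typed object this capability names
    rights: Right   # operations its holder may perform on that object


def _check(cap: Capability, needed: Right) -> Mailbox:
    # Type check and rights check stand in for capability-based authentication.
    if not isinstance(cap.target, Mailbox):
        raise TypeError("capability does not name a mailbox")
    if needed not in cap.rights:
        raise PermissionError(f"capability lacks {needed.name} right")
    return cap.target


def send(cap: Capability, message) -> None:
    _check(cap, Right.SEND).queue.append(message)  # returns immediately


def receive(cap: Capability):
    box = _check(cap, Right.RECEIVE)
    return box.queue.popleft() if box.queue else None  # None: nothing queued yet


# Usage: a client holding a send-only capability passes a reply capability to a server.
requests, replies = Mailbox(), Mailbox()
client_cap = Capability(requests, Right.SEND)
server_cap = Capability(requests, Right.RECEIVE)
send(client_cap, ("open", Capability(replies, Right.SEND)))
op, reply_cap = receive(server_cap)
send(reply_cap, "ok")
```

Keeping send and receive as separate rights lets a server hand out
send-only capabilities to clients while reserving the right to drain the
mailbox for itself.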

Since the system is intended to provide some degree of reliability, OS
services are replicated (a nucleus process runs on each computer module).
Moreover, the system can be dynamically reconfigured to adapt to changes
or faults (although only very specific ones) in the underlying machine.

The proposed operating system relies heavily on some of the physical
characteristics of the architecture, such as the cross-cluster "Kmap"
processors (which handle the interaction between processors in different
clusters), the "Slocals" (which control access to local devices and
memory), and the specific microcode they run. This naturally works
against a clear, understandable design.

The system provides a very flexible environment for getting the most out
of this kind of multiprocessor architecture, since it gives the user a
high degree of control (for example, through atomic operations on
"representation objects"). Nevertheless, as is usual in this kind of
system, one of its drawbacks is the high cost of accessing non-local
objects (proportional to the object's size). Also, the fault model
considered in the paper is rather simplistic; it remains to be seen what
level of fault tolerance is actually achievable on the system.

------------------------------------------------------------

Evaluation of paper
"Medusa: An Experiment in Distributed Operating System Structure"
by J.K. Ousterhout et al.

The paper describes an operating system for the Cm* multiprocessor environment.
The system's goals are modularity (use of a collection of "independent" but
cooperating processes), robustness (support for adjusting processes to the
workload and to the presence of faults) and efficiency (avoiding system
overhead on user applications). The system is based on task forces. Furthermore,
every OS service (or "utility") is implemented as a task force, so the OS
is fully distributed. Communication between processes (and task forces) is
based on packet-oriented "pipes", and protection is based on descriptor lists,
a slight variation of capability-based authorization. The services are
provided by a small kernel process local to each processor (handling local
interrupts), a memory manager, a file system manager, an exception manager and
a debugger/tracer.
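Descriptor lists and packet-oriented pipes can be illustrated together
with a small Python sketch, assuming hypothetical names (`Pipe`,
`Activity`, `kernel_connect`) that are not Medusa's own. An activity never
holds a direct pointer to a pipe; it holds an index into a private
descriptor list that only the kernel operation fills in, and each read
returns one whole packet rather than an arbitrary slice of a byte stream.

```python
from collections import deque


class Pipe:
    """Packet-oriented pipe: each read returns exactly one packet, boundaries intact."""

    def __init__(self, capacity: int = 8):
        self.packets = deque()
        self.capacity = capacity

    def write(self, packet: bytes) -> bool:
        if len(self.packets) >= self.capacity:
            return False  # full; a real kernel would block or reject the writer
        self.packets.append(bytes(packet))
        return True

    def read(self):
        return self.packets.popleft() if self.packets else None


class Activity:
    """An activity names objects only through indices into its descriptor list."""

    def __init__(self):
        self._descriptors = []  # private; only kernel_connect() adds entries here

    def install(self, obj) -> int:
        self._descriptors.append(obj)
        return len(self._descriptors) - 1

    def lookup(self, index: int, expected_type):
        if not 0 <= index < len(self._descriptors):
            raise PermissionError("no such descriptor")
        obj = self._descriptors[index]
        if not isinstance(obj, expected_type):
            raise TypeError("descriptor names an object of another type")
        return obj


def kernel_connect(a: Activity, b: Activity) -> tuple:
    """Hypothetical kernel operation: give two activities descriptors for one pipe."""
    pipe = Pipe()
    return a.install(pipe), b.install(pipe)


# Usage: the consumer sees two separate packets, not one concatenated stream.
producer, consumer = Activity(), Activity()
p, c = kernel_connect(producer, consumer)
producer.lookup(p, Pipe).write(b"req-1")
producer.lookup(p, Pipe).write(b"req-2")
first = consumer.lookup(c, Pipe).read()  # b"req-1": packets are never merged
```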

In this work, the authors discuss schemes to solve two key issues in multiprocessor
systems: how to partition resources (including work) efficiently and
modularly, and how to provide robust and useful communication between processes.
The suggested ideas are based on favoring local access and processing; on dynamic
creation, distribution and truly parallel execution (coscheduling) of service
implementations (including service migration); and on an efficient implementation
of a message-based communication system for service requests. It is worth
mentioning that the proposed semantics of function invocation is equivalent to
value-result parameter passing.
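Value-result (copy-in/copy-out) passing is what a request/reply message
exchange naturally provides: arguments are copied into the request, and
final values are copied back from the reply. The Python sketch below is a
generic illustration, not Medusa's interface; `call_value_result` and the
sample procedures are hypothetical, and the left-to-right copy-out order
is an assumption (languages differ on it). The difference from
call-by-reference only shows up when the same variable is passed twice.

```python
def call_value_result(proc, env: dict, names: list) -> None:
    """Invoke proc with value-result semantics over the caller's variables in env."""
    request = [env[n] for n in names]      # copy-in: callee gets values, not references
    reply = proc(*request)                 # callee returns its final parameter values
    for name, value in zip(names, reply):  # copy-out, left to right (assumed order)
        env[name] = value


def incr_then_double(a, b):
    # Works only on its local copies; the caller sees changes after copy-out.
    a = a + 1
    b = b * 2
    return a, b


def incr_then_double_by_ref(env: dict, a: str, b: str) -> None:
    # Same body, but a and b name the caller's variables directly (by reference).
    env[a] = env[a] + 1
    env[b] = env[b] * 2


# Without aliasing the two conventions agree.
x = {"p": 3, "q": 5}
call_value_result(incr_then_double, x, ["p", "q"])  # p = 4, q = 10

# With aliasing (the same variable passed twice) they differ.
v = {"x": 3}
call_value_result(incr_then_double, v, ["x", "x"])  # copy-out writes 4, then 6
r = {"x": 3}
incr_then_double_by_ref(r, "x", "x")                # 3 -> 4 -> 8
```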

Of special interest are the solutions for preventing deadlock among the utilities
(although no general provision is made for user processes) and the exception
reporting scheme. The latter allows more structured (yet robust) handling of
faults and exceptions.

Some of the drawbacks of the system design are the excessive restrictions on
the design and layout of user applications (since services are not efficiently
available "everywhere"), the lack of a robust protection scheme among
activities, and the fact that the system was not yet fully implemented at the
time of writing, which raises some doubts about its actual efficiency when
real applications are tested.