5/18 paper evals

Yod (h13nguye@ieng9.ucsd.edu)
Thu, 18 May 2000 02:17:59 -0700 (PDT)

Henry H. Nguyen
h13nguye@ucsd.edu
(858) 587 - 7046
Title: Lightweight Remote Procedure Call

This paper argues that Lightweight Remote Procedure Call (LRPC) achieves much
better performance than the typical Remote Procedure Call, by optimizing the
facility for communication between protection domains on the same machine.

The paper starts out by pointing out that typical RPC fails to optimize the
facility for cross-domain communication within a single machine. Cross-domain
communication can be considerably less complex than cross-machine
communication, yet conventional RPC systems have not fully exploited this fact.
Local communication is treated as an instance of remote communication.

The main motivation for optimizing the local communication facility is the
frequency of local communication. According to the paper's study of several
systems, including the V system, the Taos system, and Unix+NFS, local
communication is a lot more frequent than remote communication, so an
optimization in local communication would greatly improve the performance of
the system.

Traditional RPC treats local communication the same way it treats remote
communication, so local calls pay overheads that belong to remote
communication. These overheads include: stub overhead; message buffer overhead;
access validation; message transfer; scheduling; context switch; and dispatch.

The paper then goes on to describe the design and implementation of LRPC. Some
of the topics that were discussed in the paper include: lower-level binding in
LRPC; stub generation; LRPC on a multiprocessor system; and argument copying.
The four techniques that contribute to the performance of LRPC are: simple
control transfer; simple data transfer; simple stubs; and concurrency.

Title: Active Messages: a Mechanism for Integrated Communication & Computation

This paper introduces the concept of active messaging to increase the
performance of communication by overlapping communication and computation.
The point that this paper tries to make is that traditional RPC systems do not
achieve high processor efficiency, because the ratio of computation time to
communication time is lopsided, so the processor sits idle for considerable
periods.

Active messages minimize communication overhead, allow communication to
overlap computation, and coordinate the two without sacrificing processor
cost/performance. An active message is an asynchronous communication mechanism
that is intended to expose the full hardware flexibility and performance of
modern interconnection networks. Active messages are not buffered except as
required for network transport. Active messages differ from general RPC
mechanisms in that the role of the active message handler is not to perform
computation on the data, but to extract the data from the network and integrate
it into the ongoing computation with a small amount of work.
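
The handler idea above can be sketched in a few lines of Python. This is only
an illustration under assumptions, not the paper's implementation: the Node
class, the deque standing in for the network interface, and store_handler are
all hypothetical names. What it tries to show is that each message names its
own handler, and that the receiver runs that handler as soon as the message is
pulled off the network, with no separate receive buffer.

```python
from collections import deque


class Node:
    """A simulated processor (hypothetical model, not a real machine API)."""

    def __init__(self, name):
        self.name = name
        self.network_in = deque()  # stands in for the network interface FIFO
        self.memory = {}           # state of the ongoing computation

    def send(self, dest, handler, *args):
        # An active message carries a reference to its handler at its head;
        # the remainder of the message is the handler's arguments.
        dest.network_in.append((handler, args))

    def poll(self):
        # Drain the network: every message is handed straight to the handler
        # it names. Returns the number of messages handled.
        handled = 0
        while self.network_in:
            handler, args = self.network_in.popleft()
            handler(self, *args)
            handled += 1
        return handled


def store_handler(node, key, value):
    # The handler does a small amount of work: it takes the data out of the
    # message and integrates it into the receiver's computation state.
    node.memory[key] = value


a = Node("a")
b = Node("b")
a.send(b, store_handler, "x", 42)
handled_count = b.poll()
```

After b.poll() returns, the value sent by node a is already part of b's state;
b never issued a matching receive call.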

The paper goes on to briefly describe active message implementations on
two different systems, the nCUBE/2 and the CM-5. It also talks about the
Split-C programming model, which uses active messages to provide split-phase
remote memory operations. Another active message system mentioned in the paper
is the Threaded Abstract Machine (TAM). TAM, a fine-grain parallel execution
model based on Active Messages, goes one step further and requires the compiler
to help manage memory allocation and scheduling. Using the TAM scheduling
hierarchy, the compiler can improve the locality of computation by synchronizing
in message handlers and enabling computation only when a group of messages has
arrived.
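
A rough Python sketch of a split-phase remote read combined with
counter-based synchronization in the reply handler is given below. It is a toy
model under assumptions: the shared deque acting as the network, the Frame
class, and the handler names are hypothetical and are not taken from Split-C
or TAM. The requester issues two gets, keeps computing, and the dependent
computation is enabled only once both replies have arrived.

```python
from collections import deque

network = deque()                  # simulated network (assumption)
remote_memory = {"a": 3, "b": 4}   # data owned by the remote node


class Frame:
    """Per-computation state with a synchronization counter (hypothetical)."""

    def __init__(self, expected):
        self.pending = expected    # replies still outstanding
        self.slots = {}            # where reply data is deposited
        self.result = None         # set once all replies have arrived


def get_request_handler(frame, key):
    # Runs on the remote node: read the value and send the reply message.
    network.append((get_reply_handler, (frame, key, remote_memory[key])))


def get_reply_handler(frame, key, value):
    # Runs on the requester: deposit the data and decrement the counter.
    frame.slots[key] = value
    frame.pending -= 1
    if frame.pending == 0:
        # The whole group of messages has arrived, so enable the computation.
        frame.result = frame.slots["a"] + frame.slots["b"]


def split_phase_get(frame, key):
    # Phase one: issue the request and return immediately.
    network.append((get_request_handler, (frame, key)))


def deliver_all():
    # Deliver messages in order, running each one's handler on arrival.
    while network:
        handler, args = network.popleft()
        handler(*args)


frame = Frame(expected=2)
split_phase_get(frame, "a")
split_phase_get(frame, "b")
local_work = sum(range(10))        # computation overlapping the requests
deliver_all()
```

In this sketch the sum of the two fetched values is computed inside the reply
handler that brings the counter to zero, which is one simple way to express
"enable computation only when the group has arrived".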

Hardware support for active messages is also discussed in the paper. Hardware
support for active messages falls into two categories: improvements to network
interfaces and modifications to the processor to facilitate execution of
message handlers. Network interface design issues that were discussed include:
Large message transfers with DMA; Message registers through direct communication
between the processor and the network interface to save instructions and bus
transfers; Reuse of message data by keeping it in the registers or keeping
additional context information such as the current frame pointer and code base;
Single network port and protection. For the processor support for message
handlers, the paper discusses fast polling, user-level interrupts, PC injection,
and dual processors.