CSE221 LRPC and Active Messages

Andreas Anagnostatos (aanagnos@cs.ucsd.edu)
Thu, 18 May 2000 07:06:09 -0700

Brian N. Bershad, Thomas E. Anderson, Edward D. Lazowska, and Henry M.
Levy, "Lightweight Remote Procedure Call," Proc. Twelfth Symposium on
Operating Systems Principles, pp. 102-113, December 1989.

Lightweight Remote Procedure Call (LRPC) is an RPC mechanism optimized
for the common case: calls between protection domains on a single
machine rather than across a network. Usage measurements show that
cross-domain calls (as opposed to cross-machine ones) account for
95%-99.4% of all calls. Furthermore, most calls pass simple arguments
(e.g. characters and integers) rather than complex structures, so LRPC
is tuned to perform best on simple calls. LRPC is implemented in the
Taos operating system on the DEC Firefly multiprocessor, and performance
measurements show that LRPC takes about one third of the time of
conventional RPC, which is close to the lower bound set by the
underlying hardware.

I found the paper very well written and easy to understand. The authors
mention that, in the multiprocessor context, if an idle processor is
needed but none is found, the kernel increments a counter. Something I
would like to have seen in the paper is the percentage of calls that
would execute on a second or third processor simultaneously, given
measurements on a "typical" workload. The paper shows that the
performance of LRPC in a multiprocessor environment increases linearly
with the number of processors because the authors chose not to lock
shared data in memory. Why did the implementers of conventional RPC
decide to lock shared data, and is there a danger in not using locks
for shared data?

Thorsten von Eicken, David E. Culler, Seth C. Goldstein, and Klaus E.
Schauser, "Active Messages: a Mechanism for Integrated Communication and
Computation," Proceedings of the 19th International Symposium on
Computer Architecture, May 1992, pp. 256-266.

The paper describes active messages, a communication mechanism designed
to reduce communication overhead and achieve high performance in
large-scale multiprocessors. The goal is to overlap communication with
computation while keeping communication cost to a minimum. Unlike
existing messaging schemes, the basic low-level mechanism offered by
active messages closely matches the primitives offered by the hardware.
An active message carries in its header the address of a user-level
handler, which the receiver invokes immediately on arrival to process
the message. Measurements on the nCube/2 show that active messages
perform only slightly above the minimum suggested by the hardware,
which is an order of magnitude better than existing messaging systems.
Finally, the authors recommend possible hardware augmentations to
support active messages without adding excessive complexity or cost.

Looking at Fig. 5, which shows the performance of the Split-C matrix
multiply: what is the elapsed computation time for the multiplication?
How much does the communication overhead contribute to the total time
taken for the task? Is the increasing processor utilization going
toward the task itself or toward handling the communication? Since the
whole paper emphasizes performance, I believe it should include
performance measurements for a wider variety of tasks rather than just
network and processor utilization for a simple matrix multiplication.
Active messages do not really compete with RPC: RPC requests
computation on a set of parameters and returns the result, whereas
active messages distribute the computation of a single program across
multiple nodes by having them execute a sequence of instructions taken
from a single address space. Overall, I am convinced that active
messages are a good mechanism that could greatly improve performance in
a multiprocessor environment, especially when coupled with hardware
support.