CSE 221 Paper Evaluation

Greg Johnson (johnson@SDSC.EDU)
Mon, 29 May 2000 22:40:01 -0700 (PDT)

M. Feeley, W. Morgan, F. Pighin, et al., "Implementing Global Memory
Management in a Workstation Cluster," SOSP 1995.

This paper describes the design and implementation of a unified memory
management algorithm for clusters of workstations. The algorithm has
three distinguishing features. First, the algorithm is intended to be
used in an OS at a very low level, such that all higher level functions
benefit from access to cluster-wide memory. Second, decisions about the
management of memory are made in a global context (choices are made by
individual machines, but based on system-wide criteria). Third, hosts
may dynamically join (and, in all but one case, depart) the cluster.

The Global Memory Service (GMS) algorithm works by swapping pages between
the memories of machines in the cluster, as required to make room for
new pages or as dictated by references. As time progresses, each node
tends to accumulate the pages required by the computation on that node.
Once local memory is exhausted, the node begins using memory on remote
hosts. Conversely, hosts with little or no computation of their own tend
to accumulate pages stored on behalf of remote hosts.

The page replacement algorithm is probabilistic and approximates global
LRU. Evicted pages younger than an agreed-upon minimum age are forwarded,
preferentially, to nodes that are more idle than most; older pages are
simply discarded. When all nodes are using their memory for local
computation, evicted pages are likewise discarded rather than sent to
another node. In this way, thrashing is minimized. Like a
well-choreographed square dance, pages migrate from node to node based on
usage and cluster-wide utilization of memory.
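The eviction decision described above can be sketched in a few lines.
This is an illustrative simplification, not the paper's implementation:
the constant MIN_AGE and the per-node idleness weights stand in for
values that GMS negotiates cluster-wide at the start of each epoch.

```python
import random

# Hypothetical sketch of the epoch-based eviction decision. MIN_AGE and
# node_weights stand in for the cluster-wide parameters GMS computes
# each epoch; the names here are invented for illustration.
MIN_AGE = 100  # pages older than this are discarded, not forwarded


def choose_eviction_target(page_age, node_weights):
    """Decide what to do with a page evicted from local memory.

    node_weights maps node id -> probability weight, with higher
    weights for more idle nodes. Returns "discard" or the id of the
    node that should receive the forwarded page.
    """
    if page_age > MIN_AGE:
        return "discard"  # old pages are not worth keeping in the cluster
    nodes = list(node_weights)
    weights = [node_weights[n] for n in nodes]
    if not nodes or sum(weights) == 0:
        return "discard"  # every node is busy: discard to avoid thrashing
    # Probabilistic choice favoring the more idle nodes
    return random.choices(nodes, weights=weights, k=1)[0]
```

Discarding rather than forwarding in the busy-cluster case is what keeps
the scheme from shuffling pages endlessly between loaded nodes.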

One key point regarding GMS is that pages stored on a node on behalf of
another node (global pages) are always clean. Thus, when a node holding
global pages goes down, the requester need only re-read the page from
disk to compensate.
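The clean-pages invariant makes the recovery path trivial, as this
minimal sketch shows. The names (fetch, read) are hypothetical, not taken
from the paper's OSF/1 implementation:

```python
def read_page(page_id, global_host, disk):
    """Fetch a page previously pushed to a remote host's global memory.

    Because global pages are always clean, the on-disk copy is
    identical to the remote copy; if the remote host has failed, we
    simply fall back to disk and lose nothing but time. The interfaces
    here are illustrative placeholders.
    """
    try:
        return global_host.fetch(page_id)   # fast path: remote memory
    except ConnectionError:
        return disk.read(page_id)           # host down: disk copy is valid
```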

The second half of the paper is devoted to an analysis of the performance
of this algorithm, implemented as part of OSF/1 on a cluster of
Alpha-based machines. Through the use of microbenchmarks and
memory-intensive applications, the authors show that the overhead of
administering cluster-wide memory is small relative to the improvement in
the performance of cluster applications. In a nutshell, the approach
achieves superior performance for memory-intensive applications by
leveraging remote memory to reduce (more expensive) disk I/O. The authors
also demonstrate that the algorithm makes efficient use of idle memory in
the cluster.
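The performance argument reduces to simple arithmetic: every fault served
from remote memory avoids a far slower disk read. The numbers below are
illustrative back-of-the-envelope figures, not measurements from the
paper:

```python
# Illustrative latencies (assumed, not from the paper): a disk page read
# versus a page fetch from remote memory over a fast LAN.
DISK_READ_MS = 10.0
REMOTE_READ_MS = 1.5


def fault_cost_ms(n_faults, remote_hit_ratio):
    """Total fault-service time when remote_hit_ratio of the faults are
    served from remote global memory and the remainder go to disk."""
    return n_faults * (remote_hit_ratio * REMOTE_READ_MS +
                       (1 - remote_hit_ratio) * DISK_READ_MS)
```

Under these assumed latencies, serving all faults from remote memory
rather than disk cuts fault-service time by more than a factor of six,
which is the essence of the speedups the authors report.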

Philosophically, the authors argue that as CPU performance increases and
communication latency decreases, clusters should be administered more as
multicomputers than as collections of autonomous machines. However, since
the gap in the speed of the CPU versus the network is widening, remote
references are becoming more expensive relative to local references. On
a related point, if the speed of the secondary storage device improves
at a rate faster than that of the network, then the performance advantage
of the GMS approach diminishes.

Greg Johnson office: (858) 534-8367
Senior Programmer Analyst fax: (858) 534-5152
San Diego Supercomputer Center email: johnson@sdsc.edu
University of California San Diego