Re: papers for today

John-Paul Fryckman (fryckman@SDSC.EDU)
Tue, 30 May 2000 08:07:53 -0700 (PDT)

Implementing Global Memory Management

The authors wanted to combine the memory of multiple
computers and a fast network to give the user more
memory and improved performance on cluster based
applications. They implemented their management system
(GMS) at the lowest level of the OS because each higher
OS layer adds latency.

The paper goes into detail about the different page
fault resolutions. One of the key topics they hit
is a global notion of how recently a page was accessed.
Their method allows any machine to act as the backing
store for any other machine. For example, if a local page
needs to be swapped out, the node finds the global LRU page
to victimize and forwards its own page to the machine
holding that victim. However, it becomes increasingly complex
as the page becomes shared. All in all, the main premise
is to minimize memory access time by layering local
memory, global memory, and local disk, with local memory
being the fastest. Also, the management structure is
distributed, so page information has to be disseminated
to each node, which adds a small bookkeeping overhead to
the system.
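
To make the layering concrete, here is a rough sketch of the
lookup order (local memory, then global memory, then disk) and
of pushing an evicted local page onto the node holding the
globally oldest page. Everything here is my own assumption for
illustration: the Cluster class, its method names, and the use
of exact global timestamps, which a real distributed system
would probably only approximate. It also ignores shared pages.

```python
class Cluster:
    """Toy model: each node has a fixed number of page frames."""

    def __init__(self, capacities):
        # node name -> {page id: last access time}
        self.memory = {name: {} for name in capacities}
        self.capacity = dict(capacities)
        self.disk = set()  # pages pushed out of global memory
        self.clock = 0     # exact global clock (a simplification)

    def locate(self, page):
        """Return the node caching `page`, or None if no node has it."""
        for name, pages in self.memory.items():
            if page in pages:
                return name
        return None

    def access(self, node, page):
        """Touch `page` from `node` and report which tier served it."""
        holder = self.locate(page)
        if holder == node:
            tier = "local memory"
        elif holder is not None:
            tier = "global memory"
            del self.memory[holder][page]  # page moves to the faulting node
        else:
            tier = "disk"  # anything not cached is assumed to be on disk
            self.disk.discard(page)
        if tier != "local memory" and len(self.memory[node]) >= self.capacity[node]:
            self._evict_local(node)
        self.clock += 1
        self.memory[node][page] = self.clock
        return tier

    def _evict_local(self, node):
        """Move the node's LRU page into global memory if it is young enough."""
        local = self.memory[node]
        victim = min(local, key=local.get)
        age = local.pop(victim)
        # A remote node with a free frame simply takes the page.
        for name, pages in self.memory.items():
            if name != node and len(pages) < self.capacity[name]:
                pages[victim] = age
                return
        # Otherwise replace the globally oldest remote page, if it is older.
        remote = [(t, n, p) for n, pages in self.memory.items()
                  if n != node for p, t in pages.items()]
        if remote:
            t, n, p = min(remote)
            if t < age:
                del self.memory[n][p]
                self.disk.add(p)
                self.memory[n][victim] = age
                return
        self.disk.add(victim)  # the victim itself was the globally oldest page
```

With two nodes of two frames each, touching pages 1, 2, 3 from
node A pushes page 1 into B's memory, so a later access to
page 1 from A is served from global memory rather than disk.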

It was implemented with OSF/1 on a DEC Alpha cluster over an
ATM network. They achieved speedups of 1.5x to 3.5x on
various applications and showed that it hardly affects
nodes on which the application is not running. Overall this
paper was well written and achieved its goals for GMS.

Memory Coherence in Shared Virtual Memory Systems

Li and Hudak addressed a very complex problem: memory
coherence. Anyone who implements a distributed shared
memory system has to worry about memory coherence,
whether it's a full-blown model or just the very
basic consistency that lets higher levels handle
coherence their own way.

The performance of memory coherence depends on two issues:
memory granularity and the coherence scheme. The former depends
on HW issues as well as network latency. A coherence scheme
has to deal with page synchronization and ownership. Furthermore,
one can deal with these two issues via a centralized scheme
or a distributed one. A centralized scheme is simple to
implement but creates bottlenecks. A distributed method
reduces contention but adds complexity to the overall
system, since one now has to locate pages.
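
The centralized versus distributed tradeoff can be sketched
roughly as follows. This is only my illustration, not the
paper's code: the class names, the load counters, and the
choice of mapping a page to its manager by page number modulo
the node count are all assumptions. It only tracks who owns a
page and how many queries each manager has to answer.

```python
from collections import Counter


class CentralDirectory:
    """One manager node answers every ownership query."""

    def __init__(self, manager):
        self.manager = manager
        self.owner = {}        # page -> owning node
        self.load = Counter()  # requests handled per manager node

    def set_owner(self, page, node):
        self.load[self.manager] += 1
        self.owner[page] = node

    def find_owner(self, page):
        self.load[self.manager] += 1
        return self.owner[page]


class DistributedDirectory:
    """Ownership records are spread over all nodes by page number."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.owner = {n: {} for n in self.nodes}  # manager -> {page: owner}
        self.load = Counter()

    def manager_of(self, page):
        # Extra step the centralized scheme does not need:
        # first work out which node manages this page.
        return self.nodes[page % len(self.nodes)]

    def set_owner(self, page, node):
        m = self.manager_of(page)
        self.load[m] += 1
        self.owner[m][page] = node

    def find_owner(self, page):
        m = self.manager_of(page)
        self.load[m] += 1
        return self.owner[m][page]


nodes = ["n0", "n1", "n2", "n3"]
central = CentralDirectory("n0")
spread = DistributedDirectory(nodes)
for page in range(8):
    for directory in (central, spread):
        directory.set_owner(page, nodes[(page + 1) % 4])
        directory.find_owner(page)
print(dict(central.load))  # every request lands on n0
print(dict(spread.load))   # requests are split across the four managers
```

The central version is a single table, but all traffic lands
on one node. The distributed version spreads the traffic, at
the cost of an extra lookup step to locate each page's manager.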

It was a wonderful paper in the sense that it was complete and
discussed the topics in depth. They also showed some performance
figures that indicated some excellent results. And this becomes
a handy paper to reference in future DSM papers!