Yu Xu (yxu@cs.ucsd.edu)
Tue, 30 May 2000 00:46:14 -0700

Evaluation of "Memory Coherence in Shared Virtual Memory Systems"

This paper gives two classes of algorithms, centralized and distributed, for
solving the memory coherence problem in designing and implementing a shared
virtual memory on loosely coupled multiprocessors.
Before this paper there were several approaches, but each had
disadvantages. Spector's remote reference/remote operation model is useful for
data transfer in distributed computing, but it is unsuitable for parallel
computing; message passing has difficulties in passing complicated data
structures; finally, the approach of providing a set of primitives to
programmers has difficulties when attempting process migration or passing large
data structures.
A shared virtual memory is a single address space shared by a number of
processors. In effect, the memory mapping manager views its local memory as a
large cache of the shared virtual memory address space for its associated
processor. Page synchronization has two basic approaches: invalidation and
write-broadcast, but write-broadcast needs special hardware support. The
ownership of a page can be fixed or dynamic. Fixed page ownership is expensive,
since every update by a non-owner has to go through the owner.
The centralized manager algorithm is quite straightforward. The manager
resides on a single processor and maintains a table called Info which has one
entry for each page, each entry having three fields: owner, copy set, lock. Each
processor has a page table called PTable that has two fields: access, lock.
Both tables have page-based locks. In the improved centralized manager
algorithm, the synchronization of page ownership is moved to the individual
owners, thus eliminating the confirmation operation to the manager. An entry in
the PTable on each processor now has one more field: copy set. This saves one
send and one receive per page fault on all processors.
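To make the two tables concrete, here is a minimal sketch of the basic centralized manager in Python. It is my own illustration, not the paper's pseudocode: the class and method names are hypothetical, the lock fields are carried along but never taken (the sketch is single-threaded), and message passing is replaced by plain return values.

```python
from dataclasses import dataclass, field


@dataclass
class InfoEntry:
    """One entry of the manager's Info table: owner, copy set, lock."""
    owner: int
    copy_set: set = field(default_factory=set)
    locked: bool = False  # kept for completeness; unused in this sketch


@dataclass
class PTableEntry:
    """One entry of a processor's PTable: access, lock."""
    access: str = "nil"  # "nil", "read", or "write"
    locked: bool = False


class CentralManager:
    """Hypothetical single-threaded model of the centralized manager."""

    def __init__(self, num_pages, initial_owner=0):
        self.info = {p: InfoEntry(owner=initial_owner) for p in range(num_pages)}

    def read_fault(self, page, requester):
        """Record the new reader and return the owner that must supply a copy."""
        entry = self.info[page]
        entry.copy_set.add(requester)
        return entry.owner

    def write_fault(self, page, requester):
        """Hand ownership to the requester.

        Returns the old owner (who must send the page) and the set of
        processors whose read copies must be invalidated.
        """
        entry = self.info[page]
        to_invalidate = entry.copy_set - {requester}
        old_owner = entry.owner
        entry.copy_set = set()
        entry.owner = requester
        return old_owner, to_invalidate
```

For example, after processors 1 and 3 read page 2 and processor 3 then writes it, the manager reports processor 0 as the old owner and asks only processor 1 to invalidate its copy.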
A broadcast distributed manager algorithm moves the owner field to PTable. It
is easy to implement, but does not scale well. In the dynamic algorithm, the
owner field is replaced with another field, probOwner, which is only a hint
about where the owner probably is. The copy set data associated with a page is
stored as a bi-directional tree rooted at the owner.
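The probOwner idea is easier to see in code. The following is a minimal sketch under my own assumptions: one page only, `prob_owner` is a hypothetical dict from processor id to its current hint, the true owner is the processor whose hint points to itself, and the hints are well formed (following them always reaches the owner).

```python
def find_owner(prob_owner, start):
    """Follow probOwner hints from `start` until a processor points to itself.

    Returns the true owner and the list of processors the request left from.
    """
    path = []
    node = start
    while prob_owner[node] != node:
        path.append(node)
        node = prob_owner[node]
    return node, path


def write_fault(prob_owner, requester):
    """Move ownership to `requester` and refresh the hints along the path.

    Returns the old owner and the number of messages the request took to
    reach it.
    """
    old_owner, path = find_owner(prob_owner, requester)
    for node in path:
        prob_owner[node] = requester  # forwarders now point at the new owner
    prob_owner[old_owner] = requester
    prob_owner[requester] = requester
    return old_owner, len(path)
```

With hints 0 -> 1 -> 2 -> 3 and processor 3 as owner, a write fault on processor 0 takes three messages to reach the owner; afterwards every hint points at processor 0, so a later fault on processor 2 needs only one.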
The experiments indicate that many parallel programs exhibit good speedups on
loosely coupled multiprocessors using a shared virtual memory.

This paper is mainly concerned with finding a page in a multiprocessor
environment. It doesn't pay attention to page replacement, which is very
important. The next paper has a very detailed discussion of page replacement
strategies.

Evaluation of "Implementing global memory management in a workstation cluster"

The goal of this paper is to use a single, unified, distributed memory
management algorithm at the lowest level of the operating system to exploit the
characteristics of current clusters. All system- and higher-level software, including VM, file systems, transaction systems, and user applications, can benefit from available cluster memory. The authors assume a single, trusted, cluster-wide administrative domain.
The key to the algorithm is its use of periodically distributed cluster-wide age information to: (1) house global pages in those nodes most likely to have idle memory, (2) avoid burdening nodes that are actively using their memory, (3) ultimately maintain in cluster-wide primary memory the pages most likely to be globally reused, and (4) maintain those pages in the right places.
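As I read it, the age information boils down to a periodic summary: find the oldest pages cluster-wide, derive an age threshold from them (the paper's MinAge), and weight each node by how many of those old pages it holds, so that evicted pages are steered toward nodes with idle memory. The sketch below is only my illustration of that summary step; `epoch_summary`, `ages_by_node` and `m` are hypothetical names, and the real system gathers this through messages rather than one function call.

```python
import heapq


def epoch_summary(ages_by_node, m):
    """Summarize page ages for one period.

    ages_by_node: dict mapping node name -> list of page ages (larger = older).
    m: how many of the oldest cluster-wide pages to consider.

    Returns (min_age, weights): min_age is the age of the youngest page among
    the m oldest, and weights[node] is the fraction of those pages on node.
    """
    all_pages = [(age, node) for node, ages in ages_by_node.items() for age in ages]
    oldest = heapq.nlargest(m, all_pages)
    if not oldest:
        return None, {node: 0.0 for node in ages_by_node}
    min_age = min(age for age, _ in oldest)
    counts = {node: 0 for node in ages_by_node}
    for _, node in oldest:
        counts[node] += 1
    weights = {node: count / len(oldest) for node, count in counts.items()}
    return min_age, weights
```

A node with a high weight holds many old (probably idle) pages and is a good target for pages evicted elsewhere; a node with weight zero is actively using its memory and is left alone.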
The basic data structures, all keyed by UID, are the page-frame-directory (PFD), the global-cache-directory (GCD), and the page-ownership-directory (POD). The addition and deletion of nodes can be handled easily by manipulating the POD and GCD. The basic operations are Getpage and Putpage. To manage global age information, the paper introduces epochs; each epoch has a MinAge, a maximum number of cluster replacements, and a maximum duration T.
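A rough sketch of how the three directories chain together on a lookup, under simplifying assumptions of mine: UIDs are plain integers, the POD is modelled as a modulo over the node list, a page lives on at most one node, and the `Cluster` class with its method names is hypothetical rather than the paper's interface.

```python
class Cluster:
    """Toy model of the PFD / GCD / POD lookup chain."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        # PFD: per node, UID -> page contents held in that node's memory.
        self.pfd = {n: {} for n in self.nodes}
        # GCD: per node, UID -> node currently holding the page.
        self.gcd = {n: {} for n in self.nodes}

    def pod(self, uid):
        """POD: which node stores the GCD entry for this UID (assumed modulo)."""
        return self.nodes[uid % len(self.nodes)]

    def putpage(self, uid, data, target):
        """Store a page on `target` and record its location in the GCD."""
        self.pfd[target][uid] = data
        self.gcd[self.pod(uid)][uid] = target

    def getpage(self, uid, requester):
        """Return the page, moving it to `requester`; None means go to disk."""
        if uid in self.pfd[requester]:
            return self.pfd[requester][uid]  # local hit
        manager = self.pod(uid)
        holder = self.gcd[manager].get(uid)
        if holder is None:
            return None  # not in cluster memory
        data = self.pfd[holder].pop(uid)
        self.pfd[requester][uid] = data
        self.gcd[manager][uid] = requester
        return data
```

Because only the POD says where a GCD entry lives, changing the node set amounts to recomputing that mapping and moving the affected GCD entries, which matches the remark above about adding and deleting nodes.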
The limitations of this paper include:
It doesn't permit dirty pages to be sent to global memory without first writing them to disk; it assumes trust; LRU is the replacement scheme, yet it's not always the best choice; the performance of a CPU with idle memory may degrade as a lot of other CPUs try to use it.

Finally, I think the approach in this paper is very good: it uses idle memory on other processors to reduce access time. The method to locate and house pages is dynamic and efficient.