Evaluations.

Sobeeh@aol.com
Tue, 30 May 2000 02:11:42 EDT

---------------------------------------------------------------
Implementing Global Memory Management in a Workstation Cluster
---------------------------------------------------------------

The paper describes the global memory service, a system developed to manage distributed memory in a workstation cluster. The system dynamically shares the cluster-wide memory resources so that each node's memory is balanced between local and global usage. It uses a probabilistic LRU replacement for pages, preferring to replace a global page before a local one. Managing the memory globally gives better performance for the whole cluster than individual decisions taken by the nodes. The goal of this replacement strategy is to minimize the total cost of memory references, so that a referenced page is found in the cheapest possible state.
During normal operation, an initiator node handles the LRU information. It processes the information and sends the results to all the nodes in the cluster. The initiator for the next epoch (the node with the minimum load) is determined at that time too. The periodically distributed age information is the key to the success of the algorithm. Node failures do not result in loss of data, since only clean pages are cached in global memory and they can still be retrieved from disk if the node storing them fails.
Three main data structures implement the algorithm: the page-frame-directory (which holds the information about the pages on each node), the global-cache-directory (which locates the node that has the page) and the page-ownership-directory (which locates the node that has the global-cache-directory entry). Using those data structures, finding a page, performing LRU, and adding and deleting nodes become a matter of lookups in those structures.
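As a rough illustration of how the three directories chain together on a page lookup, here is a minimal Python sketch. It is not the paper's implementation: the class names, the `store` and `locate` helpers, the string page ids, and the CRC-based mapping used for the page-ownership-directory are all assumptions made for the example.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # page-frame-directory: page id -> age of the frame holding it on this node
    pfd: dict = field(default_factory=dict)
    # this node's slice of the global-cache-directory: page id -> holder node
    gcd: dict = field(default_factory=dict)


class Cluster:
    def __init__(self, names):
        self.nodes = {n: Node(n) for n in names}
        self.order = sorted(names)

    def pod(self, page):
        # page-ownership-directory: which node keeps the GCD entry for `page`.
        # The CRC-based mapping is a hypothetical choice for this sketch.
        return self.order[zlib.crc32(page.encode()) % len(self.order)]

    def store(self, page, holder):
        # Record the page in the holder's PFD and in the responsible GCD slice.
        self.nodes[holder].pfd[page] = 0
        self.nodes[self.pod(page)].gcd[page] = holder

    def locate(self, requester, page):
        # 1. Local hit: the requester's own page-frame-directory has the page.
        if page in self.nodes[requester].pfd:
            return ("local", requester)
        # 2. Ask the node named by the POD for its GCD entry.
        holder = self.nodes[self.pod(page)].gcd.get(page)
        if holder is not None and page in self.nodes[holder].pfd:
            return ("global", holder)
        # 3. Nobody caches it: fall back to disk.
        return ("disk", None)


cluster = Cluster(["a", "b", "c"])
cluster.store("p1", "b")
print(cluster.locate("a", "p1"))
print(cluster.locate("a", "p2"))
```

The point of the sketch is only that each step is a dictionary lookup, which is what makes finding a page cheap once the directories exist.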
The paper describes the reasonable possible scenarios in the system. The authors provide heavy performance tests, and it looks like they got the job done right. Unfortunately, I was lost in the many numbers they gave in the performance section; however, I think the limitations section talked about them and successfully described the problems. I would rate the paper 9/10 (either it is a very good paper or I have no idea what they are talking about! The second is more reasonable, since the performance section is half of the paper).

-----------------------------------------------------
Memory coherence in a shared virtual memory system
-----------------------------------------------------

The paper describes the coherency problem of a shared virtual memory system that arises in a loosely coupled multiprocessor. The authors tried hard to differentiate their solutions from the ones given for the multicache coherence problem (I am still trying to figure out the difference in the solution!). The previous work handled and optimized the distributed nature of the system but is still unsuitable for a parallel computing environment. The proposed design of the virtual memory should give natural and efficient process migration and RPCs without the need to implement a dedicated facility (Cool!).
The coherency problem arises when multiple processors share the same data. Each processor should see the most recent update to the page it is using. The authors classify the available choices for ensuring the consistency of the data. The solutions they give are based on the location of the manager of a set of pages. Based on this, the virtual address space is either handled by one manager (centralized, so the manager is a bottleneck) or distributed among several managers (static or dynamic owners, plus some improvements to the dynamic scheme).
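To make the centralized case concrete, here is a small Python sketch of a single manager that tracks, per page, the owner and the set of processors holding a copy. It is only an illustration under my own assumptions: the class and method names are hypothetical, the first processor to touch a page becomes its owner, and real message passing and page transfer are left out.

```python
class CentralManager:
    """Single manager for all pages (the centralized scheme, simplified)."""

    def __init__(self):
        self.owner = {}   # page -> processor that currently owns it
        self.copies = {}  # page -> set of processors holding a copy

    def read_fault(self, proc, page):
        # Assumption: the first processor to touch a page becomes its owner.
        owner = self.owner.setdefault(page, proc)
        self.copies.setdefault(page, set()).add(proc)
        # The faulting processor would now fetch a read copy from `owner`.
        return owner

    def write_fault(self, proc, page):
        # Every other copy must be invalidated so that all processors
        # see the most recent update on their next access.
        to_invalidate = sorted(self.copies.get(page, set()) - {proc})
        self.owner[page] = proc
        self.copies[page] = {proc}
        return to_invalidate


m = CentralManager()
m.read_fault("P1", "x")          # P1 becomes owner of page x
m.read_fault("P2", "x")          # P2 gets a read copy from P1
print(m.write_fault("P2", "x"))  # P1's copy is invalidated, P2 now owns x
```

Every fault goes through the one `CentralManager` object, which is exactly why the centralized scheme makes the manager a bottleneck.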
I do not really see how the problem differs from that of a multicache (based on my limited knowledge in this area!), and if the two problems are different, then I guess the solution to this one should be sufficient to solve the other one. Also, based on what I understood from the scenarios in section 4.2 and section 5.2, the request from the second process will be queued until the first process receives ownership. When that happens, the second process's request is processed by the new owner. What I do not get is what is going to happen afterward! I mean, if both of those processes (P1 and P2) want to write into the page, then the page would circulate between them all the time without being written (a kind of thrashing). If the solution is to give the new owner a minimum quantum of usage, then you have set a performance boundary.
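The quantum idea can be put into numbers with a toy Python simulation. This is my own sketch, not something from the paper: the function name, the two fixed processes, and the rule that an owner always completes up to `quantum` writes before giving the page away are all assumptions.

```python
def count_transfers(writes_per_proc, quantum):
    """Count ownership transfers when P1 and P2 both write to one page.

    Assumption: the current owner performs up to `quantum` writes before
    handing the page to the other process, if that process still has work.
    """
    if quantum < 1:
        raise ValueError("quantum must be at least 1")
    remaining = {"P1": writes_per_proc, "P2": writes_per_proc}
    owner, other = "P1", "P2"
    transfers = 0
    while remaining[owner] or remaining[other]:
        # The owner uses its quantum (or whatever work it has left).
        remaining[owner] -= min(quantum, remaining[owner])
        # Hand the page over only if the other process is still waiting.
        if remaining[other]:
            owner, other = other, owner
            transfers += 1
    return transfers


for q in (1, 2, 4):
    print(q, count_transfers(4, q))
```

With four writes per process, a quantum of 1 costs 7 transfers, a quantum of 2 costs 3, and a quantum of 4 costs 1. A larger quantum cuts the ping-pong, but the waiting process is delayed by up to a full quantum, which is the performance boundary mentioned above.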

-------------------
Sobeeh Almukhaizim
-------------------