- INTRODUCTION
- APPROACH:
- toolkit
- provides a set of core services for grid-enabled tools and apps
- bag of services model
- apps can mix n match
- allows for incremental development
- layered approach
- exploit standards and commodity technology where appropriate in
core infrastructure
- address interdomain issues; integration of intradomain solns
- design principles
- keep participation cost low
- enable local control
- support for adaptation
- CORE SERVICES:
- RESOURCE MANAGEMENT
- GRAM (Globus Resource Allocation Manager):
- provides
remote job submission and management; uses GSI for security
- DUROC (Dynamically Updated Request Online Co-allocator):
- layers
on top of GRAM to provide multiple, simultaneous job
submission and management (i.e. "co-allocation")
- RSL (Resource Specification Language):
- language used to
communicate resource requests.
- INFORMATION INFRASTRUCTURE
- MDS (Metacomputing Directory Service):
- an LDAP-based
information repository for storing information about
resources
- SECURITY
- GSI (Grid Security Infrastructure):
- provides a public key
based security system that layers on top of local site
security; provides users with a single sign-on access to the
various sites to which they are authorized.
- COMMUNICATION:
- Nexus:
- a high-level communication library; provides active
message style communication (asynchronous RPC), multi-method
communication, data conversion, and multi-threading
- IO:
- a low-level communication library which provides a thin
wrapper around TCP, UDP, IP multicast, and file I/O; it (optionally)
integrates GSI into the TCP communication
- REMOTE ACCESS:
- GASS (Global Access to Secondary Storage):
- provides secure
remote access to files
- GEM (Globus Executable Management):
- intended to support
identification, location, and creation of executables in a
heterogeneous environment.
- FAULT DETECTION:
- HBM (HeartBeat Monitor):
- provides mechanisms for monitoring
multiple remote processes in a job and enabling the application
to respond to failures
- Nexus Fault Detection:
- notifies applications using Nexus
when a communicating process fails (but not which one)
- QoS
- GARA (Globus Architecture for Reservation and Allocation):
- provides dedicated access to collections of resources via
reservations
- Gloperf:
- provides bandwidth and latency information
- AN ADMINISTRATOR'S VIEW:
An administrator (typically with root permissions) for a single site (same
file system) running Globus on usually multiple resources.
- installs scripts in global location
- installs a set of daemons on each resource
- may need to configure service to interact with local resource if
not already provided by Globus (e.g. a job manager or security
policy).
- A PROGRAMMER'S VIEW:
Creates/maintains a Globus-enabled application or tool.
- interacts with Globus services via defined APIs
- can incrementally add Globus services as they see fit
- typically an application uses at least GRAM and GSI
- portability
- A USER'S VIEW:
User of a Globus-enabled application.
- level of interaction with Globus will vary depending on
application
- usually responsible for contacting administrators to set up
Globus access to desired resources
- usually responsible for obtaining a unique global id and secure
logon (i.e., certificate and private key)
- AN EXAMPLE:
- RESOURCE MANAGEMENT
- GOALS:
- site autonomy
- - local sites will have different policies
- heterogeneity
- - local sites will have different resource
management systems (e.g. LSF, Loadleveler, Condor, etc)
- policy extensibility
- - able to support new site specific management
structures without requiring changes at participating sites
- coallocation
- - provide support for creating and monitoring
processes on multiple resources
- online control
- - provide support for modifying allocated resources
during execution
- ARCHITECTURE:
- Resource Specification Language (RSL):
- used to communicate
requests for resources among entities
- Resource Brokers:
- specializes abstract RSL expression to concrete
RSL expression
- Coallocator:
- allocation and management of resources
- Resource Manager:
- processes RSL requests and either invokes the request or denies it
- Information Service:
- provides information about current
availability
- GRAM:
- RSL:
- passed as a string
- list of parameters or requests
- parameter is 'param-name op value' (e.g. executable = /usr/bin/ls )
- parameter-name is extensible (someone just needs to recognize
them)
- parameter can be
- MDS attribute names: abstract RSL
- express constraints on resources (e.g.
OS, globus version,...)
- scheduler parameters: ground RSL
- communicate info to start a job (e.g.
path to executable, stdin, stdout,...)
- supports multirequest(+), conjunction(&), disjunction(|)
- e.g. &(executable = /usr/bin/ls)(stderr = mystderr)(stdout = mystdout)
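Since RSL is passed as a plain string, a client can assemble it from name/value pairs. The following is a minimal sketch of that idea, not part of any Globus API: the function names are hypothetical, and the way sub-requests are grouped under the multirequest operator is an assumption made for illustration.

```python
def rsl_relation(name, value, op="="):
    """Format one 'param-name op value' relation, e.g. (executable = /usr/bin/ls)."""
    return f"({name} {op} {value})"


def rsl_conjunction(params):
    """Join relations with '&': all relations apply to one request."""
    return "&" + "".join(rsl_relation(name, value) for name, value in params)


def rsl_multirequest(requests):
    """Join several conjunctions with '+' (grouping is an assumption)."""
    return "+" + "".join(f"({rsl_conjunction(params)})" for params in requests)


single = rsl_conjunction([
    ("executable", "/usr/bin/ls"),
    ("stderr", "mystderr"),
    ("stdout", "mystdout"),
])
print(single)
```

The single request reproduces the example expression above; because parameter names are extensible, the helpers do not validate them.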
- GRAM CLIENT LIBRARY:
API to interact with the GRAM gatekeeper
- mutual authentication
- transfer request (RSL string)
- register a callback
- GATEKEEPER:
- typically run as root
- assumes RSL is ground (i.e. can fulfill request without
further interaction with requester)
- performs mutual authentication of user and resource - GSI
- determines local user name
- starts job manager (executes as user)
- JOB MANAGER:
- handles RSL request (i.e., submits job to local resource manager)
- remote monitoring and management of jobs
- monitors state of process
- notifies callback contact of any state changes
- a job goes through pending (submitted), active (started),
and either done or failed
- implementing control ops (e.g. process termination)
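The job life cycle and callback notification described for the job manager can be pictured as a small state machine. This is only an illustrative sketch: the class and state names are hypothetical, and allowing a pending job to fail directly is an assumption not stated in these notes.

```python
from enum import Enum


class JobState(Enum):
    PENDING = "pending"   # submitted
    ACTIVE = "active"     # started
    DONE = "done"
    FAILED = "failed"


# Transitions the sketch permits (PENDING -> FAILED is an assumption).
ALLOWED = {
    JobState.PENDING: {JobState.ACTIVE, JobState.FAILED},
    JobState.ACTIVE: {JobState.DONE, JobState.FAILED},
    JobState.DONE: set(),
    JobState.FAILED: set(),
}


class Job:
    def __init__(self, callback):
        self.state = JobState.PENDING
        self.callback = callback  # stands in for the registered callback contact

    def transition(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state
        self.callback(new_state)  # notify the client of every state change


events = []
job = Job(events.append)
job.transition(JobState.ACTIVE)
job.transition(JobState.DONE)
print([state.value for state in events])
```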
- DUROC:
- splits request into components and submits each to the
appropriate manager
- monitor and control collection
- 3 ways to start a job
- barrier synchronization and strict job state monitoring (if
one process fails, the job fails)
- barrier synchronization (synchronize at the beginning)
- no barrier (subjobs run independently of each other)
- edit a pending request (add new nodes; edit out failed
nodes) and commit to configuration
- communication between subjobs with processes of rank 0
- GRAM REPORTER:
Periodically populates MDS with info about its current state
and availability and capability of resources
- VIEWS:
- ADMINISTRATOR:
- maintain gatekeeper for resource - typically run out of inetd
- setup job manager(s) - (e.g. SP2 has fork and loadleveler job
managers)
- maintain GRAM reporter (discuss in MDS section)
- PROGRAMMER:
Determines ground RSL expression and either
- uses DUROC API to submit as a whole job or
- uses GRAM API to submit possibly multiple individual jobs
- USER:
- Obtain global id
- Obtain a local account with each resource it wishes to use and
be added to each resource's grid-mapfile (maps user's
global id to local id)
- Submit list of available resources to program.
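The grid-mapfile mentioned above maps a user's global id to a local account. A minimal sketch of such a lookup follows, assuming each line holds a quoted global subject name followed by a local user name; the sample entries and the function name are made up for illustration.

```python
import shlex


def parse_grid_mapfile(text):
    """Return {global subject name: local user name} from grid-mapfile text."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        subject, local_user = shlex.split(line)  # honours the quoted subject
        mapping[subject] = local_user
    return mapping


SAMPLE = '''
# global subject name            local account (hypothetical entries)
"/O=Grid/O=Globus/CN=Jane Doe" jdoe
"/O=Grid/O=Globus/CN=GNTP Admin" gntp
'''

grid_map = parse_grid_mapfile(SAMPLE)
print(grid_map.get("/O=Grid/O=Globus/CN=Jane Doe"))
```

A subject that is missing from the mapping has no local account on that resource, so the request would be refused.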
- INFORMATION INFRASTRUCTURE:
- GOALS:
- rapid access of information - use caching
- scalable infrastructure
- low cost (creating and maintaining information)
- uniform data model and API (easier to use)
- expressive data model
- extensible
- handle information from multiple sources
- handle dynamic data
- flexible access - read, update, and search data
- secure
- easily deployable
- decentralized maintenance
- ARCHITECTURE:
- Currently supports a push model
- GRAM reporters periodically send data about their resources to an
information server
- Information servers can be distributed and use referrals to
access data stored outside of their own server
- Support for a pull model in version 1.1.3
- Reduces load on information servers
- Information is pulled from the GRAM reporter only when it is
needed
- Information Server:
- holds published data
- GRAM Reporter:
- Runs on an individual site and publishes
information about its resources to its information server
- MDS:
- adopts data representations and API defined by LDAP (Lightweight
Directory Access Protocol)
- an information model defining form and character of information
- organizes info into well-defined collections (i.e., entries)
- entries are organized in DIT (Directory Information Tree)
which is a hierarchical, tree-structured name space
- an entry is represented by a set of attributes
(name, value pairs)
- an example of a class DIT for UCSD

- a network interface for accessing information in a directory
- access of data done using
- base dn - begin search at
- filter - specify attributes
- scope - how many levels to search relative to base dn
- a distributed operation model defining how data may be
distributed and referenced
- extensible protocol and information model
- provides a data model
- MDS consists of many entries where an entry is some type of
object (e.g. organization, person, network, or computer).
- networks and computers represented as children of an
organization
- GlobusHost - a computer resource
- GlobusNetwork - physical network link
- networks and hosts interface through GlobusNetworkInterface
(contains interface speed, hw address, etc.)
- Images used to represent multiple views of the same
physical network (e.g. IP and IPX)
- e.g. DIT for UCSD

- object class associated with each entry describes the set of
attributes (required and optional). The definition contains
- parent
- must contain list
- may contain list
- can be a subclass (inheritance)
- TTL (good for caching - not implemented yet)
- each mds entry has a dn (distinguished name) which is
constructed by following the path to a specific DIT entry
- entries stem from a Globus root
- Data accessed via MDS scripts (layer on top of ldap commands) in
the following fashion
- white pages - look up attributes given a particular dn
- yellow pages - look up resources given a particular class or
property
- Provides unix scripts to publish data in MDS server
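The base dn / filter / scope search and the white-pages and yellow-pages lookups can be illustrated with a tiny in-memory directory. This is a sketch under stated assumptions, not the MDS scripts or an LDAP client: the distinguished names, attribute names, and the `search` helper are hypothetical; only the GlobusHost and GlobusNetwork class names come from the notes above.

```python
def rdns(dn):
    """Split a distinguished name into its comma-separated components."""
    return [part.strip() for part in dn.split(",")]


def search(directory, base_dn, scope, predicate):
    """Return DNs under base_dn whose attributes satisfy predicate.

    scope: 'base' (the entry itself), 'one' (direct children), 'sub' (whole subtree).
    """
    base = rdns(base_dn)
    matches = []
    for dn, attrs in directory.items():
        parts = rdns(dn)
        if len(parts) < len(base) or parts[-len(base):] != base:
            continue  # entry is not located under the base dn
        depth = len(parts) - len(base)
        in_scope = {"base": depth == 0, "one": depth == 1, "sub": True}[scope]
        if in_scope and predicate(attrs):
            matches.append(dn)
    return matches


DIRECTORY = {
    "o=Globus, c=US": {"objectclass": "organization"},
    "o=UCSD, o=Globus, c=US": {"objectclass": "organization"},
    "hn=host1.ucsd.edu, o=UCSD, o=Globus, c=US": {"objectclass": "GlobusHost", "os": "Solaris"},
    "hn=host2.ucsd.edu, o=UCSD, o=Globus, c=US": {"objectclass": "GlobusHost", "os": "Linux"},
    "nn=net1, o=UCSD, o=Globus, c=US": {"objectclass": "GlobusNetwork"},
}

# White pages: attributes of one known dn.
white = DIRECTORY["hn=host2.ucsd.edu, o=UCSD, o=Globus, c=US"]
# Yellow pages: every entry of a given class below the root.
yellow = search(DIRECTORY, "o=Globus, c=US", "sub",
                lambda attrs: attrs["objectclass"] == "GlobusHost")
print(white, yellow)
```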
- VIEWS:
- ADMINISTRATOR:
- maintain GRAM reporter - daemon
- manage entries in DIT
- PROGRAMMER:
Query MDS to determine best possible configuration of resources. Also can
use MDS to determine gatekeeper contact.
- USER:
May need to query MDS for resource information depending on program (e.g.
gatekeeper contact).
- SECURITY:
- GOALS:
An interdomain security solution.
- assumes grid consists of multiple trust domains
- assumes resource pool and user population are large and dynamic
- interoperate with local security solutions - local security
policies differ
- authentication
- exportable - cannot directly or indirectly require use of bulk
privacy
- uniform credentials/certification - a user will be associated
differently with each site it has access to
- single logon - number of processes used in a computation will be
dynamic
- access control
- support for secure group communication
- support for multiple implementations (i.e., extensible)
- SOME DEFINITIONS:
- trust domain
- - single security policy
- subject
- - user, process acting on behalf of user, a resource, or
process acting on behalf of resource.
- credential
- - piece of info that proves identity of subject
- authentication
- - process of subject proving identity to requestor
- mutual authentication
- - two-way authentication
- object
- - resource protected by security policy
- authorization
- - process where subject is allowed access to an
object
- ARCHITECTURE:
- global and local subjects exist; for a trust domain a mapping
mechanism exists
- global subject once mapped is equivalent to local authenticated user
- operations between entities on different trust domains require
mutual authentication
- all access control decisions made by local subject
- a program or process is allowed to act on behalf of user
- does not require bulk privacy (rely only on authentication and
signature techniques)
- GSI:
- separate protocol and mechanism
- built on top of GSS-API (Generic Security Services)
- authentication
- signing of messages
- encrypting of messages
- produces sequence of tokens which are transport independent
(TCP sockets, Nexus, etc.)
- mechanism independence (does not specify specific security
protocol)
- currently uses SSL (just the authentication part) as mechanism
- SSL uses public key encryption
- each user has a cert
- permissions/validity
- RSA public key
- signature of CA
- and a private key
- SSLeay is high quality, public domain (developed outside of US
so avoids export issues) implementation of SSL
- token stream can be extracted easily
- used by broad range of services (easy to access from GSI)
- mutual authentication: user issues request to resource
proxy. if request successful, the resource is allocated.
- user proxy and gatekeeper authenticate each other
- user sends certificate to gatekeeper
- gatekeeper sends a copy of its certificate to user
- user checks gatekeeper's certificate signature against trusted
certs; gatekeeper checks user signature against CA's trusted
certs
- user checks subject of gatekeeper's cert against requested;
gatekeeper checks user subject against grid-mapfile
- user proxy - acts on user's behalf without requiring user
intervention
- a session manager process given permission to act on behalf of
user for limited amount of time.
- user creates a temporary user proxy credential (signs a tuple
containing user info and validity interval).
- user proxy process created and provided with temp. credential
- bypasses having to log onto every single resource
- resource proxy - agent used to translate between interdomain
security ops to intradomain security mechanisms (i.e., gatekeeper)
- by default, proxies can't be delegated; can change that but
weakens security
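The temporary user proxy credential (a signed tuple of user info and a validity interval) can be sketched as follows. Important assumption: an HMAC over a secret key stands in for the user's public-key signature purely to keep the example self-contained; GSI itself relies on certificates and RSA keys, and none of these function names belong to GSI.

```python
import hashlib
import hmac
import json


def sign(key, payload):
    """Stand-in signature: HMAC-SHA256 over the canonical JSON payload."""
    message = json.dumps(payload, sort_keys=True).encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def make_proxy_credential(user_key, subject, not_before, not_after):
    """Sign a tuple of user info and validity interval."""
    payload = {"subject": subject, "not_before": not_before, "not_after": not_after}
    return {"payload": payload, "signature": sign(user_key, payload)}


def verify_proxy_credential(user_key, credential, now):
    """Accept only an untampered credential inside its validity interval."""
    payload = credential["payload"]
    if not hmac.compare_digest(sign(user_key, payload), credential["signature"]):
        return False
    return payload["not_before"] <= now <= payload["not_after"]


key = b"hypothetical-user-key"
cred = make_proxy_credential(key, "/O=Grid/CN=Jane Doe", not_before=100, not_after=200)
print(verify_proxy_credential(key, cred, now=150))
```

Because the credential expires on its own, a stolen proxy is useful only for a limited time, which is the point of the validity interval.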
- VIEWS:
- ADMINISTRATOR:
- obtains certificate for each resource
- maintains grid-mapfile.
- maintains certificates of trusted CAs
- PROGRAMMER:
By default, a proxy can't be delegated (that is a user's remote process
isn't authorized to spawn other processes). However, there are ways to
allow this which require a little footwork. For example, the proxy's
certificate can be copied to the remote site.
- USER:
- generate certificate and get it signed by trusted CA (usually
Globus CA) - one time
- indicate which CA certificates user trusts (usually point to
directory containing Globus CA certificate)
- generate proxy certificate - before running
- COMMUNICATION:
- GOALS:
Support multiple communication methods transparently within a single
application.
- NEXUS:
A portable, multithreaded communication library designed for use by parallel
language compilers and high-level communication libraries.
- a computation executes on a set of nodes
- a computation consists of a set of threads each executing in an
address space called a context (a process)
- a thread executes a sequential program which may read and
write data shared with other threads executing in the same context
- intercontext references are called communication links
- communication links connect startpoints and
endpoints (data structures)
- startpoint stores information about where remote object is
located and how to communicate with it
- startpoints can be communicated between nodes (grant rights
on an endpoint)
- endpoint maintains local state for each startpoint (can be
used to maintain security information)
- formed by binding startpoint to an endpoint
- can bind many startpoints to an endpoint and many endpoints to a
startpoint
- remote service request (RSR) initiates communication and
invokes remote computation (function specified by an endpoint handler);
data buffer constructed using PVM-style put routines;
RSR is the only communication operation supported, but it allows for
- point-to-point
- remote memory access
- streaming protocols
- multicast
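The startpoint/endpoint/RSR model above can be mimicked in a few lines. This sketch runs in a single process and calls handlers synchronously, whereas Nexus RSRs are asynchronous and cross address spaces; the class and handler names are hypothetical and are not the Nexus API.

```python
class Endpoint:
    """Holds a handler table plus local state for the startpoints bound to it."""

    def __init__(self, handlers):
        self.handlers = handlers  # handler name -> function(endpoint, buffer)
        self.state = {}


class Startpoint:
    """Records which endpoints a remote service request should reach."""

    def __init__(self):
        self.endpoints = []

    def bind(self, endpoint):
        self.endpoints.append(endpoint)  # this binding forms a communication link

    def remote_service_request(self, handler_name, buffer):
        # One-way operation: run the named handler at every bound endpoint.
        for endpoint in self.endpoints:
            endpoint.handlers[handler_name](endpoint, buffer)


def store_handler(endpoint, buffer):
    """Example handler: append the received buffer to the endpoint's state."""
    endpoint.state.setdefault("received", []).append(list(buffer))


ep_a = Endpoint({"store": store_handler})
ep_b = Endpoint({"store": store_handler})
sp = Startpoint()
sp.bind(ep_a)
sp.bind(ep_b)  # one startpoint bound to two endpoints gives multicast
sp.remote_service_request("store", [1, 2, 3])
print(ep_a.state, ep_b.state)
```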
- VIEWS:
- PROGRAMMER:
Use Nexus API to implement interprocess communication. Probably need to
use the synchronization API (mutexes and condition variables) to protect shared data.
- REMOTE ACCESS:
- GOALS:
- uniform access to files
- access diverse data sources - files on remote tape, disk
- dynamic resource set - low institutional overhead
- support for streaming I/O
- little or no program modification
- support for programmer-directed performance optimization
- ARCHITECTURE:
- not building a distributed file system
- provides optimized support for file access patterns common to Grid
applications
- GASS:
- Provides the following access mechanisms:
- read only access to 'constant data'
- non-coherent write access
- append-only access to a file with output required in real-time
(e.g. logs)
- unrestricted read/write access to an entire file with no other
concurrent accesses
- API like unix I/O counterparts
- url used in gass open call
- Provides a file cache on local machine
- 3 operations:
- fetch and cache on first read open:
- local cache checked; if
not there, entire remote file fetched; reference count
incremented. Note: may not be good for large files
- flush cache and transfer on last write close:
- when file closed,
reference count checked; if reference count is one, the file is
copied to remote location and deleted from cache. otherwise reference
count is decremented.
- file opened in append mode:
- is not placed in cache; a
communication stream is created to remote location.
- pre-staging and post-staging also implemented for large files
- can also specify where data can be stored
- cache management API - may be used for coherency; allows user to control
insertion, locking, removal, and reference counting
- client and server implementation APIs to implement new data
management services
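The two cached operations described above (fetch on first read open, flush on last write close) amount to reference counting. A minimal sketch follows, in which a dictionary stands in for the remote servers and the class, method names, and URLs are hypothetical rather than the GASS API.

```python
class GassCacheSketch:
    def __init__(self, remote_store):
        self.remote = remote_store  # url -> bytes; stand-in for remote servers
        self.cache = {}             # url -> bytearray held locally
        self.refcount = {}          # url -> number of current opens

    def open_read(self, url):
        if url not in self.cache:   # first open: fetch the entire remote file
            self.cache[url] = bytearray(self.remote[url])
        self.refcount[url] = self.refcount.get(url, 0) + 1
        return self.cache[url]

    def open_write(self, url):
        self.cache.setdefault(url, bytearray())
        self.refcount[url] = self.refcount.get(url, 0) + 1
        return self.cache[url]

    def close_write(self, url):
        if self.refcount[url] == 1:  # last close: copy out, then drop from cache
            self.remote[url] = bytes(self.cache.pop(url))
            del self.refcount[url]
        else:
            self.refcount[url] -= 1


remote = {"https://example.org/in.dat": b"abc"}
cache = GassCacheSketch(remote)
out = "https://example.org/out.dat"
writer_1 = cache.open_write(out)
writer_2 = cache.open_write(out)
writer_1.extend(b"hello")
cache.close_write(out)               # one writer still has the file open
flushed_early = out in remote
cache.close_write(out)               # last close triggers the transfer
print(flushed_early, remote[out])
```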
- VIEWS:
- PROGRAMMER:
Needs to either start up remote gass server as part of the program or
rely on user to do it. In configuration file, need to be able to
recognize urls and use gass file access api functions. Can be used for
data files, log files, executables, etc.
- USER:
Possibly start up remote gass servers and input urls to program.
- FAULT DETECTION:
- GOALS:
- scalable - support large number of processes
- accuracy and completeness - false positives and negatives being rare
- timeliness
- low overhead
- flexible - support multiple policies
- ARCHITECTURE:
Enable the following in response to a failure
- terminate entire job
- ignore failure
- restart failed process on a new resource
- use replication to continue execution
- HBM:
- local monitor:
- observes state of the computer it's on and any
monitored processes on that computer. Uses 'ps'
- data collector:
- receives heartbeat messages generated by local
monitors and identifies failed components based on missing
heartbeats. Programmer specifies what happens when components fail.
- registers heartbeat for itself in case of failure
- API is callback based (provide callback function and event mask)
- client registration API:
- application uses to specify processes to
be monitored by local monitor and the data collector to whom
heartbeats are to be sent. Must unregister before process
termination.
- heartbeats sent via UDP
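The data collector's core job, declaring a component failed when its heartbeats stop arriving, can be sketched as below. The timeout policy, class name, and callback signature are assumptions for illustration; time is passed in explicitly instead of being read from UDP traffic or a clock.

```python
class DataCollectorSketch:
    def __init__(self, timeout, on_failure):
        self.timeout = timeout          # seconds of silence tolerated
        self.on_failure = on_failure    # programmer-supplied failure callback
        self.last_seen = {}             # process id -> time of last heartbeat
        self.failed = set()

    def heartbeat(self, process_id, now):
        self.last_seen[process_id] = now
        self.failed.discard(process_id)

    def unregister(self, process_id):
        # Called before normal termination so silence is not reported as failure.
        self.last_seen.pop(process_id, None)
        self.failed.discard(process_id)

    def check(self, now):
        for process_id, seen in self.last_seen.items():
            if process_id not in self.failed and now - seen > self.timeout:
                self.failed.add(process_id)
                self.on_failure(process_id)  # reported once per failure


failures = []
collector = DataCollectorSketch(timeout=10, on_failure=failures.append)
collector.heartbeat("proc-a", now=0)
collector.heartbeat("proc-b", now=0)
collector.heartbeat("proc-a", now=10)
collector.check(now=15)   # proc-b has been silent for 15 s
collector.check(now=16)   # already reported, so no duplicate
print(failures)
```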
- VIEWS:
- ADMINISTRATOR:
maintains local monitor on each resource
- PROGRAMMER:
In program, registers processes with local monitors. Implements data
collector (determines what happens when failure is detected).
- QOS - RESERVATIONS:
- GOALS:
- dedicated access to collections of computers on heterogeneous
distributed systems
- support for widely varying types of resources (e.g. computers,
networks, memory, etc) located on possibly different
administrative domains
- provide reservation mechanisms for resource managers that don't
support reservations
- allow applications to discover, reserve, and allocate on
potentially complex collections of resources
- ARCHITECTURE:
- co-reservation agent:
- discovers and reserves resources; informs
user when resources are ready
- local resource allocation manager:
- provides basic reservation
services
- GARA:
- supports immediate and advanced reservations
- LRAM
- - Local Resource Allocation Manager
- interacts with system specific resource management components
and services. 3 cases
- local scheduler provides reservations:
- LRAM passes advanced
reservation requests directly to scheduler
- local scheduler does not provide reservations:
- LRAM has
total control over resource and can use a slot manager
to implement advanced reservations using a timeslot table
(y-axis is percent of resource and x is time)
- local scheduler does not provide reservations and LRAM
does not have total control over resource:
- LRAM can
only support probabilistic advanced reservations
- implemented LRAMs for DSRT, differentiated services, DPSS
- GARA External Interface:
- provides APIs for
- authentication and dispatch of incoming requests
- registration and propagation of upcalls to remote processes
- publication of resource information
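The timeslot table used by a slot manager (percent of resource against time) can be sketched as an admission check. This is an illustrative model only: the class name, half-open intervals, and the 100 percent capacity default are assumptions, not the GARA implementation.

```python
class SlotTableSketch:
    def __init__(self, capacity_percent=100):
        self.capacity = capacity_percent
        self.reservations = []  # (start, end, percent), covering start <= t < end

    def load_at(self, t):
        """Percent of the resource already reserved at time t."""
        return sum(p for start, end, p in self.reservations if start <= t < end)

    def reserve(self, start, end, percent):
        """Admit the reservation only if capacity is never exceeded."""
        # Load can only rise at a reservation's start, so those are the
        # only instants inside [start, end) that need checking.
        points = {start} | {s for s, _, _ in self.reservations if start < s < end}
        if any(self.load_at(t) + percent > self.capacity for t in points):
            return False
        self.reservations.append((start, end, percent))
        return True


table = SlotTableSketch()
results = [
    table.reserve(0, 10, 60),   # accepted
    table.reserve(5, 15, 50),   # rejected: 110 percent at t=5
    table.reserve(10, 20, 50),  # accepted: first slot has ended
    table.reserve(5, 15, 40),   # accepted: 100 percent at t=5, 90 at t=10
]
print(results)
```

An immediate reservation is just the special case where the start time is the current time.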
- VIEWS:
Not yet available but can imagine that:
- ADMINISTRATOR:
Set up LRAM.
- PROGRAMMER:
Create a co-reservation agent that would gather reservations to run
as soon as possible or at a user-specified time.
- USER:
Input to program desired start time.
- QOS - NETWORK PERFORMANCE:
- GOALS:
- accuracy vs. intrusiveness
- scalability
- portability
- fault tolerance
- security
- measurement policy
- data discovery, access and usability
- ARCHITECTURE:
- sensor:
- collects data from resource (e.g. bandwidth, latency)
- collation of data:
- stores data from multiple resources
- access and use of data:
- how users obtain data
- GLOPERF:
- MDS:
- used for storing and accessing data
- gloperf daemon:
- resides on each participating host
- registers itself with the MDS
- will be given membership to a group (determined by hand;
default is local)
- queries MDS for gloperfds in its group
- builds a random list from that group
- does bw and latency test to each peer and writes results into
MDS (uses netperf)
- waits n minutes (default n = 5) before querying MDS again
- does not do predictions
- not intended for networks with highly volatile performance
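One cycle of the gloperf daemon loop (find peers in the same group, shuffle them, measure, record) might look like the sketch below. Dictionaries stand in for MDS, and `measure` is a hypothetical placeholder for the netperf-based bandwidth and latency test; none of these names come from Gloperf itself.

```python
import random


def pick_peers(registry, me, rng):
    """Return the other daemons in my group, in random order."""
    group = registry[me]
    peers = [host for host, g in sorted(registry.items()) if g == group and host != me]
    rng.shuffle(peers)
    return peers


def run_cycle(registry, results, me, measure, rng):
    """Test every peer once and record the result, as if writing it to MDS."""
    for peer in pick_peers(registry, me, rng):
        results[(me, peer)] = measure(me, peer)


registry = {"a.ucsd.edu": "local", "b.ucsd.edu": "local", "c.ucla.edu": "other"}
results = {}
run_cycle(registry, results, "a.ucsd.edu",
          measure=lambda src, dst: {"bandwidth_mbps": None, "latency_ms": None},
          rng=random.Random(0))
print(sorted(results))
```

A real daemon would repeat this cycle after waiting n minutes; the placeholder measurement returns no numbers because none are given in these notes.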
- VIEWS:
- ADMINISTRATOR:
Set up and maintain gloperfd
- PROGRAMMER:
Build scheduler to query MDS for possible configurations and pick the
best configuration based on network performance.
- AN EXAMPLE:
NTP (Network Time Protocol) provides a clock synchronization service
to UTC (Coordinated Universal Time). Designated NTP servers obtain their
time from some external time source such as a radio clock.
Say you want to provide access to NTP through Globus; we'll call the
service GNTP. GNTP provides servers which obtain their timing information from
NTP servers. We'll assume that we have been given access to 100
machines to run our servers and elect to run on 10 of these.

Let's build GNTP by adding in Globus services incrementally:
- We first add only resource management
SETUP:
- we obtain a Globus GNTP admin id from the Globus Project
- Globus is running on all our GNTP machines (gatekeepers are running in
trusted mode because we aren't yet considering security)
- we publish the list of NTP and GNTP servers on our web page
CLIENT/SERVER (well, not really) :
- The Globus coallocation service, DUROC, provides a basic
communication mechanism between the processes it launches. Initially,
we can use this to communicate between a GNTP "server" process that
queries the NTP and a GNTP client that wishes to sync the clock on its
system.

We'll call the program that starts the GNTP "server" and GNTP
client, gntp-update.
Usage: gntp-update   GNTP-machine   NTP-server
- Furthermore, let
- ntp-proxy = program which queries NTP server for time and returns the
results to its caller
- Usage: ntp-proxy <server>
- gntp-query = program which queries the coallocated GNTP server for
time and then updates local clock.
- Usage: gntp-query
- since we don't have remote access yet we install ntp-proxy on all
GNTP servers using a non-Globus method such as scp
- given the following request on client.ucsd.edu:
gntp-update gntp.ucla.edu ntp.berkeley.edu
gntp-update will create the following RSL expression to submit to
DUROC.
+(
&(gatekeeper=gntp.ucla.edu)(executable=ntp-proxy)(argument=ntp.berkeley.edu)
&(gatekeeper=client.ucsd.edu)(executable=gntp-query)(# will contact process 0)
)
- Now we add in MDS capabilities (to replace our web page lists)
SETUP:
- store entries for NTP servers in MDS
- store entries for GNTP servers in MDS
CLIENT/SERVER (still not really) :
- modify gntp-update so that it constructs the RSL expression by
querying the MDS to find a GNTP and NTP server. So, the new syntax
for gntp-update is
Usage: gntp-update
- Now we add in security (gatekeepers are now running in secure mode)
SETUP:
- GSI is enabled in all gatekeepers so we now have authentication
- user needs to obtain certificate and logon before using gntp-update
- Now we add in communication using Nexus
CLIENT/SERVER (really) :
- processes can now communicate even though they are started independently
- we want to have the ntp-proxy running all the time (because
starting up a job can be expensive); so we'll turn ntp-proxy into a
daemon. Each ntp-proxy will locate an NTP server using the MDS.
- we will have a manager, gntp-manager, that creates and monitors
all ntp-proxy daemons on chosen GNTP machines.
- when starting gntp-manager, the admin only has to log on once.
- ntp-proxy must be copied to all GNTP machines
- the gntp-manager uses either DUROC or GRAM client to start
ntp-proxy on an initial set of 10 machines.
&(gatekeeper=gntp.ucla.edu)(executable=ntp-proxy)
.
.
.
- gntp-update now
- locates an ntp-proxy
- uses nexus to obtain timing info from ntp-proxy
- gntp-query is obsolete
- Now we add in remote access (so that ntp-proxy resides in one central
location)
SETUP:
- we start a gass-server on gntp-manager.org
CLIENT/SERVER (still really but a little easier) :
- gass can be used to stage the ntp-proxy executable on GNTP machines
(instead of having to manually copy them over)
- &(gatekeeper=gntp.ucla.edu)(executable=https://gntp-manager.org:2342/ntp-proxy)

- Now we add in fault detection
we use hbm to detect ntp-proxy failures
SETUP:
- we want 10 ntp-proxy daemons running at a time (from a set of 100
GNTP machines)
- in MDS, we store whether GNTP machine's ntp-proxy is running or not
ntp-proxy = active | inactive
CLIENT/SERVER (still really but more reliable) :
- when gntp-manager starts up, it randomly chooses 10 hosts to run on
- Each ntp-proxy registers with local hbm monitor.
- A data collector is implemented as part of gntp-manager
- if a failure is detected, gntp-manager marks that host's ntp-proxy as
inactive and marks it as bad for a certain amount of time
(when that time expires, someone can attempt to start up an ntp-proxy
again)
- it looks through MDS to find first host not running ntp-proxy. It
attempts to start up ntp-proxy and if succeeds, changes ntp-proxy's
state to active.
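The replacement logic described above for gntp-manager can be sketched as follows. A dictionary stands in for the MDS attribute, `start_proxy` is a hypothetical stand-in for a GRAM submission, and the sketch covers only failover, not the initial random choice of 10 hosts.

```python
class GntpManagerSketch:
    def __init__(self, hosts, start_proxy, retry_delay):
        self.status = {host: "inactive" for host in hosts}  # mirrors MDS state
        self.bad_until = {}              # host -> time before which it is skipped
        self.start_proxy = start_proxy   # callable(host) -> True on success
        self.retry_delay = retry_delay

    def start_on_first_available(self, now):
        """Start ntp-proxy on the first inactive host not currently marked bad."""
        for host in self.status:
            usable = self.bad_until.get(host, 0) <= now
            if self.status[host] == "inactive" and usable and self.start_proxy(host):
                self.status[host] = "active"
                return host
        return None

    def on_failure(self, host, now):
        """Data collector callback: retire the host, then start a replacement."""
        self.status[host] = "inactive"
        self.bad_until[host] = now + self.retry_delay
        return self.start_on_first_available(now)


manager = GntpManagerSketch(["h1", "h2", "h3"], start_proxy=lambda host: True, retry_delay=60)
manager.start_on_first_available(now=0)              # starts h1
manager.start_on_first_available(now=0)              # starts h2
replacement = manager.on_failure("h1", now=100)      # h1 bad until 160
later = manager.on_failure("h2", now=200)            # h1 is eligible again
print(replacement, later)
```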

- Adding in QoS is probably not useful
- GARA could be used to reserve network bandwidth but is not really
practical
- Could use gloperf to locate servers with better connectivity