Introduction to Grid Computing
References: Grid Book, Chapters 1, 2, 22
1. What is Grid Computing?
A Computational Grid is a collection of distributed, possibly heterogeneous
resources which can be used as an ensemble to execute large-scale applications
-
A Computational Grid is also called a metacomputer
-
The term computational grid comes from an analogy with the electric power grid:
-
Electric power is ubiquitous
-
Don't need to know the source (transformer, generator) of the power or
the power company that serves it
There is an ever-present search for cycles in HPC, with two foci of research:
-
"In the box" parallel computers, as evidenced by the PetaFLOPS initiative
-
Increasing development of infrastructure and middleware to leverage the
performance potential of distributed Computational Grids
Grid applications include
Distributed Supercomputing
-
Distributed supercomputing applications couple multiple computational resources
(supercomputers and/or workstations)
-
Distributed supercomputing applications include SFExpress (large-scale
modeling of battle entities with complex interactive behavior for distributed
interactive simulation) and Climate Modeling (modeling of climate behavior
using complex models and long time-scales)
High-Throughput Applications
-
Grid used to schedule large numbers of independent or loosely coupled tasks
with the goal of putting unused cycles to work
-
High-throughput applications include RSA key cracking and SETI@home
(detection of extraterrestrial communication)
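The high-throughput pattern described above can be sketched as a small task farm. This is only an illustration under stated assumptions: local threads stand in for idle Grid resources, and `search_chunk` and `run_task_farm` are hypothetical names with a toy search problem, not part of any Grid toolkit.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def search_chunk(start, stop, target):
    """Hypothetical work unit: scan one slice of a key space for a match."""
    for candidate in range(start, stop):
        if candidate * candidate == target:
            return candidate
    return None


def run_task_farm(target, key_space, chunk_size, workers):
    """Split the key space into independent chunks and farm them out."""
    chunks = [(lo, min(lo + chunk_size, key_space))
              for lo in range(0, key_space, chunk_size)]
    found = []
    # Threads stand in for otherwise unused machines (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(search_chunk, lo, hi, target) for lo, hi in chunks]
        # Tasks are independent, so results can be collected in any order.
        for future in as_completed(futures):
            result = future.result()
            if result is not None:
                found.append(result)
    return found


hits = run_task_farm(target=1_522_756, key_space=10_000, chunk_size=1_000, workers=4)
print(hits)
```

Because no chunk depends on another, a slow or lost worker only delays its own chunk, which is what makes this class of application a good fit for scavenged cycles.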
Data-Intensive Applications
-
Focus is on synthesizing new information from large amounts of physically
distributed data
-
Examples include NILE (distributed system for high energy physics experiments
using data from CLEO), SAR/SRB applications, digital library applications
2. Early Experiences with Grid Computing
Gigabit Testbeds Program
-
In the late 80's and early 90's, the gigabit testbed program was developed as a joint NSF,
DARPA, and CNRI (Corporation for National Research Initiatives, Bob Kahn) initiative
-
The idea was to investigate potential architectures for a gigabit/sec network
testbed and to explore its usefulness for end users
-
Testbeds formed: CASA (southwest), MAGIC and BLANCA (Midwest), AURORA
and NECTAR (northeast), VISTANET (southeast). Each had a unique blend of
application research and networking and computer science research:
Testbed | Applications | Network
CASA | Distributed supercomputing | HIPPI switches connected by HIPPI-over-SONET at OC-12
BLANCA | Virtual environments, remote visualization and steering, multimedia digital libraries | Experimental ATM switches running over experimental 622 Mb/s and 45 Mb/s circuits developed by AT&T and universities
VISTANET | Radiation treatment planning applications involving supercomputer, remote instrument (radiation beam) and visualization | ATM network at OC-12 (622 Mb/s) interconnecting HIPPI local area networks
NECTAR | Coupled supercomputers running chemical reaction dynamics and CS research | OC-48 (2.4 Gb/s) links between PSC supercomputer facility and CMU (metropolitan area testbed)
AURORA | Telerobotics, distributed virtual memory and operating system research | OC-12 network interconnecting 4 research sites and supporting the development of ATM host interfaces, ATM switches and network protocols
MAGIC | Remote vehicle control applications and high-speed access to databases for terrain visualization and battle simulation | OC-12 network to interconnect ATM-attached hosts
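The OC-n figures quoted for the testbeds follow from the SONET base rate: OC-n runs at n times the OC-1 line rate of 51.84 Mb/s. The short sketch below works through that arithmetic. The helper names are illustrative, and the transfer-time estimate assumes the full line rate is available, ignoring framing and protocol overhead.

```python
OC1_MBPS = 51.84  # SONET OC-1 line rate in Mb/s


def oc_rate_mbps(n):
    """Line rate of an OC-n link in Mb/s."""
    return n * OC1_MBPS


def transfer_seconds(gigabytes, n):
    """Idealized time to move `gigabytes` (decimal GB) over an OC-n link."""
    bits = gigabytes * 8e9
    return bits / (oc_rate_mbps(n) * 1e6)


for level in (3, 12, 48):
    print(f"OC-{level}: {oc_rate_mbps(level):.2f} Mb/s")

print(f"1 GB over OC-12: about {transfer_seconds(1, 12):.1f} s")
```

This is why OC-12 is usually rounded to 622 Mb/s and OC-48 to 2.4 Gb/s.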
I-WAY
-
First large-scale Grid experiment
-
Put together for SC'95
-
I-WAY consisted of a Grid of 17 sites connected by vBNS
-
Over 60 applications ran on the I-WAY during SC'95
-
Each I-WAY site was served by an I-POP (I-WAY Point of Presence), used for authentication
of distributed applications, distribution of associated libraries and other
software, and monitoring of the connectivity of the I-WAY virtual network
-
Users could use single authentication and job submission across multiple
sites or they could work directly with end-users
-
Scheduling was done with a human in the loop
PACIs
-
2 NSF Supercomputer Centers (PACIs) - SDSC/NPACI and NCSA/Alliance, both
committed to Grid computing although the effort has been stronger at NCSA
-
vBNS backbone between NCSA and SDSC running at OC-12 with connectivity
to over 100 locations at speeds ranging from 45 Mb/s to 155 Mb/s or more
-
Applications include data-intensive computing (NPACI), visual supercomputing
and teleimmersion (Alliance).
-
Access Grid by NCSA serves to connect sites for collaboration work
in distributed environments and group interactions
Other Efforts
-
Globus testbed = GUSTO which supports Globus infrastructure and application
development
-
Centurion Cluster at UVA = Legion testbed
-
IPG (Information Power Grid) = grid computing testbed supported by NASA; Globus
is supported as infrastructure, and application and middleware development
efforts are underway
3. What is the difference between Grid Computing, Cluster Computing and the Web?
Cluster computing focuses
on platforms consisting of often homogeneous interconnected nodes in a
single administrative domain.
-
Clusters often consist of PCs or workstations and relatively fast networks
-
Cluster components can be shared or dedicated
-
Application focus is on cycle-stealing computations, high-throughput computations,
distributed computations
The Web focuses on platforms consisting of any combination of resources and
networks which support naming services, protocols, search engines, etc.
-
The Web consists of a very diverse set of computational, storage, communication,
and other resources shared by an immense number of users
-
Application focus is on access to information, electronic commerce, etc.
Grid computing focuses on ensembles of distributed heterogeneous resources
used as a platform for high performance computing.
-
Some grid resources may be shared, others may be dedicated or reserved
-
Application focus is on high-performance, resource-intensive applications
4. State-of-the-art Grid Infrastructure: Globus and Legion
Legion and Globus are the two best-known infrastructure efforts.
Globus - integrated toolkit of Grid services.
-
Developed by Ian Foster (ANL/UC) and Carl Kesselman (USC/ISI)
-
Bag of services model - applications can use Grid services
without having to adopt a particular programming model
-
Globus services include:
-
Resource allocation and process management (GRAM)
-
Communication services (Nexus)
-
Distributed access to structure and state information (MDS)
-
Authentication and security services (GSI)
-
System monitoring (HBM)
-
Remote data access (GASS)
-
Construction, caching and location of executables (GEM)
Legion - Developed by Andrew Grimshaw (UVA)
-
Provides a single, coherent virtual machine model that addresses grid issues
within a reflective, object-based metasystem
-
Everything is an object in the Legion model: hardware resources, software resources,
etc.
-
Every Legion object is defined and managed by its class object; class objects
act as managers and make policy, as well as define instances
-
Legion defines the interface and basic functionality of a set of core object
types which support basic services
-
Users may also define and build their own class objects
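The role of a class object as manager and policy maker can be illustrated with a toy model. This is not Legion's actual interface: `ClassObject`, `create_instance`, `destroy_instance`, and the instance-limit policy are all hypothetical names chosen for this sketch, which only mirrors the idea that instances are created and governed through their class object.

```python
class ClassObject:
    """Hypothetical stand-in for a class object that manages its instances."""

    def __init__(self, name, max_instances):
        self.name = name
        self.max_instances = max_instances  # example of a policy set by the manager
        self.instances = {}                 # object id -> instance state
        self._next_id = 0

    def create_instance(self, **state):
        """Create an instance, enforcing the class object's policy."""
        if len(self.instances) >= self.max_instances:
            raise RuntimeError(f"{self.name}: instance limit reached")
        oid = f"{self.name}-{self._next_id}"
        self._next_id += 1
        self.instances[oid] = dict(state)
        return oid

    def destroy_instance(self, oid):
        """Remove an instance that this class object manages."""
        del self.instances[oid]


# A user-defined class object for a hardware resource type (illustrative only).
host_class = ClassObject("Host", max_instances=2)
a = host_class.create_instance(cpus=8)
b = host_class.create_instance(cpus=64)
print(a, b, sorted(host_class.instances))
```

Here a hardware resource type is just another class object, and a user could define further class objects in the same way with different policies.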