Introduction to Grid Computing

References: Grid Book, Chapters 1, 2, 22

1. What is Grid Computing?

A computational grid is a collection of distributed, possibly heterogeneous resources that can be used as an ensemble to execute large-scale applications.

A computational grid is also called a metacomputer.

The term "computational grid" comes from an analogy with the electric power grid:

Electric power is ubiquitous
Consumers don't need to know the source of the power (transformer, generator) or the power company that serves it

The search for cycles in HPC is ever-present, and research has two foci:

"In-the-box" parallel computers, as evidenced by the PetaFLOPS initiative
Increasing development of infrastructure and middleware to leverage the performance potential of distributed Computational Grids

Grid applications include:

Distributed Supercomputing

Distributed Supercomputing applications couple multiple computational resources - supercomputers and/or workstations

Distributed supercomputing applications include SFExpress (large-scale modeling of battle entities with complex interactive behavior for distributed interactive simulation) and climate modeling (modeling climate behavior using complex models over long time scales)

High-Throughput Applications

The grid is used to schedule large numbers of independent or loosely coupled tasks, with the goal of putting unused cycles to work

High-throughput applications include RSA key cracking and SETI@home (detection of extraterrestrial communication)
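
The high-throughput pattern can be illustrated with a small simulation: a bag of independent tasks is handed out one at a time to whichever worker becomes idle first, so faster machines naturally absorb more of the work. This Python sketch assumes task sizes and worker speeds are known in advance and that workers never fail; the worker names and numbers are hypothetical. Real systems such as SETI@home have clients fetch work units when they are idle, which this greedy rule only approximates.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    work: float  # abstract units of computation (assumed known in advance)


def schedule(tasks, worker_speeds):
    """Give each task to whichever worker becomes idle first.

    worker_speeds maps a hypothetical worker name to work units per second.
    Returns (assignments, makespan), where each assignment is
    (task name, worker name, start time, finish time).
    """
    # Min-heap of (time the worker becomes idle, worker name)
    idle_at = [(0.0, w) for w in sorted(worker_speeds)]
    heapq.heapify(idle_at)
    assignments = []
    for task in tasks:
        start, worker = heapq.heappop(idle_at)
        finish = start + task.work / worker_speeds[worker]
        assignments.append((task.name, worker, start, finish))
        heapq.heappush(idle_at, (finish, worker))
    makespan = max((a[3] for a in assignments), default=0.0)
    return assignments, makespan


# Six equal, independent chunks spread over one fast and one slow workstation
tasks = [Task(f"chunk{i}", 10.0) for i in range(6)]
plan, makespan = schedule(tasks, {"fast-ws": 2.0, "slow-ws": 1.0})
for row in plan:
    print(row)
print("makespan:", makespan)
```

With these numbers the fast workstation ends up running twice as many chunks as the slow one, without any global planning.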

Data-Intensive Applications

The focus is on synthesizing new information from large amounts of physically distributed data

Examples include NILE (a distributed system for high-energy physics experiments using data from CLEO), SAR/SRB applications, and digital library applications

2. Early Experiences with Grid Computing

Gigabit Testbeds Program

In the late 1980s and early 1990s, the gigabit testbed program was developed as a joint initiative of NSF, DARPA, and CNRI (Corporation for National Research Initiatives, Bob Kahn)
The idea was to investigate potential architectures for a gigabit/sec network testbed and to explore its usefulness for end users
Five testbeds were formed under the program: CASA (southwest), BLANCA (Midwest), AURORA and NECTAR (northeast), and VISTANET (southeast); MAGIC (Midwest) was a related gigabit testbed. Each had its own blend of application research and networking and computer science research:

Testbed	Applications	Network
CASA	Distributed Supercomputing	HIPPI switches connected by HIPPI-over-SONET at OC-12
BLANCA	Virtual Environments, Remote visualization and steering, multimedia digital libraries	Experimental ATM switches running over experimental 622 Mb/s and 45 Mb/s circuits developed by AT&T and universities
VISTANET	Radiation treatment planning applications involving supercomputer, remote instrument (radiation beam) and visualization	ATM network at OC-12 (622 Mb/s) interconnecting HIPPI local area networks
NECTAR	Coupled supercomputers running chemical reaction dynamics and CS research	OC-48 (2.4 Gb/s) links between PSC supercomputer facility and CMU (metropolitan area testbed)
AURORA	Telerobotics, distributed virtual memory and operating system research	OC-12 network interconnecting 4 research sites and supporting the development of ATM host interfaces, ATM switches and network protocols.
MAGIC	Remote vehicle control applications and high-speed access to databases for terrain visualization and battle simulation	OC-12 network to interconnect ATM-attached hosts
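
The OC-n figures in the table follow the SONET rate hierarchy, in which an OC-n link has a nominal line rate of n × 51.84 Mb/s; this is why OC-12 appears as 622 Mb/s and OC-48 as 2.4 Gb/s. The small Python sketch below computes those nominal rates and an idealized transfer time. The helper names and the 1 GB example are illustrative assumptions, and the timing deliberately ignores SONET framing, protocol overhead, and latency.

```python
# SONET defines OC-n line rates as integer multiples of the OC-1 rate.
OC1_LINE_RATE_MBPS = 51.84


def oc_line_rate_mbps(n: int) -> float:
    """Return the nominal line rate of an OC-n link in Mb/s."""
    if n < 1:
        raise ValueError("OC level must be a positive integer")
    return n * OC1_LINE_RATE_MBPS


def transfer_seconds(num_bytes: float, n: int) -> float:
    """Idealized time to move num_bytes over an OC-n link.

    Ignores framing overhead, protocol overhead and latency,
    so real transfers take longer.
    """
    return num_bytes * 8 / (oc_line_rate_mbps(n) * 1e6)


for level in (3, 12, 48):
    print(f"OC-{level}: {oc_line_rate_mbps(level):.2f} Mb/s, "
          f"1 GB in {transfer_seconds(1e9, level):.1f} s")
```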

I-WAY

The first large-scale grid experiment
Assembled for the SC'95 conference
The I-WAY was a grid of 17 sites connected by the vBNS (very high-speed Backbone Network Service)
More than 60 applications ran on the I-WAY during SC'95
Each I-WAY site was served by an I-POP (I-WAY Point of Presence), which handled authentication of distributed applications, distribution of associated libraries and other software, and monitoring of the I-WAY virtual network's connectivity
Users could either use a single authentication and job submission mechanism across multiple sites or work directly with end users
Scheduling was done with a human in the loop

PACIs

Two NSF Partnerships for Advanced Computational Infrastructure (PACIs), SDSC/NPACI and NCSA/Alliance, are both committed to grid computing, although the effort has been stronger at NCSA
The vBNS backbone between NCSA and SDSC runs at OC-12, with connectivity to more than 100 locations at speeds ranging from 45 Mb/s to 155 Mb/s or more
Applications include data-intensive computing (NPACI) and visual supercomputing and teleimmersion (Alliance)
The Access Grid, developed within the NCSA Alliance, connects sites for collaborative work in distributed environments and for group interactions

Other Efforts

GUSTO: the Globus testbed, which supports Globus infrastructure and application development
Centurion cluster at UVA: the Legion testbed
IPG (Information Power Grid): NASA's grid computing testbed, which uses Globus as its infrastructure; application and middleware development efforts are underway

3. What is the difference between Grid Computing, Cluster Computing and the Web?

Cluster computing focuses on platforms of interconnected, often homogeneous nodes in a single administrative domain.

Clusters often consist of PCs or workstations and relatively fast networks
Cluster components can be shared or dedicated
Application focus is on cycle-stealing computations, high-throughput computations, distributed computations

The Web focuses on platforms consisting of any combination of resources and networks that support naming services, protocols, search engines, etc.

The Web consists of a very diverse set of computational, storage, communication, and other resources shared by an immense number of users
Application focus is on access to information, electronic commerce, etc.

Grid computing focuses on ensembles of distributed heterogeneous resources used as a platform for high-performance computing.

Some grid resources may be shared; others may be dedicated or reserved
Application focus is on high-performance, resource-intensive applications

4. State-of-the-art Grid Infrastructure: Globus and Legion

Legion and Globus are the two best-known infrastructure efforts.

Globus - integrated toolkit of Grid services.

Developed by Ian Foster (ANL/UC) and Carl Kesselman (USC/ISI)
"Bag of services" model: applications can use grid services without having to adopt a particular programming model
Globus services include:

Resource allocation and process management (GRAM)
Communication services (Nexus)
Distributed access to structure and state information (MDS)
Authentication and security services (GSI)
System monitoring (HBM)
Remote data access (GASS)
Construction, caching and location of executables (GEM)
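
To illustrate the bag-of-services idea, the Python sketch below models two services loosely patterned on MDS (resource information) and GRAM (resource allocation). The class and method names (`InfoService`, `AllocationService`, `query`, `submit`), the host names, and the executable path are all hypothetical and are not the Globus API; the point is only that an application calls the services it needs directly, without adopting a programming model imposed by the toolkit.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for two Globus-style services. These are NOT the
# Globus APIs; they only illustrate the "bag of services" idea.


@dataclass
class Resource:
    host: str
    cpus: int
    arch: str


class InfoService:
    """MDS-like directory: publishes structure and state information."""

    def __init__(self):
        self._resources = []

    def register(self, resource):
        self._resources.append(resource)

    def query(self, min_cpus=1, arch=None):
        return [r for r in self._resources
                if r.cpus >= min_cpus and (arch is None or r.arch == arch)]


class AllocationService:
    """GRAM-like service: records a job started on a chosen resource."""

    def __init__(self):
        self.jobs = []

    def submit(self, resource, executable, count):
        if count > resource.cpus:
            raise ValueError(f"{resource.host} has only {resource.cpus} CPUs")
        job_id = len(self.jobs) + 1
        self.jobs.append((job_id, resource.host, executable, count))
        return job_id


# The application uses only discovery and allocation, and keeps its own
# programming model for everything else.
info = InfoService()
info.register(Resource("siteA.example.org", 64, "x86"))
info.register(Resource("siteB.example.org", 16, "sparc"))
gram = AllocationService()

candidates = info.query(min_cpus=32)
job = gram.submit(candidates[0], "/bin/simulate", count=32)
print(job, gram.jobs)
```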

Legion - Developed by Andrew Grimshaw (UVA)

Provides a single, coherent virtual machine model that addresses grid issues within a reflective, object-based metasystem
Everything is an object in the Legion model: hardware resources, software resources, etc.
Every Legion object is defined and managed by its class object; class objects act as managers and make policy, as well as define instances
Legion defines the interface and basic functionality of a set of core object types which support basic services
Users may also define and build their own class objects
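
The Legion object model can be sketched as a toy in Python: a class object is itself an object, it creates and tracks its instances, and it decides policy for them, here where each new instance is placed. The names (`LegionObject`, `ClassObject`, `PinnedClassObject`, `choose_host`) and the least-loaded placement rule are assumptions for illustration, not Legion's actual interfaces. The user-defined `PinnedClassObject` shows how overriding one method changes policy for that class's instances only.

```python
class LegionObject:
    """Toy base: hosts, files and programs are all objects."""

    def __init__(self, name, class_object):
        self.name = name
        self.class_object = class_object  # the class object managing this instance
        self.host = None                  # where this object was placed, if anywhere
        self.hosted = []                  # objects placed here, when acting as a host


class ClassObject(LegionObject):
    """Toy class object: itself an object; defines instances and sets policy."""

    def __init__(self, name, class_object=None):
        super().__init__(name, class_object)
        self.instances = []

    def choose_host(self, hosts):
        # Default policy (assumed): least-loaded host, ties broken by name
        return min(hosts, key=lambda h: (len(h.hosted), h.name))

    def create(self, name, hosts=()):
        obj = LegionObject(name, self)
        if hosts:
            obj.host = self.choose_host(list(hosts))
            obj.host.hosted.append(obj)
        self.instances.append(obj)
        return obj


class PinnedClassObject(ClassObject):
    """Hypothetical user-defined class object with its own placement policy."""

    def __init__(self, name, preferred):
        super().__init__(name)
        self.preferred = preferred

    def choose_host(self, hosts):
        for h in hosts:
            if h.name == self.preferred:
                return h
        return super().choose_host(hosts)  # fall back to the default policy


host_class = ClassObject("HostClass")
hosts = [host_class.create("hostA"), host_class.create("hostB")]
data_class = ClassObject("DataFileClass")
f1 = data_class.create("file1", hosts)
f2 = data_class.create("file2", hosts)
sim_class = PinnedClassObject("SimClass", preferred="hostA")
s1 = sim_class.create("sim1", hosts)
print([(o.name, o.host.name) for o in (f1, f2, s1)])
```

Because the hosts are themselves instances of `host_class`, hardware and data share one object model, mirroring "everything is an object".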