Providing a ubiquitous service that can both track dynamic performance changes and remain stable in spite of them requires adaptive programming techniques, an architectural design that supports extensibility, and internal abstractions that can be implemented efficiently and portably. In this paper, we describe the current implementation of the Network Weather Service for Unix and TCP/IP sockets and provide examples of its performance monitoring and forecasting capabilities.
In this paper, we focus on the problem of making short and medium term forecasts of CPU availability on time-shared Unix systems. We evaluate the accuracy with which availability can be measured using Unix load average, the Unix utility vmstat, and the Network Weather Service CPU sensor that uses both. We also examine the autocorrelation between successive CPU measurements to determine their degree of self-similarity. While our observations show a long-range autocorrelation dependence, we demonstrate how this dependence manifests itself in the short and medium term predictability of the CPU resources in our study.
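As a rough illustration of what examining the autocorrelation between successive CPU measurements involves, the sketch below computes the standard sample autocorrelation at a given lag. The function name and the readings are hypothetical and chosen only for illustration; this is not the paper's analysis code or its data.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive lag."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(series)")
    mean = sum(series) / n
    variance_sum = sum((x - mean) ** 2 for x in series)
    if variance_sum == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    # Covariance between the series and a copy of itself shifted by `lag`.
    covariance_sum = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance_sum / variance_sum


# Hypothetical CPU availability readings (fraction of the CPU free).
readings = [0.9, 0.8, 0.85, 0.4, 0.45, 0.5, 0.9, 0.95, 0.9, 0.5]
lag1 = autocorrelation(readings, 1)
```

Values near 1 at small lags suggest that a recent measurement carries information about the next one; how slowly the values decay as the lag grows is what the long-range dependence discussed above refers to.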
Dynamically Forecasting Network Performance to Support Dynamic Scheduling Using the Network Weather Service (short version). Rich Wolski, in the Proceedings of the 6th High-Performance Distributed Computing Conference, August 1997.
The Network Weather Service is a generalizable and extensible facility designed to provide dynamic resource performance forecasts in metacomputing environments. In this paper, we outline its design and detail the predictive performance of the forecasts it generates. While the forecasting methods are general, we focus on their ability to predict the TCP/IP end-to-end throughput and latency that is attainable by an application using systems located at different sites. Such network forecasts are needed both to support scheduling, and by the metacomputing software infrastructure to develop quality-of-service guarantees.
We describe the architecture of the Network Weather Service and implementations that we have developed and are currently deploying for the Legion and Globus/Nexus metacomputing infrastructures. We also detail NWS forecasts of resource performance using both the Legion and Globus/Nexus implementations. Our results show that simple forecasting techniques substantially outperform measurements of current conditions (commonly used to gauge resource availability and load) in terms of prediction accuracy.
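The comparison between using the current measurement as a prediction and using a simple forecaster can be sketched as follows. The two predictors (last value and sliding-window mean), the function names, and the sample series are assumptions made for illustration; they are not taken from the NWS implementation and do not reproduce the paper's results.

```python
def last_value_forecasts(series):
    """Predict each measurement as the one immediately before it."""
    return series[:-1]


def window_mean_forecasts(series, window):
    """Predict series[t] as the mean of up to `window` preceding values."""
    forecasts = []
    for t in range(1, len(series)):
        history = series[max(0, t - window):t]
        forecasts.append(sum(history) / len(history))
    return forecasts


def mean_absolute_error(forecasts, actuals):
    """Average absolute difference between forecasts and what was observed."""
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(actuals)


# Hypothetical throughput samples that oscillate around a stable level.
samples = [5.0, 7.0, 5.0, 7.0, 5.0, 7.0, 5.0, 7.0]
actuals = samples[1:]  # every forecast targets the next sample
mae_last = mean_absolute_error(last_value_forecasts(samples), actuals)
mae_mean = mean_absolute_error(window_mean_forecasts(samples, 4), actuals)
```

On this oscillating series the windowed mean has the lower error because it averages out the fluctuation, whereas the last-value predictor is always one swing behind. On a steadily trending series the ranking could reverse, which is one reason to compare several predictors against observed data.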
This paper investigates the efficacy of Application-Level Scheduling (AppLeS) for a parallel gene sequence library comparison application in production metacomputing settings. We compare an AppLeS-enhanced version of the application to an original implementation designed and tuned to use the native scheduling mechanisms of Mentat -- a metacomputing software infrastructure. The experimental data shows that the AppLeS versions outperform the best Mentat versions over a range of problem sizes and computational settings.
In this paper, we define a set of principles underlying application-level scheduling and describe our work-in-progress building AppLeS (application-level scheduling) agents. We illustrate the application-level scheduling approach with a detailed description and results for a distributed 2D Jacobi application on two heterogeneous platforms.
While running on parallel distributed resources, schedulers may find it advantageous to redistribute elements of a computation in response to changing conditions. In this paper, we focus on the development of dynamically parametrizable models to determine the cost (in terms of execution delay) of performing redistribution. We illustrate our approach by examining in detail the modeling of redistribution costs for a 2D Jacobi application running in a cluster of workstations environment.
We focus on the problems of scheduling applications on metacomputing systems. We introduce the concept of application-centric scheduling, in which everything about the system is evaluated in terms of its impact on the application. Application-centric scheduling is used by virtually all metacomputer programmers to achieve performance on metacomputing systems. We describe two successful metacomputing applications to illustrate this approach, and describe AppLeS scheduling agents, which generalize the application-centric scheduling approach. Finally, we show preliminary results that compare AppLeS-derived schedules with conventional strip and blocked schedules for a two-dimensional Jacobi code.
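For readers unfamiliar with strip schedules, the sketch below divides the rows of a two-dimensional grid into contiguous strips, one per processor. The function name and the weights are hypothetical. Weighting strips by relative processor capacity is shown only as one simple way a schedule can account for heterogeneity; it is not the AppLeS algorithm.

```python
def strip_partition(rows, weights):
    """Split `rows` grid rows into contiguous strips proportional to `weights`.

    Returns one (start, end) half-open row range per weight.
    """
    total = sum(weights)
    bounds = [0]
    cumulative = 0
    for weight in weights:
        cumulative += weight
        # Rounding cumulative boundaries keeps the strips contiguous and
        # makes the final boundary land exactly on `rows`.
        bounds.append(round(rows * cumulative / total))
    return [(bounds[i], bounds[i + 1]) for i in range(len(weights))]


# Conventional strip schedule: four equal strips.
uniform = strip_partition(1000, [1, 1, 1, 1])
# Hypothetical relative capacities for four unequal machines.
weighted = strip_partition(1000, [4, 2, 1, 1])
```

In a Jacobi iteration each strip exchanges only its boundary rows with its neighbours, so the strip height mainly controls how much computation each processor does per step.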
Many parallel compilation systems represent programs internally as Directed Acyclic Graphs (DAGs). However, the storage of these DAGs becomes prohibitive when the program being compiled is large. In this paper we describe a compile-time scheduling methodology for hierarchical DAG programs represented in the IFX intermediate form. The method we present is itself hierarchical, reducing the storage that would otherwise be required by a single flat DAG representation. We describe the scheduling model and demonstrate the method using the Optimizing Sisal Compiler and two scientific applications.
We describe Zoom, a hierarchical representation in which heterogeneous applications can be described. The goal of Zoom is to provide an abstraction that computer and computational scientists can use to describe heterogeneous applications, and to provide a foundation from which program development tools for heterogeneous network computing can be built. Three levels (structure, implementation and data) of the Zoom hierarchy are described and are used to illustrate two heterogeneous applications. Extensions to Zoom to include additional resource parameters required by program development tools are also discussed.
We couple the Zoom representation, designed to facilitate development of heterogeneous applications, with the HeNCE graphical language and tool, designed as a representation for and an executional model of heterogeneous programs targeted to PVM. The combination of Zoom and HeNCE provides a hierarchical representation that exposes performance issues and a means of automatically translating that representation into code executable on heterogeneous networks of computers.
The cost of hardware cache-coherence, both in terms of execution delay and operational cost, is substantial for scalable systems. Fortunately, compiler generated cache management can reduce program serialization due to cache-contention and increase execution performance. It can also reduce the cost of parallel systems by eliminating the need for more expensive hardware support. In this paper, we use the Sisal functional language system as a vehicle to implement and investigate automatic, compiler based cache management. We describe our implementation of Sisal for the IBM Power/4. The Power/4, briefly available as a product, represents an early attempt to build a shared-memory machine that relies strictly on the language system for cache-coherence. We discuss the issues associated with deterministic execution and program correctness on a system without hardware coherence, and demonstrate how Sisal (as a functional language) is able to address those issues.
Compiler Enforced Cache Coherence Using a Functional Language. Rich Wolski and David Cann, Journal of Scientific Programming, December 1995.
A revised version of the conference paper which includes a discussion of imperative compilation techniques for the Power/4.
Programming languages are the most important tool at a programmer's disposal. All other tools correct, visualize, or evaluate the product crafted by this tool. The advent of multiprocessor computer systems has greatly complicated the programmer's task and increased his need for high-level languages capable of automatically taming these architectures. In this paper, we describe a prototype implementation of Sisal for multiprocessor, hierarchical-memory systems. The implementation includes explicit compiler and runtime control that effectively exploits the different levels of memory and manages interprocess communications (IPC). We give preliminary performance results for this system on the BBN TC2000.