Our most cited papers (as of September 2017 according to Google Scholar) are as follows:
Abstract: We investigate multicast routing for high-bandwidth, delay-sensitive applications in a point-to-point network as an optimization problem. We associate a cost and a delay with each edge in the network. The problem is to construct a tree spanning the destination nodes that has least cost and in which the delay on the path from the source to each destination is bounded. We show that computing such a constrained multicast tree is NP-complete. We also provide heuristics that demonstrate good average-case behavior in terms of cost, as determined through simulations on a large number of graphs.
Main contribution: The first algorithms (centralized and distributed) for determining multicast routes that simultaneously minimize bandwidth and meet a delay bound (delay-constrained minimal Steiner trees).
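The flavor of such heuristics can be sketched in a few lines. The code below is not the paper's algorithm; it is a simplified illustrative fallback heuristic (all names and the graph encoding are my own): take the least-cost path to each destination when it meets the delay bound, and fall back to the least-delay path otherwise, then take the union of the chosen paths.

```python
import heapq

def dijkstra(adj, src, attr):
    """Shortest paths from src minimizing edge attribute attr (0=cost, 1=delay).
    adj: {node: {neighbor: (cost, delay)}}. Returns (dist, parent) maps."""
    dist, parent = {src: 0.0}, {src: None}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj[u].items():
            nd = d + w[attr]
            if nd < dist.get(v, float("inf")):
                dist[v], parent[v] = nd, u
                heapq.heappush(pq, (nd, v))
    return dist, parent

def path_to(parent, v):
    """Reconstruct the path to v from a Dijkstra parent map."""
    path = []
    while v is not None:
        path.append(v)
        v = parent.get(v)
    return list(reversed(path))

def constrained_multicast_tree(adj, src, dests, delay_bound):
    """Union of per-destination paths: least-cost path when it meets the
    delay bound, otherwise least-delay path. A production heuristic would
    also merge overlapping paths to keep the union cycle-free."""
    cost_dist, cost_par = dijkstra(adj, src, 0)
    delay_dist, delay_par = dijkstra(adj, src, 1)
    tree_edges = set()
    for d in dests:
        if d not in cost_dist:
            raise ValueError(f"destination {d} unreachable")
        p = path_to(cost_par, d)
        path_delay = sum(adj[u][v][1] for u, v in zip(p, p[1:]))
        if path_delay > delay_bound:
            if delay_dist[d] > delay_bound:
                raise ValueError(f"no path to {d} within delay bound")
            p = path_to(delay_par, d)  # fall back to least-delay path
        tree_edges.update(zip(p, p[1:]))
    return tree_edges
```

This two-Dijkstra fallback captures the cost/delay tension the paper formalizes; the actual heuristics in the paper are more sophisticated.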
Abstract: We present detailed measurements of various processing overheads of the TCP/IP and UDP/IP protocol stacks on a DECstation 5000/200 running the Ultrix 4.2a operating system. These overheads include data-touching operations, such as checksum computation and data movement, which are well known to be major time consumers. We also considered overheads due to non-data-touching operations, such as network buffer manipulation, protocol-specific processing, operating system functions, data structure manipulations (other than network buffers), and error checking. The performance results were used to uncover throughput and latency bottlenecks. We present a scheme for improving throughput when sending large messages by avoiding most checksum computations in a relatively safe manner. We also show that for the implementation we studied, reducing latency (when sending small messages) is a more difficult problem because processing overheads are spread over many operations; gaining significant savings would require optimizing many different mechanisms. This is especially important because, under realistic message size distributions, where the majority of messages are small, the cumulative time consumed by the non-data-touching overheads represents the majority of processing time.
Main contribution: The first detailed performance measurement study of widely deployed TCP/IP implementations, showing the potential and limitations of various software optimizations with respect to delay and throughput, and thus which optimizations are worth implementing and which are not.
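The checksum computation the paper targets is a good example of a data-touching overhead: every byte of the message must be read. A minimal sketch of the standard 16-bit one's-complement Internet checksum (RFC 1071), as used by TCP, UDP, and IP, illustrates why its cost scales with message size:

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum (RFC 1071). Every byte is touched
    once, which is why checksumming dominates 'data-touching' cost for
    large messages."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return ~total & 0xFFFF
```

A useful sanity property: for even-length data, appending the two checksum bytes makes the checksum of the whole buffer zero, which is how receivers verify it.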
Abstract: We prove that a connection composed of VirtualClock servers provides an upper bound on delay for leaky-bucket-constrained sessions, i.e., sessions conforming to a token bucket filter. We calculate this upper bound on delay, and it is the same as the bound given by PGPS (packetized generalized processor sharing). We also prove that leaky-bucket-constrained sessions are the only type of session for which an upper bound on delay can be provided by servers with an upper bound on link capacity.
Main contribution: The first proof that VirtualClock supports an end-to-end delay bound, solving a 6-year open problem.
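The precondition of the theorem, a leaky-bucket (token bucket) constrained session, is easy to make concrete. The sketch below (my own names; not from the paper) checks that a packet trace conforms to a (sigma, rho) token bucket, i.e., that the traffic accepted over any interval of length t never exceeds sigma + rho*t:

```python
def conforms(arrivals, sigma, rho):
    """Check that a packet trace conforms to a (sigma, rho) token bucket.
    arrivals: time-sorted list of (time, size) pairs.
    Simulated as a bucket of depth sigma that refills at rate rho;
    a packet conforms only if enough tokens have accumulated."""
    tokens = sigma  # bucket starts full
    last_t = 0.0
    for t, size in arrivals:
        tokens = min(sigma, tokens + rho * (t - last_t))
        last_t = t
        if size > tokens:
            return False  # burst exceeds sigma + rho * (elapsed time)
        tokens -= size
    return True
```

It is sessions of exactly this form, and per the paper's second result only sessions of this form, for which VirtualClock servers can guarantee an end-to-end delay bound.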
Abstract: Stratified Round Robin is a fair-queueing packet scheduler with good fairness and delay properties and low, quasi-O(1) complexity. It is unique among schedulers of comparable complexity in that it provides a single-packet delay bound independent of the number of flows. Importantly, it is also amenable to simple hardware implementation, and thus fills a gap between scheduling algorithms that have provably good performance and those that are practical to implement in high-speed routers. We present both analytical results and simulations to demonstrate its performance properties.
Main contribution: The first packet scheduling algorithm to provide all three of the following properties simultaneously: strong fairness, bounded delay, and low quasi-constant complexity (making it uniquely amenable to highly efficient and practical implementations).
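To see what "O(1) work per packet" means in this family of schedulers, here is a sketch of Deficit Round Robin, a classic constant-time scheduler; this is not Stratified Round Robin itself (which refines this style of scheduler to obtain a single-packet delay bound independent of the number of flows), just an illustration of the round-robin-with-credit mechanism:

```python
from collections import deque

def deficit_round_robin(flows, quantum):
    """Deficit Round Robin sketch (illustrative, not Stratified RR).
    flows: list of deques of packet sizes, one per flow.
    Each visit grants a flow `quantum` bytes of credit; packets are sent
    while they fit within the accumulated deficit. Yields (flow, size)."""
    deficits = [0] * len(flows)
    active = deque(i for i, f in enumerate(flows) if f)
    while active:
        i = active.popleft()
        deficits[i] += quantum
        while flows[i] and flows[i][0] <= deficits[i]:
            size = flows[i].popleft()
            deficits[i] -= size
            yield i, size
        if flows[i]:
            active.append(i)       # still backlogged: revisit next round
        else:
            deficits[i] = 0        # idle flows accumulate no credit
```

Each packet transmission involves a constant number of queue operations, which is the property that makes this class of schedulers practical in high-speed routers.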
Abstract: File system I/O is increasingly becoming a performance bottleneck in large distributed computer systems. This is due to the increased file I/O demands of new applications, the inability of any single storage structure to respond to these demands, and the slow decline of disk access times (latency and seek) relative to the rapid increase in CPU speeds, memory size, and network bandwidth. We present a multi-structured file system designed for high bandwidth I/O and fast response. Our design is based on combining disk caching with three different file storage structures, each implemented on an independent and isolated disk array. Each storage structure is designed to optimize a different set of file system access characteristics such as cache writes, directory searches, file attribute requests or large sequential reads/writes. As part of our study, we analyze the performance of an existing file system using trace data from UNIX disk I/O-intensive workloads. Using trace driven simulations, we show how performance is improved by using separate storage structures as implemented by a multi-structured file system.
Main contribution: The first file system to demonstrate the value of using separate storage structures (a combination of striped, mirrored, log-structured, and in-memory caches) for large data objects (e.g., multimedia) and their meta-data to achieve high performance and good reliability.
Abstract: New I/O devices with data rates ranging from 10 to 100 Mbytes per second are becoming available for personal computers and workstations. These include human-interaction devices for video capture and display (and audio record and playback), high-capacity storage devices, and high-speed network communication devices. These devices have enabled I/O-intensive applications for desktop computing that require input, processing, and output of very large amounts of data. This article focuses on an important aspect of operating system support for these applications: efficient transfer of large data objects between the protection domains in which processes and devices reside. Many operating systems are inefficient in transferring large amounts of data between domains. Most of them, even when they try to avoid physical copying, offer a data-transfer model that assumes a need for complete accessibility to all transferred data. This assumption leads to overheads that can otherwise be avoided. To solve this, we present an interdomain transfer facility (inspired by the "container shipping" solution from the cargo transportation industry) that is based on virtual transfers and avoids all unnecessary physical copying.
Main contribution: Addressed the critical problem of moving high volume data at high rates (e.g., video) between devices and processes, and showed how to achieve this with operating system abstractions that can be implemented efficiently (minimizing memory copying).
Abstract: This paper explores the problems associated with the multicasting of continuous media to support multimedia group applications. The interaction between multicasting and the delivery of multiple time-correlated continuous media streams with real-time delay requirements poses various new and interesting problems in research on communication protocols and architectures. We describe these problems, and identify where the opportunities are for effective solutions, all in the context of providing an overview of the current state of research in multimedia multicasting. The issues we discuss include quality of service, resource reservations, routing, error and traffic control, heterogeneity, and the use of hierarchical coding and open-loop control techniques.
Main contribution: The first paper that surveyed the multimedia multicasting problem, precisely framing the issues, showing what the problems are and why they are difficult, and presenting promising research opportunities.
Abstract: We describe the concept of the relocatable continuous media filter within the context of a new dissemination-oriented communication abstraction. The novelty of these filters is how they can propagate over a dissemination tree in a network. We describe the filter propagation protocol to achieve this. Execution of filters inside a network allows the network to be viewed in a novel way, as a "processor" with its "instruction set" being the various types of available filters. Since filters generally modify the data rate of the continuous media stream, usually (but not necessarily) reducing it, filters allow the trading off of bandwidth and processing in a network.
Main contribution: Forerunner of the active networks concept, proposing that the network act as a distributed processor, and more specifically, how a multimedia multicast network can be programmed to optimally filter media streams.
Abstract: We address the problem of using replication to reliably maintain state in a distributed system for time spans that far exceed the lifetimes of individual replicas. This scenario is relevant for any system composed of a potentially large and selectable number of replicated components, each of which may be highly unreliable, where the goal is to have enough replicas to keep the system "alive" (meaning at least one replica is working or available) for a certain expected period of time, i.e., the system's lifetime. In particular, this applies to recent efforts to build highly available storage systems based on the peer-to-peer paradigm. We model notions of replica loss and replica repair in such systems by a simple Markov chain model, and derive an expression for the lifetime of the replicated state. We then apply this model to study the impact of practical considerations like storage and bandwidth limits on the system, and describe methods to optimally choose system parameters so as to maximize lifetime. Our analysis sheds light on the efficacy of various replication strategies.
Main contribution: Answers the fundamental question: if I want a file to live for a certain amount of time, how many copies of it must I make (and re-make upon failure), given that storage nodes fail at a certain rate?
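A stripped-down version of such a Markov chain model can be computed directly. Assume (my simplification, not necessarily the paper's exact model) a birth-death chain where state i means i live replicas, each replica fails independently at rate lam, and repair restores one replica at rate mu whenever fewer than n copies exist; state 0 (all copies lost) is absorbing. Writing D_i = T_i - T_{i-1} for the expected absorption times T_i, the balance equations reduce to D_n = 1/(n*lam) and i*lam*D_i = 1 + mu*D_{i+1}, giving a one-pass solution:

```python
def expected_lifetime(n, lam, mu):
    """Expected time until all n replicas are lost.
    Birth-death chain: state i loses a replica at rate i*lam and
    (for i < n) regains one at rate mu; state 0 is absorbing.
    Uses D_i = T_i - T_{i-1}: D_n = 1/(n*lam), i*lam*D_i = 1 + mu*D_{i+1}.
    Returns T_n = sum of the D_i."""
    D = [0.0] * (n + 1)
    D[n] = 1.0 / (n * lam)
    for i in range(n - 1, 0, -1):
        D[i] = (1.0 + mu * D[i + 1]) / (i * lam)
    return sum(D[1:])
```

With no repair (mu = 0) this collapses to the familiar sum of exponential stage times, T_n = (1/lam) * (1/1 + 1/2 + ... + 1/n), and any repair rate mu > 0 strictly increases the lifetime; the paper's contribution is carrying this style of analysis through to storage and bandwidth constraints.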
Abstract: Leave-in-Time is a new rate-based service discipline for packet-switching nodes in a connection-oriented data network. Leave-in-Time provides sessions with upper bounds on end-to-end delay, delay jitter, and buffer space requirements, as well as an upper bound on the probability distribution of end-to-end delays. A Leave-in-Time session's guarantees are completely determined by the dynamic traffic behavior of that session, without influence from other sessions. This results in the desirable property that these guarantees are expressed as functions derivable simply from a single fixed-rate server (with rate equal to the session's reserved rate) serving only that session. Leave-in-Time has a non-work-conserving mode of operation for sessions desiring low end-to-end delay jitter. Finally, Leave-in-Time supports the notion of delay shifting, whereby the delay bounds of some sessions may be decreased at the expense of increasing those of other sessions. We present a set of admission control algorithms which support the ability to do delay shifting in a systematic way.
Main contribution: The first service discipline to use VirtualClock at its core and exploit its end-to-end delay bound property.
Abstract: We present the motivation, design, implementation, and performance evaluation of a UNIX kernel mechanism capable of establishing fast in-kernel data pathways between I/O objects. A new system call, splice(), moves data asynchronously and without user-process intervention to and from I/O objects specified by file descriptors. Performance measurements indicate improved I/O throughput and increased CPU availability, attributable to reduced data-copying and context-switching overhead.
Main contribution: A new abstraction for connecting input and output devices that can be implemented with data paths that need not cross the user/kernel protection boundary, leading to significant performance gains.
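Modern kernels expose primitives in the same spirit, such as Linux's sendfile(2) and splice(2), which move data between file descriptors without routing it through a user-space buffer. The sketch below uses Python's os.sendfile to illustrate the fd-to-fd transfer model; it is an illustration of the idea (and Linux-specific for file-to-file transfers), not the paper's interface:

```python
import os

def kernel_copy(src_path, dst_path):
    """Copy a file through an fd-to-fd kernel primitive (os.sendfile).
    On Linux the kernel moves the data internally, avoiding a bounce
    through a user-space buffer; this illustrates the splice()-style
    model of connecting I/O objects by descriptor."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        size = os.fstat(src.fileno()).st_size
        offset = 0
        while offset < size:
            sent = os.sendfile(dst.fileno(), src.fileno(),
                               offset, size - offset)
            if sent == 0:
                break  # unexpected EOF
            offset += sent
    return offset  # bytes transferred
```

The contrast with a conventional read()/write() loop, which crosses the user/kernel boundary and copies every byte twice, is exactly the overhead the paper's splice() abstraction was designed to eliminate.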