ACM Computing Surveys 28A(4), December 1996, http://www-cse.ucsd.edu/~pasquale/SDCR96-IO/MuntzR-PasqualeJ.html. Copyright © 1996 by the Association for Computing Machinery, Inc. See the permissions statement below.
Abstract: I/O systems are becoming more complex, and must be designed by considering the entire system, end-to-end. We make a number of recommendations to address this problem, including the following. (1) There needs to be more emphasis on tertiary storage and on the whole (multilevel) storage hierarchy in general. (2) We must pay more attention to issues of resource management and availability in the network, especially if network-attached storage devices become more viable. (3) To improve performance, the operating system must give user-level processes more control over the data path between the storage device and the process, or be able to accept and exploit high-level hints about the application's behavior and its most important performance metrics/quality of service. (4) Finally, more emphasis should be placed on content-based or semantic-based compression, where we believe the greatest advances remain ahead of us. See also the citation page [Muntz Pasquale 1996] for this position statement.
Categories and Subject Descriptors: D.4.2 [Operating Systems]: Storage Management - storage hierarchies; D.4.4 [Operating Systems]: Communications Management - input/output, network communication; B.4.2 [Input/Output and Data Communications]: Input/Output Devices - disks, channels and controllers; E.4 [Data]: Coding and Information Theory - data compaction and compression.
General Terms: Algorithms, Design, Management, Measurement, Performance.
Additional Key Words and Phrases: I/O, communication.
I/O can no longer be viewed solely from the perspective of what happens between an I/O device and the machine it is connected to. I/O systems are becoming more complex, and must be designed by considering the entire system, end-to-end. For example, storage systems are themselves distributed systems, composed of a hierarchy of storage devices of different speeds and sizes, connected by (different types of) networks.
Recommendations:
Operating systems are getting more and more "in the way" between the user and the storage system. The buffering and caching done by the OS may actually be detrimental to performance.
Ultimately, the application (or, more typically, a server or middleware) should be given more control over low-level functions and allowed to do what it thinks is best. A good example is the old story about letting database systems control their own buffering, because they know best how to do it, rather than having the OS try to do it for them.
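To make the database buffering example concrete, here is a minimal sketch of an application-managed buffer pool. It is an illustration under stated assumptions, not a design proposed in this statement: the class name `BufferPool`, the `read_page` callback, and the choice of LRU replacement with pinning are all hypothetical. The point is only that the application, which knows which pages matter (say, an index root), decides what stays resident instead of leaving that decision to a general-purpose OS cache.

```python
from collections import OrderedDict


class BufferPool:
    """Illustrative application-managed page cache (LRU eviction plus pinning)."""

    def __init__(self, capacity, read_page):
        self.capacity = capacity
        self.read_page = read_page   # hypothetical callback: page_id -> page contents
        self.pages = OrderedDict()   # page_id -> contents, least recently used first
        self.pinned = set()          # pages the application refuses to evict
        self.misses = 0

    def get(self, page_id, pin=False):
        """Return a page, fetching it from storage on a miss."""
        if page_id in self.pages:
            self.pages.move_to_end(page_id)      # mark as most recently used
        else:
            self.misses += 1
            if len(self.pages) >= self.capacity:
                self._evict()
            self.pages[page_id] = self.read_page(page_id)
        if pin:
            self.pinned.add(page_id)
        return self.pages[page_id]

    def unpin(self, page_id):
        """Allow a previously pinned page to be evicted again."""
        self.pinned.discard(page_id)

    def _evict(self):
        # Pick the least recently used page that is not pinned.
        victim = next((p for p in self.pages if p not in self.pinned), None)
        if victim is None:
            raise RuntimeError("all pages are pinned")
        del self.pages[victim]
```

For instance, with a capacity of two pages, pinning page 0 and then touching pages 1 and 2 evicts page 1 rather than page 0, even though page 0 is the least recently used. A generic OS cache with no knowledge of the application could not make that choice.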
Recommendations:
One especially important and interesting research challenge is how best to support a mixture of application workloads. One version of the question is how to build one storage system (or to what extent one can) that can be statically configured for each workload type. (In other words, the goal of this approach is limited to achieving software reuse.) Another, harder, version of the question is how to build a storage subsystem that can concurrently support a mixed workload of applications, addressing their individual requirements (throughput, latency, jitter, etc.) without a priori partitioning of resources. Most work to date is quite restrictive and assumes that only one class of workload is present. There are a few exceptions, such as some designs of VOD systems in which some consideration has been given to including non-real-time workloads, but these are scarce.
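One way to picture the harder version of the question is a single request queue that serves both deadline-driven and best-effort requests without reserving a fixed share of the device for either. The sketch below is only an assumption-laden illustration, not a scheme from this statement or from any particular VOD design: the names `MixedScheduler`, `submit`, and `drain` are hypothetical, service times are assumed to be known in advance, and the slack check looks only at the single most urgent real-time request.

```python
import heapq
from collections import deque
from dataclasses import dataclass, field
from itertools import count


@dataclass(order=True)
class Request:
    deadline: float                      # float("inf") for best-effort requests
    seq: int                             # tie-breaker: submission order
    name: str = field(compare=False)
    service_time: float = field(compare=False)


class MixedScheduler:
    """Earliest-deadline-first for real-time requests; best-effort fills the slack."""

    def __init__(self):
        self._rt = []          # heap of real-time requests, ordered by deadline
        self._be = deque()     # FIFO queue of best-effort requests
        self._seq = count()

    def submit(self, name, service_time, deadline=None):
        if deadline is None:
            self._be.append(Request(float("inf"), next(self._seq), name, service_time))
        else:
            heapq.heappush(self._rt, Request(deadline, next(self._seq), name, service_time))

    def next_request(self, now):
        """Pick the next request to serve at time `now`, or None if idle."""
        if self._be:
            be = self._be[0]
            # Serve best-effort work only if the most urgent real-time
            # request could still finish by its deadline afterwards.
            if not self._rt or (
                now + be.service_time + self._rt[0].service_time <= self._rt[0].deadline
            ):
                return self._be.popleft()
        if self._rt:
            return heapq.heappop(self._rt)
        return None


def drain(scheduler, now=0.0):
    """Serve everything queued; return (name, finish_time) pairs in service order."""
    finished = []
    while (req := scheduler.next_request(now)) is not None:
        now += req.service_time
        finished.append((req.name, now))
    return finished
```

Resources are shared dynamically here: a backup request runs whenever the video requests have enough slack, and waits otherwise. A real subsystem would also have to handle jitter, admission control, and unknown service times, which this sketch deliberately ignores.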
How will future data compression techniques influence I/O, and vice versa (beyond simply reducing bandwidth and storage requirements)? For example, lossy compression is already influencing I/O by requiring variable retrieval rates. What about in the future? What will be the effects of important schemes currently being researched, such as content-based (or object-based) compression, as opposed to the more common pixel-based compression (like JPEG)? How will this influence (if at all) storage and retrieval?
Recommendations:
Finally, researchers need a better understanding of where device technology is likely to be going and how that will affect I/O problems 5 years and 10 years from now. Will holographic memories make it? What exactly will they look like? Will semiconductor memories overtake disk in price per MB? Will that make solid state disks the dominant secondary storage device? For DVD technology, what will be the price tradeoffs? Will the technology stay at a plateau for 20 years after reaching the "blue light special" capacity of 40GB and a MB/sec bandwidth?
Recommendations: