“All problems in computer science can be solved by another level of indirection” – often attributed to Butler Lampson, who gives credit to David Wheeler
“... but that usually will create another problem” – David Wheeler
However, multiple layers of indirection can result in excess indirection, causing performance overhead and memory consumption. For example, running a file system on top of a virtualized device creates two levels of (excess) virtualization; a block is first mapped from a file offset to its logical address and then from the logical address to its physical address in the device.
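The double translation described above can be sketched in a few lines. This is only an illustration: the table contents, the 4 KB block size, and the names `fs_map`, `ftl_map`, and `resolve` are hypothetical and do not come from any particular file system or device.

```python
# Hypothetical two-level mapping: file offset -> logical address -> physical address.
BLOCK_SIZE = 4096  # assumed block size, for illustration only

# File-system level: (inode, block index within the file) -> logical block address
fs_map = {(7, 0): 120, (7, 1): 121}

# Device level: logical block address -> physical page in the device
ftl_map = {120: 9001, 121: 4410}


def resolve(inode, offset):
    """Translate a file offset to a physical page through both mapping layers."""
    logical = fs_map[(inode, offset // BLOCK_SIZE)]  # first level of indirection
    physical = ftl_map[logical]                      # second level of indirection
    return logical, physical
```

Each lookup costs one table access and each table costs memory; the second table is the one de-indirection aims to shrink or remove.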
De-indirection is a method for removing excess levels of indirection. We propose several techniques to perform de-indirection in flash-based SSDs.
Nameless Writes are a new device interface that removes the need for indirection in modern solid-state storage devices (SSDs). Nameless writes allow the device to choose the location of a write; only then is the client informed of the name (i.e., address) where the block now resides. Doing so allows the device to control block-allocation decisions, thus enabling it to execute critical tasks such as garbage collection and wear leveling, while removing the need for large and costly indirection tables. The resulting system significantly reduces space and time overheads and improves SSD performance over current solutions, thus making for simpler, less costly, and higher-performance SSD-based storage.
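A minimal sketch of the idea, assuming a toy in-memory device: the client issues a write without an address, and the device returns the address it chose. The class and method names (`NamelessDevice`, `nameless_write`, `Client`) are hypothetical, and the sequential allocation policy is a placeholder; garbage collection, wear leveling, and block relocation are not modeled.

```python
class NamelessDevice:
    """Toy device that picks the location of every write itself."""

    def __init__(self, num_pages):
        self.pages = [None] * num_pages
        self.next_free = 0  # placeholder policy: allocate pages sequentially

    def nameless_write(self, data):
        """Store data at a device-chosen page and return that page's address."""
        if self.next_free >= len(self.pages):
            raise RuntimeError("device full")
        addr = self.next_free
        self.pages[addr] = data
        self.next_free += 1
        return addr

    def read(self, addr):
        return self.pages[addr]


class Client:
    """Toy client that records the returned address as its own pointer."""

    def __init__(self, device):
        self.device = device
        self.pointers = {}  # key -> address chosen by the device

    def write_block(self, key, data):
        self.pointers[key] = self.device.nameless_write(data)

    def read_block(self, key):
        return self.device.read(self.pointers[key])
```

Because the client stores the device-chosen address directly, the device in this sketch keeps no logical-to-physical table at all.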
A large body of research on flash-based SSDs has been conducted in recent years. Most of this work uses simulation as its evaluation method. Simulation is easy to use but lacks real system interaction and cannot be used to run real workloads or applications directly.
FlashEm is a real-time flash-based SSD emulator. We expose the emulated device to the system as a block device, so it can run real workloads and applications. We used several techniques to reduce the computational overhead of FlashEm. Our evaluation results show that FlashEm can accurately evaluate important metrics for common types of flash-based SSDs.
Due to the closed nature of commercial devices, most research on SSDs is performed using simulation or emulation. Without implementation in real devices, it can be difficult to judge the true benefit of the proposed designs. With hardware prototyping, we can evaluate both the difficulty in implementing new designs and the performance of such designs. We built a hardware prototype of the Nameless Write SSD using the OpenSSD Jasmine board. While the flash-translation layer changes were straightforward, we discovered unexpected complexities in implementing extensions to the storage interface.
The File System De-Virtualizer (FSDV) is a system that dynamically removes a layer of indirection common in modern storage stacks, thus decreasing the space and performance costs of indirection. FSDV is a lightweight and flexible tool that de-virtualizes data by changing file system pointers to use device physical addresses. When FSDV is not running, the file system and the device both maintain their virtualization layers and perform normal I/O operations. We implemented FSDV with the ext3 file system and an emulated flash-based SSD.
Our evaluation results show that FSDV can dynamically reduce indirection mapping table space while preserving foreground I/O performance. We also demonstrate that with our design of FSDV, the changes needed in the file system, the flash device, and the device interface are small.
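The core step, rewriting file system pointers to physical addresses so that the matching device mapping entries can be dropped, might look roughly like the sketch below. It is not FSDV's implementation: the data layout and the name `devirtualize` are hypothetical, and it assumes each logical address is referenced by exactly one pointer.

```python
def devirtualize(fs_pointers, device_map):
    """Rewrite logical pointers to physical ones and drop the freed map entries.

    fs_pointers: name -> ("logical" | "physical", address)
    device_map:  logical address -> physical address
    Assumes each logical address is referenced by exactly one pointer.
    """
    for name, (kind, addr) in list(fs_pointers.items()):
        if kind == "logical" and addr in device_map:
            # The pointer now names the physical location directly, so the
            # device no longer needs a mapping entry for this block.
            fs_pointers[name] = ("physical", device_map.pop(addr))
    return fs_pointers, device_map
```

Blocks whose pointers were not processed keep their entries in `device_map`, which matches the idea that both layers continue to work normally for data that has not been de-virtualized.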
Storage data caching is important for providing good application performance. With the fast growth of storage data size and the emergence of new memory technologies, storage-level caches are becoming bigger and can be persistent. We explore problems of large-scale storage-level caches in system design.
Large caches in storage servers have become essential for meeting the service levels required by applications. Today, these caches must be warmed with data frequently, in scenarios that include dynamic creation of cache space and server restarts that clear cache contents. Our analysis of real-world data-center traces shows that large storage caches can take hours or even days to warm up, affecting both application performance and server load over a long period of time.
Bonfire is a system for accelerating cache warmup. Bonfire monitors storage server workloads, logs important warmup data, and efficiently preloads storage-level caches with warmup data. We show through both simulation and trace replay that Bonfire reduces both warmup time and backend server load significantly, compared to a cache that is warmed up on demand.
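The monitor, log, and preload steps can be sketched as follows. The choice of what counts as important warmup data is an assumption made here for illustration (the most recently accessed blocks, preloaded in address order); the names `WarmupMonitor` and `preload` are hypothetical.

```python
from collections import OrderedDict


class WarmupMonitor:
    """Tracks the most recently accessed blocks (assumed warmup heuristic)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.recent = OrderedDict()  # block -> True, oldest entry first

    def record(self, block):
        """Log one access, keeping only the `capacity` most recent blocks."""
        self.recent.pop(block, None)
        self.recent[block] = True
        if len(self.recent) > self.capacity:
            self.recent.popitem(last=False)  # forget the oldest block

    def warmup_set(self):
        """Return logged blocks in address order so preload reads are ordered."""
        return sorted(self.recent)


def preload(cache, monitor, read_block):
    """Fill the cache with the logged warmup blocks before serving requests."""
    for block in monitor.warmup_set():
        cache[block] = read_block(block)
```

An on-demand cache would instead take one miss per block before it became warm, which is the behavior the preload step is meant to avoid.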
FlashTier is a system architecture built upon a solid-state cache (SSC), a flash device with an interface designed for caching. The FlashTier design addresses three limitations of using traditional SSDs for caching. First, FlashTier provides a unified logical address space to reduce the cost of cache block management within both the OS and the SSD. Second, FlashTier provides cache consistency guarantees, allowing the cached data to be used following a crash. Finally, FlashTier leverages cache behavior to silently evict data blocks during garbage collection to improve the performance of the SSC.
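Silent eviction can be pictured with the sketch below, which assumes that only clean pages (those with an up-to-date copy on the backing store) may be dropped rather than copied during garbage collection. The page representation and the name `garbage_collect` are hypothetical.

```python
def garbage_collect(erase_block):
    """Decide the fate of each page in an erase block that is being reclaimed.

    Each page is a dict with keys "lba", "valid", and "dirty".
    Returns (pages_to_copy, pages_evicted) as lists of logical addresses.
    """
    to_copy, evicted = [], []
    for page in erase_block:
        if not page["valid"]:
            continue  # stale page, nothing to preserve
        if page["dirty"]:
            to_copy.append(page["lba"])  # only copy here: must be preserved
        else:
            evicted.append(page["lba"])  # clean cached data: drop, do not copy
    return to_copy, evicted
```

A conventional SSD would have to copy every valid page; skipping the clean ones is what reduces the garbage-collection work in this sketch.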
Redundant array (RAID) technologies are widely used to provide data reliability in the face of device failure. With new types of storage devices and environments, there are new challenges in deploying RAID technologies. We explore problems with redundant arrays in two different environments.
All standard RAID approaches assume that devices do not wear out, and hence distribute work equally among them. Unfortunately, this approach is not appropriate for flash, as the life of a flash cell depends on the number of times it is written and cleaned. Hence, identical write patterns to mirrored flash drives introduce a failure dependency in the storage system, increasing the probability of concurrent device failure and hence data loss.
Warped Mirrors are a solution to this endurance problem for mirrored flash devices. By carefully inducing a slight imbalance in write traffic across devices, we intentionally increase the workload of one device in the mirror pair and thereby increase the odds that it will fail first. With our approach, device failure independence is preserved.
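One way to picture the induced imbalance is the sketch below, which assumes the imbalance is created by issuing an occasional extra write to one designated device. The mechanism, the function name `mirrored_write_counts`, and the parameter `extra_every` are illustrative assumptions.

```python
def mirrored_write_counts(num_writes, extra_every):
    """Count writes seen by each device of a mirror pair.

    Every client write goes to both devices; after every `extra_every`
    client writes, the first device also receives one extra write (an
    assumed way of inducing the imbalance).
    """
    first = second = 0
    for i in range(1, num_writes + 1):
        first += 1
        second += 1
        if i % extra_every == 0:
            first += 1  # extra write ages the designated device faster
    return first, second
```

With `extra_every = 20`, the designated device absorbs 5% more writes than its mirror, so the two devices no longer approach their wear limits at the same time.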
Primary storage systems contain only a moderate degree of duplication and do not benefit greatly from deduplication. Instead of removing duplicates, we propose a duplication-aware disk array (DADA), which recognizes and keeps duplicate contents on disks and makes use of its knowledge of duplicates to improve the reliability and availability of disk arrays in various ways. First, DADA can be used to reduce the time spent on disk scrubbing to detect latent sector errors (LSEs). Second, DADA reduces RAID reconstruction time during recovery. Finally, DADA provides better reliability than a default RAID array by using its knowledge of data duplication.
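As a rough illustration of how knowledge of duplicates could shorten scrubbing, the sketch below selects one block per distinct content to verify, on the assumption that the remaining copies can be skipped because a verified copy of the same data exists. The policy, the use of SHA-256 fingerprints, and the name `scrub_plan` are assumptions made for this example.

```python
import hashlib


def scrub_plan(blocks):
    """Pick one block address per distinct content to scrub.

    blocks: block address -> bytes stored at that address.
    Returns the addresses to scrub, in ascending order.
    """
    seen = set()
    to_scrub = []
    for addr in sorted(blocks):
        digest = hashlib.sha256(blocks[addr]).hexdigest()
        if digest not in seen:  # first copy of this content
            seen.add(digest)
            to_scrub.append(addr)
    return to_scrub
```

The more duplicate content the array holds, the fewer blocks this plan needs to read.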
Removing the Costs and Retaining the Benefits of Flash-Based SSD Virtualization with FSDV
Yiying Zhang,
Andrea C. Arpaci-Dusseau,
Remzi H. Arpaci-Dusseau
Proceedings of the 31st IEEE Conference on Massive Data Storage (MSST '15)
De-indirection for Flash-based Solid State Drives
Yiying Zhang
Ph.D. Thesis, University of Wisconsin-Madison
De-indirection for Flash-based SSDs with Nameless Writes
Yiying Zhang,
Leo Prasath Arulraj,
Andrea C. Arpaci-Dusseau,
Remzi H. Arpaci-Dusseau
Proceedings of the 10th Conference on File and Storage Technologies (FAST '12)
Getting Real: Lessons in Transitioning Research Simulations into Hardware Systems
Mohit Saxena,
Yiying Zhang,
Michael M. Swift,
Andrea C. Arpaci-Dusseau,
Remzi H. Arpaci-Dusseau
Proceedings of the 11th Conference on File and Storage Technologies (FAST '13)
Warming up Storage-Level Caches with Bonfire
Yiying Zhang,
Gokul Soundararajan,
Mark W. Storer,
Lakshmi N. Bairavasundaram,
Sethuraman Subbiah,
Andrea C. Arpaci-Dusseau,
Remzi H. Arpaci-Dusseau
Proceedings of the 11th Conference on File and Storage Technologies (FAST '13)
FlashTier: a Lightweight, Consistent and Durable Storage Cache
Mohit Saxena,
Michael M. Swift,
Yiying Zhang
Proceedings of the 7th European Conference on Computer Systems (EuroSys '12)
System and Method for an Efficient Cache Warm-up
Lakshmi N. Bairavasundaram,
Gokul Soundararajan,
Mark W. Storer,
Yiying Zhang
US Patent WO2014100253 A1
Warped Mirrors for Flash
Yiying Zhang,
Andrea C. Arpaci-Dusseau,
Remzi H. Arpaci-Dusseau
Proceedings of the 29th IEEE Conference on Massive Data Storage (MSST '13)
Duplicate-aware Disk Arrays
Vijayan Prabhakaran,
Yiying Zhang
US Patent #20120226936
DADA: Duplication Aware Disk Array
Yiying Zhang,
Vijayan Prabhakaran
Poster at the 9th Conference on File and Storage Technologies (FAST 2011)