Evaluations: 06/01

Bianca Zadrozny (zadrozny@cs.ucsd.edu)
Thu, 1 Jun 2000 01:43:07 -0700

A Fast File System for UNIX

-----------------------------

This paper describes a reimplementation of the UNIX file system, called the Fast File System (FFS), with substantially improved throughput. The improvement is achieved by clustering related information together, which minimizes seek latency, and by increasing the block size, which increases the amount of data transferred per disk access. To avoid the wasted space that internal fragmentation would cause with large blocks, the system allows a single block to be divided into fixed-size fragments. These fragments can be used for storing small files or the small remainders of larger files.
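
To make the fragment scheme concrete, here is a small sketch of the space accounting (my own illustration; the 4096-byte blocks and 1024-byte fragments are assumed values, though the paper does allow fragments of 1/2, 1/4 or 1/8 of a block):

    #include <stdio.h>

    /* Assumed sizes for illustration only. */
    #define BLOCK_SIZE 4096
    #define FRAG_SIZE  1024

    /* Report how a file of 'size' bytes is stored: full blocks plus
       a fragment-aligned tail, and the space lost to fragmentation. */
    static void layout(long size)
    {
        long full_blocks = size / BLOCK_SIZE;
        long tail        = size % BLOCK_SIZE;
        long frags       = (tail + FRAG_SIZE - 1) / FRAG_SIZE;
        long allocated   = full_blocks * BLOCK_SIZE + frags * FRAG_SIZE;

        printf("%6ld bytes -> %ld blocks + %ld fragments, %ld wasted\n",
               size, full_blocks, frags, allocated - size);
    }

    int main(void)
    {
        layout(500);   /* small file: one fragment instead of a block */
        layout(5000);  /* one full block plus a one-fragment tail     */
        layout(8192);  /* exact multiple of the block size: no tail   */
        return 0;
    }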

The system uses information about the characteristics of both the processor and the disk to lay out blocks in rotationally optimal positions. The allocation policy calculates the number of blocks to skip between consecutive blocks of the same file, so that the time needed to start a new disk transfer operation matches the time it takes for the next block to rotate under the disk head. In this way, the time spent waiting for the disk to position itself is minimized.
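
The skip calculation can be illustrated roughly as follows (my own sketch; the rotation speed, blocks per track and transfer setup time are made-up parameters, not the paper's measurements):

    #include <math.h>
    #include <stdio.h>

    /* Assumed hardware parameters for illustration. */
    #define RPM              3600    /* disk rotation speed              */
    #define BLOCKS_PER_TRACK 32      /* blocks passing the head per turn */
    #define SETUP_US         4000.0  /* CPU time to start a new transfer */

    int main(void)
    {
        /* Time for one block to pass under the head, in microseconds. */
        double us_per_rev   = 60.0 * 1e6 / RPM;
        double us_per_block = us_per_rev / BLOCKS_PER_TRACK;

        /* Skip just enough blocks that the next block of the file
           arrives under the head as the next transfer is issued.   */
        int skip = (int)ceil(SETUP_US / us_per_block);

        printf("one block passes in %.0f us -> skip %d blocks between\n"
               "consecutive blocks of the same file\n",
               us_per_block, skip);
        return 0;
    }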

FFS divides a disk partition into areas called cylinder groups, each consisting of one or more consecutive cylinders on the disk. To improve locality of reference, the layout policy tries to place the inodes of all files in a directory in the same cylinder group. For the same reason, the data blocks of a single file (up to 48K) are also placed in the same cylinder group, preferably at rotationally optimal positions on the same cylinder. To keep cylinder groups from filling up completely, block allocation for files exceeding 48K is redirected to another cylinder group.
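
A much-simplified sketch of this placement policy (my own; the free counts are fake, and the real allocator also falls back through a series of hashed alternatives when the preferred group is full):

    #include <stdio.h>

    #define NCG      16             /* cylinder groups in the partition */
    #define REDIRECT (48 * 1024)    /* redirect growth past 48K         */

    /* Free-block count per cylinder group (illustrative state). */
    static int cg_free[NCG];

    /* Keep a file's blocks in the group that holds its inode until
       it reaches 48K, then redirect further growth to the group
       with the most free space, so no group fills up completely.  */
    static int pick_group(int inode_cg, long file_size)
    {
        if (file_size < REDIRECT && cg_free[inode_cg] > 0)
            return inode_cg;

        int best = 0;
        for (int cg = 1; cg < NCG; cg++)
            if (cg_free[cg] > cg_free[best])
                best = cg;
        return best;
    }

    int main(void)
    {
        for (int cg = 0; cg < NCG; cg++)
            cg_free[cg] = 100 + cg;              /* fake occupancy */

        printf("small file -> group %d\n", pick_group(3, 10 * 1024));
        printf("large file -> group %d\n", pick_group(3, 60 * 1024));
        return 0;
    }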

The paper also presents enhancements to the functionality of the user's interface to the file system: long file names, file locking, symbolic links, file renaming and a quota mechanism.
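
These enhancements surfaced as new system calls in 4.2BSD; a minimal example exercising three of them (the paths are placeholders):

    #include <stdio.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <sys/file.h>

    int main(void)
    {
        /* Symbolic link: a name that refers to another path, even
           across file systems. */
        if (symlink("/tmp/target", "/tmp/alias") == -1)
            perror("symlink");

        /* Atomic renaming within a file system. */
        if (rename("/tmp/alias", "/tmp/alias2") == -1)
            perror("rename");

        /* Advisory whole-file locking. */
        int fd = open("/tmp/target", O_RDWR | O_CREAT, 0644);
        if (fd != -1 && flock(fd, LOCK_EX) == 0) {
            /* ... exclusive access to the file here ... */
            flock(fd, LOCK_UN);
        }
        if (fd != -1)
            close(fd);
        return 0;
    }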

The improvements suggested by the paper assume that the performance bottleneck of the system is disk seek latency for read operations. I think the paper does not give enough motivation for this assumption, which makes it difficult to judge whether the improvements remain valid when the underlying technology changes.

--------------------------------------------------------------------------

The Design and Implementation of a Log-Structured File System

--------------------------------------------------------------

This paper describes a new technique for disk storage management called the log-structured file system (LFS). The technique is based on the observation that large amounts of main memory are available for caching disk blocks. Since many disk blocks can be found in the cache, the number of disk reads shrinks relative to the number of disk writes. LFS therefore optimizes writes by appending all new information to disk in a log, a purely sequential structure; in this way, almost all seeks are eliminated. To avoid degrading read performance, the log contains indexing information so that reads can still be performed reasonably efficiently.
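
A toy sketch of the write path (entirely my own, with an in-memory "disk" and made-up sizes) shows the key property: overwriting a block appends a new copy at the tail of the log and updates the index, instead of seeking back to the old location:

    #include <stdio.h>
    #include <string.h>

    #define BLOCK_SIZE 512          /* made-up sizes for illustration */
    #define SEG_BLOCKS 8
    #define MAX_BLOCKS 1024

    /* In-memory stand-ins for the disk and the indexing information. */
    static char disk[MAX_BLOCKS][BLOCK_SIZE];
    static long log_end;                    /* tail of the log on disk  */
    static long index_map[64][16];          /* (inode, blk#) -> address */

    static char segment[SEG_BLOCKS][BLOCK_SIZE];
    static int  seg_used;                   /* blocks buffered so far   */

    /* Every modified block is appended at the log tail, and the index
       is updated so reads can still find the newest copy.             */
    static void log_write(int inode, int blkno, const void *data)
    {
        memcpy(segment[seg_used], data, BLOCK_SIZE);
        index_map[inode][blkno] = log_end + seg_used++;

        if (seg_used == SEG_BLOCKS) {       /* segment full: one large,
                                               sequential disk write   */
            memcpy(disk[log_end], segment, sizeof segment);
            log_end += SEG_BLOCKS;
            seg_used = 0;
        }
    }

    int main(void)
    {
        char buf[BLOCK_SIZE] = "hello";
        log_write(1, 0, buf);       /* first version of the block      */
        log_write(1, 0, buf);       /* overwrite: appended, not seeked */
        printf("inode 1, block 0 now lives at log address %ld\n",
               index_map[1][0]);
        return 0;
    }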

A difficulty with the log-based approach is that large extents of free space must remain available for writing new data. The paper overcomes this difficulty by dividing the log into segments; segments of free space can then be generated by compacting segments that contain fragmented data. The system uses a cost-benefit policy to choose which segments to clean. The benefit of cleaning a segment is the amount of free space that will be reclaimed, weighted by how long that space is likely to stay free; the cost is the time to read the segment into memory and rewrite its live blocks into another segment. The segment with the best benefit-to-cost ratio is chosen for cleaning.
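
The paper's ratio is benefit/cost = (1 - u) * age / (1 + u), where u is the segment's utilization: reading the segment costs 1, rewriting its live data costs u, and (1 - u) of the segment is reclaimed. A small sketch of the selection (the utilizations and ages are made-up):

    #include <stdio.h>

    struct segment {
        double u;      /* utilization: fraction of live data, 0..1 */
        double age;    /* age of the youngest data in the segment  */
    };

    /* Weighting by age favors cleaning cold segments even at
       relatively high utilization. */
    static double benefit_cost(const struct segment *s)
    {
        return (1.0 - s->u) * s->age / (1.0 + s->u);
    }

    int main(void)
    {
        struct segment segs[] = {
            { 0.75, 100.0 },   /* cold: mostly live, but old data    */
            { 0.50,  10.0 },   /* hot: half live, recently rewritten */
        };
        int n = sizeof segs / sizeof segs[0], best = 0;

        for (int i = 1; i < n; i++)
            if (benefit_cost(&segs[i]) > benefit_cost(&segs[best]))
                best = i;

        printf("clean segment %d (ratio %.2f vs %.2f)\n", best,
               benefit_cost(&segs[best]), benefit_cost(&segs[1 - best]));
        return 0;
    }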

Besides optimizing writes, a log-structured file system also permits fast crash recovery: after a crash the system need only examine the most recent portion of the log to put itself in a consistent state. LFS uses checkpoints to define consistent states of the file system, and roll-forward to recover information written after the last checkpoint.
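
Schematically, recovery looks something like the following (my own sketch with fake on-disk state; LFS actually alternates between two checkpoint regions so that a crash during a checkpoint write leaves one of them intact, and roll-forward reads the segment summary blocks):

    #include <stdio.h>

    struct checkpoint { long log_pos; long timestamp; };

    /* Two checkpoint regions, written alternately. */
    static struct checkpoint region[2] = {
        { 1000, 41 },                /* illustrative on-disk state */
        { 1200, 42 },
    };

    #define SEG_BLOCKS 20            /* made-up segment size       */
    static long log_tail = 1260;     /* where writing stopped      */

    int main(void)
    {
        /* 1. Start from the newer of the two checkpoint regions. */
        struct checkpoint *cp =
            region[1].timestamp > region[0].timestamp ? &region[1]
                                                      : &region[0];
        printf("recovering from checkpoint at log position %ld\n",
               cp->log_pos);

        /* 2. Roll forward: scan the segments written after the
              checkpoint and reapply the updates they describe.   */
        for (long pos = cp->log_pos; pos < log_tail; pos += SEG_BLOCKS)
            printf("  replaying segment summary at %ld\n", pos);

        return 0;
    }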

In my opinion, the idea of a log-structured file system is very well supported in the paper. It follows naturally from the observation that write operations reach the disk more often than read operations in systems with large caches. Furthermore, the paper justifies its design and implementation decisions very well; for example, it gives a very thorough and interesting analysis of the policy for choosing which segment to clean.