Papers for 6-1-00

Mark Andrew SMITH (masmith@cs.ucsd.edu)
Thu, 1 Jun 2000 01:48:00 -0700 (PDT)

The Design and Implementation of a Log-Structured File System

This paper proposes a log-structured file system, motivated by the growth
of main-memory caches. Larger caches absorb more reads, so disk traffic
becomes dominated by writes. They can also buffer writes into large bursts
that use disk bandwidth much more effectively. The semantics of the
filesystem are the same as those of Unix. Unlike in Unix, however, inodes
are not kept in fixed locations; instead, they are written to the log as
well. An inode map, small enough to be cached in main memory, gives quick
access to the locations of the inodes.
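
A toy model may help show why the inode map matters. The sketch below is a
minimal illustration, assuming a log represented as a Python list indexed by
disk address. `ToyLog`, `write_file` and `read_file` are hypothetical names,
not Sprite LFS code. Each write appends a new data block and a new inode,
and only the in-memory map changes. In the real system the map's own blocks
are also written to the log and located through a checkpoint region; the
toy keeps the map only in memory.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLog:
    """Toy append-only log with an in-memory inode map (hypothetical names)."""
    blocks: list = field(default_factory=list)     # the log; list index = disk address
    inode_map: dict = field(default_factory=dict)  # inode number -> address of newest inode

    def append(self, record) -> int:
        # Every write goes to the end of the log; nothing is updated in place.
        self.blocks.append(record)
        return len(self.blocks) - 1

    def write_file(self, inum: int, data: bytes) -> None:
        # Write the data block first, then a new inode that points at it.
        data_addr = self.append(("data", data))
        inode_addr = self.append(("inode", inum, [data_addr]))
        # Only the map changes; the old inode stays in the log as dead space.
        self.inode_map[inum] = inode_addr

    def read_file(self, inum: int) -> bytes:
        # One map lookup finds the current inode, wherever it was last written.
        _, _, block_addrs = self.blocks[self.inode_map[inum]]
        return b"".join(self.blocks[addr][1] for addr in block_addrs)
```

Writing inode 7 twice leaves the first data block and inode in the log as
dead space, which is exactly what the cleaner later reclaims.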
The log is essentially a large region of disk, divided into segments, that
is written sequentially. Each write to the log makes the previous versions
of the affected file blocks dead. These dead blocks must be "cleaned": the
cleaner copies the live data out of selected segments so the segments can
be reused, and a cost-benefit formula decides which segments to clean.
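
The paper's cost-benefit policy rates a segment by the free space cleaning
would generate, weighted by the age of its data, divided by the cost of
cleaning it: benefit/cost = (1 - u) * age / (1 + u), where u is the fraction
of the segment that is still live (reading the segment costs 1, writing its
live data back costs u). Below is a minimal Python sketch, assuming each
segment's utilization and age are already known. `Segment`, `cost_benefit`
and `choose_segments_to_clean` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Segment:
    seg_id: int
    utilization: float  # u: fraction of the segment still holding live data (0..1)
    age: float          # age of the youngest live data in the segment (any time unit)


def cost_benefit(seg: Segment) -> float:
    # Benefit: (1 - u) of a segment freed, weighted by how long it should stay free.
    # Cost: read the whole segment (1) plus write back its live data (u).
    u = seg.utilization
    return (1.0 - u) * seg.age / (1.0 + u)


def choose_segments_to_clean(segments: list[Segment], count: int) -> list[int]:
    # Clean the segments with the highest benefit-to-cost ratio first.
    ranked = sorted(segments, key=cost_benefit, reverse=True)
    return [seg.seg_id for seg in ranked[:count]]
```

With segments (u=0.2, age=1), (u=0.75, age=100) and (u=0.5, age=5), the
policy picks the old, mostly full segment first. Cold data is worth cleaning
even at high utilization because the space it frees stays free for a long
time.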
Crash recovery is done with checkpoints. After a crash, the system starts
from its last consistent checkpoint and uses roll-forward, replaying the
part of the log written after that checkpoint, to bring the file system up
to date.
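
The recovery sequence can be sketched with a toy model. The model assumes
that the only state to rebuild is the inode map and that each log entry
simply records a new inode address. `CheckpointedMap` and its methods are
hypothetical. Real roll-forward reads segment summary information from the
log, which this sketch does not model.

```python
from __future__ import annotations


class CheckpointedMap:
    """Toy inode map recovered from a checkpoint plus roll-forward (hypothetical)."""

    def __init__(self) -> None:
        self.log: list[tuple[int, int]] = []  # durable: (inode number, new inode address)
        self.inode_map: dict[int, int] = {}   # volatile: lost on a crash
        self.checkpoint: tuple[dict[int, int], int] = ({}, 0)  # durable: (map copy, log position)

    def update(self, inum: int, addr: int) -> None:
        # Every change is appended to the log, then reflected in the in-memory map.
        self.log.append((inum, addr))
        self.inode_map[inum] = addr

    def take_checkpoint(self) -> None:
        # Save a consistent copy of the map and the log position it reflects.
        self.checkpoint = (dict(self.inode_map), len(self.log))

    def crash(self) -> None:
        # Main memory is lost; the log and the checkpoint survive.
        self.inode_map = {}

    def recover(self, roll_forward: bool = True) -> None:
        saved_map, position = self.checkpoint
        self.inode_map = dict(saved_map)  # start from the last checkpoint
        if roll_forward:
            # Replay the entries written after the checkpoint, in log order.
            for inum, addr in self.log[position:]:
                self.inode_map[inum] = addr
```

Without roll-forward, recovery returns only to the checkpointed state and
loses the updates written after it.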
The log-structured file system performs very well: the share of disk
bandwidth used for writing rises from 5-10% in Unix to about 70% with
cached writes. The only case where it does worse is random writes followed
by sequential reads, since the blocks of such a file are then not stored
sequentially on disk.

A Fast File System for Unix

The original Unix filesystem was unacceptably slow, partly because of
unoptimized storage decisions such as its 512-byte blocks. This paper is
the result of an attempt to fix some of the problems of the old system.
First, configurable block sizes were introduced. Smaller blocks reduce
internal fragmentation but perform poorly for file transfers, since a seek
may be needed between consecutive small blocks. Larger blocks allow much
higher throughput. However, with a large number of small files, internal
fragmentation can become too high. The paper measures 45.6% wasted space
with 4096-byte blocks, which is unacceptable.
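
The block-size trade-off can be made concrete with a little arithmetic. The
sketch below is a minimal model that assumes every file is allocated whole
blocks and ignores inodes. It computes the extra space allocated beyond the
data itself, expressed as a fraction of the data size, which is the same
form as the 45.6% figure above. `space_overhead` is a hypothetical helper,
and the file sizes in the example are made up for illustration.

```python
from __future__ import annotations


def space_overhead(file_sizes: list[int], block_size: int) -> float:
    """Extra space allocated beyond the data, as a fraction of the data size.

    Each file gets whole blocks, so the unused tail of its last block is wasted.
    Inodes and other metadata are ignored in this toy model.
    """
    used = sum(file_sizes)
    # Round each file up to a whole number of blocks (integer ceiling division).
    allocated = sum(
        ((size + block_size - 1) // block_size) * block_size for size in file_sizes
    )
    return (allocated - used) / used if used else 0.0
```

For four hypothetical files of 100, 700, 1500 and 5000 bytes, the overhead
is about 12% with 512-byte blocks and about 181% with 4096-byte blocks.
FFS's own remedy, which this sketch does not model, is to divide each block
into smaller fragments so that the tail of a small file does not occupy a
whole block.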
Further, disk-specific optimizations take advantage of information such as
the number of blocks per track and the rotational speed of the drive.
Layout is improved to exploit locality of reference in order to minimize
seek latency.
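
The following sketch shows one way the RPM and blocks-per-track figures can
feed into layout. It computes how many block positions pass under the head
while the system prepares the next request. The next block of a file can
then be placed that far along the track rather than immediately after the
previous one, so it does not cost a full extra rotation. This is a
simplified model under assumed parameters: `blocks_to_skip` and
`service_delay_ms` (a stand-in for per-request processing time) are
hypothetical. It approximates the idea behind FFS's rotational-delay
parameter rather than reproducing its code.

```python
import math


def blocks_to_skip(rpm: float, blocks_per_track: int, service_delay_ms: float) -> int:
    """Block positions that rotate past the head during service_delay_ms.

    Placing consecutive file blocks this many positions apart lets the next
    block arrive under the head just as the next request is ready.
    """
    ms_per_revolution = 60_000.0 / rpm
    ms_per_block = ms_per_revolution / blocks_per_track
    # Round up: skipping too few blocks would cost a whole extra rotation.
    return math.ceil(service_delay_ms / ms_per_block)
```

For example, at 3600 RPM with 32 blocks per track each block passes in about
0.52 ms, so a 4 ms delay means skipping 8 blocks.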
Other useful additions are long file names and advisory file locking, which
removes the need for lock files. Disk quotas and an atomic rename system
call are also discussed.
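
Advisory locking replaces the lock-file convention. With lock files, a
program that crashes can leave a stale lock behind. An advisory lock is
held by the kernel and is dropped when the descriptor is closed, including
when the process exits. Below is a minimal sketch using Python's standard
`fcntl.flock` wrapper, which is Unix-only. `try_exclusive_lock` and `unlock`
are hypothetical helper names.

```python
from __future__ import annotations

import fcntl
import os


def try_exclusive_lock(path: str) -> int | None:
    """Open path and try to take an exclusive advisory lock without blocking.

    Returns the open file descriptor on success, or None if another open
    file description already holds a conflicting lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Someone else holds the lock; give the descriptor back.
        os.close(fd)
        return None
    return fd


def unlock(fd: int) -> None:
    # Release the lock explicitly; closing the descriptor would also release it.
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```

Because the locks are advisory, they only coordinate programs that agree to
call `flock`. Nothing stops an uncooperative program from reading or
writing the file.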

-Mark Smith