Paper Evaluations: 06-01-2000

Wed, 31 May 2000 22:34:16 -0700 (PDT)

Title: "A Fast File System for Unix"

This paper describes a new implementation for the Unix file system.
Some modifications were introduced to the original implementation to
improve system performance by increasing locality of reference. In
this way, the average time for read accesses decreases.

In this new implementation, the block size for a file system is
defined at the time the file system is created, but it restricts the
minimum to 4096 bytes. Thus, file systems with different block sizes
can be accessible at the same time. The advantage of this approach is
the increased throughput obtained when larger blocks are
chosen. Larger blocks, however, suffer from internal fragmentation
when small files are used. In order to decrease the disk space wasted
in small files, disk blocks can be fragmented and the fragments used
to store file data. Hence, disk transactions still read on a block
basis, but internal fragmentation decreases.

Another modification in the file system partition is the division in
cylinder groups. Bookkeeping information is stored for each group,
including a redundant copy of the superblock, inodes for the files in
the cylinder group and a bitmap of available blocks. Since inodes are
closer to file data blocks, the seek time for file data blocks
decreases. Also, bitmaps of available blocks facilitates the search
for rotationally close blocks.

Each file system is parameterized so that it can adapt to the
characteristics of the disk on which it is placed. Hence, it tries to
allocate blocks for a file in the same cylinder as the previous blocks
of this file and the position on the cylinder depends on the disk.
Moreover, the amount of time necessary for serving processor requests
is also considered. Block allocation depends on the layout
policies. The file system layout policies are divided in global policy
routines and local allocation routines. Global policies tries to
cluster related information to increase locality. Thus, it tries to
place in a cylinder group the inodes of files in the same directory,
and blocks of the same file. The local allocation routines serve
global policies allocating requested blocks. If a requested block is
not available, then the local allocator determines another one that is
rotationally closest to the requested block.

The file system performance was evaluated by measuring the throughput
for read and write accesses and comparing with a previous Unix
implementation. As expected, disk bandwidth and CPU utilization
improved for the new version mainly because of the larger block size
used. An interesting observation is that the performance of the new
file system is current limited by memory copies required to move data
from disk buffers in the system address space to the user address

Besides the modification described, some other features were added to
the new implementation, as long file names, file locking, symbolic
links, file renaming and disk quota.

In my opinion, this is an important research work, since it presents a
improvement for the Unix file system, which is an operating system
used both in academic and corporation environments. The paper,
however, is not well organized. It is not clear by the introduction
how the Unix file system is improved. A consequence is that it does
not motivate the reader. Moreover, the performance evaluation is not
complete. For instance, application profiles could be used to support
authors' claims.


Title: "The Design and Implementation of a Log-Structured
File System"

This paper describes a new disk storage management technique called
log-structured file system and a prototype implementation in the
Sprite operating system. Log-structured file systems are based on the
assumption that files are cached in main memory. Thus, as the amount
of main memory increases, read requests are more effectively satisfied
and disk access is used mostly for write accesses.

In a log-structured files system, all new information is written
permanently to disk in a sequential structure called the log. Because
the log is sequential, it improves write performance, decreasing the
number of seeks, and makes crash recovery faster. Write access
performance is improved by buffering a sequence of file system changes
in the file cache and then writing all the changes to disk
sequentially in a single disk write operation. In this way, for
workloads containing many small files, a log-structured file system
converts small synchronous random writes into asynchronous sequential
transfers, using disk bandwidth more efficiently. Moreover,
information modified or created at the same time are close together on
the disk promoting temporal locality.

File information is stored in inodes, as in the Unix file system.
Different from Unix, inodes are not placed in fixed positions, thereby
using an inode map to keep track of inode locations on disk. Hence,
random accesses do not require sequential search on the log

Another important issue is free space management. As files are deleted
and overwritten, free non-contiguous spaces are created. In order to
manage the free space efficiently, the disk is divided in large
fixed-size segments. The segment cleaning process consists of copying
the blocks of some segments into memory and then copying live data
back in a compacted form to the beginning of clean segments. Moreover,
in the Sprite implementation, this cleaning process detects blocks
that are not being used through the information provided by segment
summary blocks, which are created and updated during write accesses to
the segment. To determine which segments should be cleaned, an
algorithm is proposed that balances the cost and the benefit of
running the cleaner on a segment. Thus, the algorithm chooses the
segments with higher rate of benefit to cost and, in this way, it
avoids the increased write cost caused by locality, and hence provides
bimodal distribution of segments.

In log-structured file systems, crash recovery is also facilitated.
Consistent states of the log are defined by checkpoints. Checkpoint
information is written to a fixed region and is used in the
roll-forward phase to restore the consistency of the log. Instead of
simply dropping the data written since the last checkpoint,
roll-forward is used in order to restore as much data as

Most of the features described were implemented in Sprite. Some
benchmarks were executed in order to evaluate file system performance
and compare with SunOS FFS. Summarizing the results, they show that
Sprite LFS promotes a better CPU utilization, outperforms SunOS or
performs similarly for most of the profiles for file system accesses,
and provides a low cleaning overhead. In addition, recovery time can
be as low as 1 second, depending on the choice for the interval
between consecutive checkpoints.

In my opinion, this paper is very well written. It presents clearly
the goals of the research work, motivating with problems in current
operating systems and computer system technologies. The performance
evaluation is very interesting. Some simulations results are compared
with data obtained in the Sprite production system. This comparison is
very important, because it validates the proposal.

- Flavio Junqueira.

Flavio Paiva Junqueira, PhD Student
Department of Computer Science and Engineering
University of California, San Diego

Office location: AP&M 4349 e-mail: