CSE 120 Discussion Notes: Week 8

Review: Disks and Filesystems

At the system-call level, most systems today implement something similar to the semantics of POSIX (Unix): files are untyped arrays of bytes, named within a hierarchical tree of directories, and accessed through a small set of calls such as open, read, write, lseek, and close.

Other abstractions, such as storing structures or even a full database in a file, are implemented by libraries and applications on top of the operating system, not by the operating system itself.

We'd like our files to persist after an operating system reboot, so we put them somewhere safe: most commonly today, we use a hard drive. While most of a computer is purely electronic, a hard drive is also mechanical; this makes it orders of magnitude slower at some operations. The important characteristics of a hard drive for our purposes are that data is read and written only in whole sectors, that positioning the head (seeking) and waiting for the platter to rotate to the right sector each take milliseconds, and that once the head is positioned, sequential transfer is fast, so sequential access is far cheaper than random access.

Modern drives may have a complex internal layout, but to the operating system they look like a large array of sectors that can be read and written.
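
As a rough worked example (the figures here are made up but typical): suppose a drive has an average seek time of 8 ms and spins at 7200 RPM. One revolution takes 60/7200 s ≈ 8.3 ms, so the average rotational latency is about 4.2 ms, and a random access costs roughly 8 + 4.2 ≈ 12 ms before any data is transferred, while reading the next sequential sector costs almost nothing extra. This is why the ordering of disk operations matters so much for performance.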

Most Unix filesystems, and many non-Unix filesystems too, have a fairly simple structure, made up of several key elements: a superblock describing the filesystem as a whole; inodes, one per file, holding each file's metadata and the locations of its data blocks; the data blocks themselves; and directories, which are just files whose contents map names to inode numbers.

There may be other secondary data structures, such as allocation bitmaps that indicate which blocks on disk are free and which are in use. The actual layout of these blocks may vary; some filesystems may allocate fixed areas of disk for inodes, and others may allocate inodes dynamically. The filesystem may use cylinder groups or some similar mechanism. The superblock may be replicated for redundancy. But at a conceptual level, they all work more or less the same.
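
To make this concrete, here is a minimal sketch of what these on-disk structures might look like in C. All field names and sizes are hypothetical; no real filesystem lays things out exactly this way.

#include <stdint.h>

#define NDIRECT 12  /* hypothetical number of direct block pointers */

struct superblock {
    uint32_t magic;        /* identifies the filesystem type */
    uint32_t num_blocks;   /* total blocks on the device */
    uint32_t num_inodes;   /* size of the inode table */
    uint32_t inode_start;  /* first block of the inode table */
};

struct inode {
    uint16_t mode;             /* file type and permissions */
    uint16_t nlink;            /* number of hard links to this file */
    uint32_t size;             /* file length in bytes */
    uint32_t direct[NDIRECT];  /* block numbers of the first data blocks */
    uint32_t indirect;         /* block holding further block numbers */
};

struct dir_entry {
    uint32_t inum;      /* inode number this name refers to */
    char     name[28];  /* one name within the directory */
};

Note that a directory's data blocks are just arrays of records like dir_entry, which is what makes a directory an ordinary file with a special interpretation.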

Question: Given these data structures, what disk accesses would be required when executing the following code, assuming no data is in cache?

#include <fcntl.h>
#include <unistd.h>
int main(void) {
    char buf[32];
    int fd = open("/usr/include/stdio.h", O_RDONLY);
    read(fd, buf, 32);
    close(fd);
}

Filesystem Consistency

Because seeks on disks are slow, we often want to batch up disk operations as much as possible, so we can reorder or combine operations for better efficiency. When performing disk writes, one good way to do this is to use a write-back cache (as opposed to a write-through cache). As we modify files, we don't write modified blocks to the disk immediately. Instead, we keep the data in memory, in a cache, and write the data out to disk somewhat lazily (but usually within some window of time, such as 30 seconds, to ensure that data does not stay only in memory for too long).
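
As a sketch of the mechanism (the structures and the disk_write helper below are hypothetical, not any real kernel's buffer cache): each cached block records whether it is dirty and when it was dirtied, and a periodic flusher writes out blocks that have been dirty longer than the window.

#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define FLUSH_WINDOW 30  /* seconds a block may stay dirty, as in the text */

struct cached_block {
    uint32_t blockno;     /* which disk block this buffer holds */
    char     data[4096];  /* in-memory copy of the block */
    bool     dirty;       /* modified since last written to disk? */
    time_t   dirtied_at;  /* when it first became dirty */
};

void disk_write(uint32_t blockno, const char *data);  /* hypothetical driver */

/* A write goes only to the cache; the block is just marked dirty. */
void cache_write(struct cached_block *b, const void *src, size_t len)
{
    memcpy(b->data, src, len);
    if (!b->dirty) {
        b->dirty = true;
        b->dirtied_at = time(NULL);
    }
}

/* Run periodically: flush any block that has been dirty too long. */
void flush_old_blocks(struct cached_block *cache, size_t n)
{
    time_t now = time(NULL);
    for (size_t i = 0; i < n; i++) {
        if (cache[i].dirty && now - cache[i].dirtied_at >= FLUSH_WINDOW) {
            disk_write(cache[i].blockno, cache[i].data);
            cache[i].dirty = false;
        }
    }
}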

The order in which we write blocks out to disk may be quite different from the order in which the changes were made. This opens up the possibility of all sorts of problems if the computer crashes while some data has not yet been written: when the computer reboots and reads the filesystem, it may find the filesystem in an inconsistent state.

Question: What types of filesystem problems can occur if the system crashes before all filesystem modifications have been written? We usually assume that the disk is capable of ensuring that complete blocks get written, so we don't have a problem with part of a block being written.

Maintaining Filesystem Consistency

We'll divide bytes on disk into two categories: data and metadata. Metadata is data about data: inodes, directories, indirect blocks, everything that isn't user data but is used to locate user data on disk. After a system crash, there may have been writes to both data and metadata that did not finish. However, from a filesystem point of view, we're generally most concerned with inconsistencies in filesystem metadata. (Why?)

Often after a crash, a filesystem checker (fsck on Unix) will run, analyzing disk contents and trying to recover from any partial writes of data, at least to the point of making the filesystem consistent again.

There are various strategies we might use in our filesystem to try to keep data on disk consistent or make recovery easier.

Pray. ext2 on Linux takes this approach by default. Don't do anything special; hope that data is not too corrupted after a crash, and use a filesystem checker to try to fix things up.

Synchronous metadata writes. Used by FFS, or ext2 with the appropriate flags. Any time we write metadata to disk, make the write synchronous: rather than putting it in the cache to be written later, write it out to disk right then and wait for the write to finish. This can't eliminate all corruption, but it keeps the window during which a crash causes problems short. It also kills performance.
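
User programs can ask for the same behavior through standard POSIX calls; here is a minimal sketch of the two usual ways (this illustrates the mechanism, not how FFS implements it internally):

#include <fcntl.h>
#include <unistd.h>

/* Option 1: O_SYNC makes every write() wait until the data is on disk. */
void write_sync_flag(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_SYNC, 0644);
    write(fd, buf, len);
    close(fd);
}

/* Option 2: write into the cache as usual, then force it out explicitly. */
void write_then_fsync(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    write(fd, buf, len);
    fsync(fd);  /* blocks until the file's data and metadata reach the disk */
    close(fd);
}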

Soft updates. Found in FreeBSD. This is a clever approach to the filesystem consistency problem, and works by carefully ordering when writes are sent to disk, so that the data structures on disk are always consistent. In doing so, it can avoid the performance penalty of synchronous metadata writes. (It can actually still leave a few problems—it might cause disk space to be lost after a crash—but this is easy to fix up any time after a reboot.)
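
The heart of soft updates is an ordering rule: never write a pointer to disk before the structure it points to. Below is a minimal sketch of that rule for file creation. The disk_write_sync helper is hypothetical, and real soft updates tracks these dependencies in the buffer cache rather than writing synchronously; only the ordering itself is the point here.

#include <stdint.h>

void disk_write_sync(uint32_t blockno, const void *data);  /* hypothetical */

/* Creating a file: the initialized inode must reach disk before the
   directory entry that names it. A crash between the two writes leaves
   only an unreferenced inode, losing a little disk space but never
   leaving a directory entry that points at garbage. */
void create_file_ordered(uint32_t inode_blk, const void *inode_data,
                         uint32_t dir_blk, const void *dir_data)
{
    disk_write_sync(inode_blk, inode_data);  /* pointed-to structure first */
    disk_write_sync(dir_blk, dir_data);      /* only then the pointer */
}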

Journaling filesystems. Used by many modern filesystems, such as ext3, ReiserFS, and NTFS. All operations are written to a journal (or log) on disk before they are performed; an operation is not actually started until its description is safely in the log. If a crash occurs, the log contains enough information to finish whatever operations were in progress, so that the filesystem is made consistent again. However, all operations now require writes to two places (the log and the actual data structures).
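
A minimal sketch of this write-ahead discipline (the log_append, log_flush, and disk_write helpers are hypothetical):

#include <stdint.h>

struct log_record {
    uint32_t blockno;     /* where the data will eventually live */
    char     data[4096];  /* the new contents of that block */
};

void log_append(const struct log_record *rec);  /* add record to the journal */
void log_flush(void);       /* wait until the journal itself is on disk */
void disk_write(uint32_t blockno, const void *data);

void journaled_write(const struct log_record *rec)
{
    log_append(rec);                      /* 1: describe the operation */
    log_flush();                          /* 2: make the description durable */
    disk_write(rec->blockno, rec->data);  /* 3: update the real location */
    /* After a crash, records that made it through step 2 are replayed;
       anything else is discarded, leaving the filesystem consistent. */
}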

Questions

  1. DOS used a different filesystem called FAT. The beginning of the disk contained a data structure called the File Allocation Table (hence the name). It is an array with as many entries as there are data blocks; each entry contains a single integer, the index of the next data block of the file. A file's blocks are thus stored as a linked list, with the next pointers kept separate from the data itself (see the sketch after these questions). What are the advantages and disadvantages of this system? What types of filesystem corruption do you think might occur?
  2. Many Unix systems allow multiple hard links to files, but do not allow hard links to be created to directories. Why do you think this is?
  3. What types of modifications or additions to the POSIX filesystem interface do you think might be useful for an operating system to support? (There is no single answer here.)
  4. What is the relation between journaling filesystems and LFS (seen in lecture)?
  5. What are the relative merits of soft updates and journaling? Why might one be used over the other?
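
For question 1, here is a minimal sketch of following a FAT chain. The end-of-chain marker and entry type are hypothetical; real FAT variants use 12-, 16-, or 32-bit entries.

#include <stdint.h>
#include <stdio.h>

#define EOC 0xFFFFFFFFu  /* hypothetical end-of-chain marker */

/* fat[i] holds the index of the block that follows block i in its file.
   Walk a file's chain from its first block, printing each block number. */
void walk_chain(const uint32_t *fat, uint32_t first_block)
{
    for (uint32_t b = first_block; b != EOC; b = fat[b])
        printf("block %u\n", (unsigned) b);
}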