Project 2: A Simple File System
Due: Thursday, February 24th 11:59pm
Overview
So far in this class we have seen a number of methods for improving
file system performance. We have talked about how file systems take a
physical disk and create user-friendly abstractions such as
directories and named files. Your job will now be to design and
implement a basic application-level file system. You will be given a
block interface to a raw "disk," and, using this interface, you will
be required to design a simplified yet functional file system. For
right now, we're just concerned with getting a file system
operational. We're not going to worry about things like realistic
data placement, caching, and the like. You are, however, going to be
graded on how efficiently you use the disk. In other words, how large
a file can be stored on your disk? What is the usable capacity of
your file system if it's full of many 1-byte files? How many disk
blocks do you need to read to access a small file? You should
consider these factors when thinking about how to design your file
system.
These C files will provide the interfaces to the
functions you will be writing. The header file, disk.h, defines the
file system API you are to write. There are also two C files, disk.c
and driver.c. Disk.c contains a a skeleton implementation of the API;
your job is to flesh out the implementation according to the
specification below. All of the file system operations possible in
this project are going to be accessed via the API you implement in
disk.c. Driver.c is an application provided to you that utilizes the
API in disk.c. Driver contains code for a bare-bones command prompt
that allows the user to run the following basic commands on your file
system: ls, mkdir, mv, rm, read, and write.
Simplifications and Assumptions
In order to keep this project manageable, we're going to make several
simplifying assumptions. First, we are not mounting a physical disk,
instead we'll be using a fixed-size file on the normal file system.
You will "mount" your filesystem by reading the disk image file
provided. This disk image is simply a single file which will
eventually contain all the metadata blocks and data blocks in your
file system. As a result, there is no need to worry about the
physical characteristics of a disk (seek time, latency, disk layout)
that some of the earlier papers we read addressed. Instead, assume
disk blocks are arranged in a linear order---i.e., block 3 is next to
block 4. When testing your project we will, however, use the number
of disk accesses and the distance between subsequent block numbers as
a rough metric of your file system's performance.
For the purposes of this project, there is no file buffer cache. You
can assume all disk image reads and writes are synchronous. Also, all
directory paths will be absolute and there is no notion of a working
directory. In your file system, there will be no special entries in
directories for "." and "..". We will not concern ourselves with
things like file ownership, protection bits, time stamps, and the like.
Finally, the block size is fixed to be 512 bytes and
filename/directory names are fixed to 28-byte character strings.
Interface to Raw Disk
You will be provided with the following interface to the raw mounted
disk from which you will construct and access your file system.
int dread(int fd, int blocknum, char *buf)
Given a
blocknum, dread() will read that block into the specified buffer
'buf'. More specifically, dread will read BLOCKSIZE data from the
address logical block number blocknum. fd is the file descriptor
corresponding to the currently mounted disk image (you will only need
to manage one disk image at a time). Thus, reads from the disk image
only happen at the granularity of a block, simulating a real disk.
Also, dread() is synchronous and will wait until the entire block is
read and copied into buf before returning.
int dwrite(int fd, int blocknum, const char *buf)
Given a block number, dwrite will write a block from buf to the disk
image at location blocknum. dwrite() is guaranteed to be
atomic and synchronous, so there is no need to worry about cases where
blocks themselves are partially written. When implementing, be sure
to keep in mind that at any time the file system might crash so be
sure that at all times you are in an recoverable state. However you
are guaranteed that crashes can only occur before or after a dwrite()
call and not during.
Both functions will return a 0 on success and -1 on failure. Your
code does not need to handle failure; it may simply exit with an error
message (e.g., "your disk image crashed").
Filesystem API
The following is the API you are required to implement using the
interface described above. When designing, keep in mind that
"crashes" may occur anytime and, like the papers read in class, your
filesystem must always be in a stable and consistent state. We will
simulate crashes by modifying dread() and dwrite() to occasionally
stopping the execution of the program either before or after they
complete. We will then ask your code to re-mount the disk image in the
state it was left in at the time of the "crash."
int myformat(const char *filename, int size)
Given a pathname and a size (expressed in terms of number of blocks),
myformat will "format" the disk image associated with that file. This
involves initializing or resetting and overwriting any necessary meta
data information. It's probably a good idea to include some sort of
super-block information so your mount function is able to identify
your own disk images (as opposed to, say, some garbage file we might
ask it to mount).
int mymount(const char *filename)
Given a filename of a (supposedly) previously formatted disk image,
mymount will use that file as the disk image to mount the
application-level filesystem. Once a mount occurs, that file will be
used as the disk for all further file system commands you will
implement (until an myunmount occurs). Remember that this is an
application level file system. From the point of view of a user of
the application, it will appear like a structured file system.
However, in reality the disk image is merely one contiguous file in
the operating system.
This is also when you will need to check the integrity of your file
system. For example, if a crash occurred during a file create after an
inode was allocated but before a directory entry is updated, such an
error should be found and fixed in mymount.
int myunmount()
Unmount is responsible for unmounting the file system. In addition to
closing the disk image file descriptor, unmount will need to write out
any necessary meta data that might be required the next time the file
system is mounted. For instance, it might note that the filesystem
was cleanly unmounted, speeding up the integrity check the next time
the file system is mounted.
int mywrite(const char *path, const void *buf, int size)
The function mywrite will attempt to write 'size' bytes from memory
address 'buf' into a file specified by an absolute 'path' (i.e.,
"/foo/bar"---all paths will start with the root directory, "/"). If
the file does not yet exist, mywrite should create it. On error
(e.g., the path does not exist) mywrite will return -1, otherwise
mywrite should return the number of bytes written. Note size is not
necessarily an integral number of blocks. Similar to myread, the
actual implementation of mywrite will depend on how you decide to
allocate file space.
int myread(const char *path, void *buf, int *size)
The function myread provides the ability to read data from an absolute
path 'path,' which should specify an existing file. It will attempt
to read 'size' bytes from the specified file on your filesystem into
the memory address 'buf'. The return value is the amount of bytes
actually read; if the file is smaller than size, myread will simply
return the most amount of bytes it could read. On error, myread will
return -1. The actual implementation of myread is dependent on how
files are allocated.
int mydelete(const char *path)
Given an absolute path to a file in your file system, mydelete will
delete that file. Depending on your implementation of the file
system, the specifics of the delete might vary, but the basic steps
your file system takes to delete a file will be the following:
- Locate the directory entry for the file by resolving the path - this will probably involve reading each directory block in the path to get the location of the first data block (or location of data block with the file's inode) of the specified file.
- Update the directory entry for the deleted file
- Free any data blocks used by the file
Again, these steps are very general and the actual logistics of how to locate data blocks of a file, how to free data blocks and how to update directory entries are dependent on your implementation of directories and file allocation.
int mymkdir(const char *path, const char *dirname)
Given an absolute path to a directory (which may or may not end in
'/'), mymkdir will create a new directory named dirname in that
directory. The steps your implementation will do to make a new
directory will be the following:
- Resolve the directory path
- Allocate and initialize a new directory block
- Update the
directory block for 'path' by adding an entry for the newly created
directory.
You should return 0 on success or -1 on error (e.g., the path doesn't
exist, or dirname already does).
char ** mygetdir(const char *path)
Given an absolute path to a directory, mygetdir will return all the files and directories in that directory. The steps you will take will be the following:
- Resolve the directory entry from its absolute path to get the address of the directory block that corresponds to "path"
- For each valid entry in the directory block, copy the file name into an array
- Return the array of file name strings
You should return NULL on failure (e.g., the directory does not exist).
int myrename(const char *oldpath, const char *newpath)
The function rename will rename a file or directory named by the string 'oldpath' and rename it to the file name specified by 'newpath'.
As usual, return 0 on success, -1 on failure.
Some Example Approaches
In this project you are free in designing the details of the implementation of your file system. However, it is highly recommended that you model your file system after an existing implementation, as this way you can be sure your design is safe and you will also have a means of reference during implementation. The following are two methods of file allocation that are recommended. The first is the File Allocation Table (FAT), and the second is the Unix File System inode.
FAT is a relatively straightforward data structure used to maintain information about file allocation and was used by early versions of MS-DOS and OS/2. It involves having a fixed portion of disk (usually the beginning of disk) used as a table to maintain information about all the blocks in the system. The length of the table is the number of all the blocks in the file system. The directory entry for a given file points to the first block of the file. The corresponding entry in the FAT points to the next entry in the file and so on until a special end of file entry is reached. For example, assume a file has blocks allocated 3, 6 and 9. The directory entry would point to block 3. The 3rd entry in the FAT would contain 6 and the 6th entry would contain 9. The 9th entry would contain a special end of file character.
Another possible approach to maintaining file allocation information is to use the Unix inode format. You should be familiar with this structure from CSE120. Basically, the directory entry contains an inode number, and every inode is at a fixed location on disk. An inode then contains information about all the data blocks via direct and indirect pointers. In your filesystem, both inode number and block number can be represented as a 32-bit integer.
Another detail that must be considered in implementing your file system is the directory structure. The simplest method of implementing a directory structure would be a simple linked list. Each directory block will contain a number of directory entries - each entry will consist of a file name and inode number pair. The last entry in the directory block will contain a pointer to the address of the next directory block for that directory. For example, in our implementation we have a fixed block size of 512, a fixed filename size of 28 bytes, and 4 bytes can be used to store the inode number. Therefore, there can be 16 directory entries for each block. The first 15 will contain pairs, and the last will contain a pointer to the next directory block.
Turn In
You will turn in your projects using the turnin command. In order to do this you must first tar your project files and then submit the tar file.
Example:
% tar cvf project2.tar [files]
% turnin -c cs121w -p project2 project1.tar