UCSD Main WebsiteUCSD Jacobs SchoolDepartment of Computer Science and Engineering
spacer gif
spacer gif
CSE 121
spacer gifspacer gif
spacer gif
spacer gifCourse Overview
spacer gifspacer gifStructure
spacer gifspacer gifGrading
spacer gifspacer gifCollaboration
spacer gifspacer gifUseful books
spacer gif
spacer gifspacer gifSchedule
spacer gif
spacer gifspacer gifReadings
spacer gif
spacer gifProjects (UCSD only)
spacer gif
spacer gifHomeworks
spacer gif
spacer gifWebBoard
spacer gif
spacer gif
spacer gif
Search
spacer gifspacer gifspacer gif
Advanced search  arrows gif

spacer gif
spacer gifspacer gif
spacer gif
spacer gif
spacer gif

spacer gif

spacer gif
spacer gifspacer gifspacer gif
spacer gif

Project 2: A Simple File System

Due: Thursday, February 24th 11:59pm

Overview

So far in this class we have seen a number of methods for improving file system performance. We have talked about how file systems take a physical disk and create user-friendly abstractions such as directories and named files. Your job will now be to design and implement a basic application-level file system. You will be given a block interface to a raw "disk," and, using this interface, you will be required to design a simplified yet functional file system. For right now, we're just concerned with getting a file system operational. We're not going to worry about things like realistic data placement, caching, and the like. You are, however, going to be graded on how efficiently you use the disk. In other words, how large a file can be stored on your disk? What is the usable capacity of your file system if it's full of many 1-byte files? How many disk blocks do you need to read to access a small file? You should consider these factors when thinking about how to design your file system.

These C files will provide the interfaces to the functions you will be writing. The header file, disk.h, defines the file system API you are to write. There are also two C files, disk.c and driver.c. Disk.c contains a a skeleton implementation of the API; your job is to flesh out the implementation according to the specification below. All of the file system operations possible in this project are going to be accessed via the API you implement in disk.c. Driver.c is an application provided to you that utilizes the API in disk.c. Driver contains code for a bare-bones command prompt that allows the user to run the following basic commands on your file system: ls, mkdir, mv, rm, read, and write.

Simplifications and Assumptions

In order to keep this project manageable, we're going to make several simplifying assumptions. First, we are not mounting a physical disk, instead we'll be using a fixed-size file on the normal file system. You will "mount" your filesystem by reading the disk image file provided. This disk image is simply a single file which will eventually contain all the metadata blocks and data blocks in your file system. As a result, there is no need to worry about the physical characteristics of a disk (seek time, latency, disk layout) that some of the earlier papers we read addressed. Instead, assume disk blocks are arranged in a linear order---i.e., block 3 is next to block 4. When testing your project we will, however, use the number of disk accesses and the distance between subsequent block numbers as a rough metric of your file system's performance.

For the purposes of this project, there is no file buffer cache. You can assume all disk image reads and writes are synchronous. Also, all directory paths will be absolute and there is no notion of a working directory. In your file system, there will be no special entries in directories for "." and "..". We will not concern ourselves with things like file ownership, protection bits, time stamps, and the like. Finally, the block size is fixed to be 512 bytes and filename/directory names are fixed to 28-byte character strings.

Interface to Raw Disk

You will be provided with the following interface to the raw mounted disk from which you will construct and access your file system.

int dread(int fd, int blocknum, char *buf)
Given a blocknum, dread() will read that block into the specified buffer 'buf'. More specifically, dread will read BLOCKSIZE data from the address logical block number blocknum. fd is the file descriptor corresponding to the currently mounted disk image (you will only need to manage one disk image at a time). Thus, reads from the disk image only happen at the granularity of a block, simulating a real disk. Also, dread() is synchronous and will wait until the entire block is read and copied into buf before returning.

int dwrite(int fd, int blocknum, const char *buf)
Given a block number, dwrite will write a block from buf to the disk image at location blocknum. dwrite() is guaranteed to be atomic and synchronous, so there is no need to worry about cases where blocks themselves are partially written. When implementing, be sure to keep in mind that at any time the file system might crash so be sure that at all times you are in an recoverable state. However you are guaranteed that crashes can only occur before or after a dwrite() call and not during.

Both functions will return a 0 on success and -1 on failure. Your code does not need to handle failure; it may simply exit with an error message (e.g., "your disk image crashed").

Filesystem API

The following is the API you are required to implement using the interface described above. When designing, keep in mind that "crashes" may occur anytime and, like the papers read in class, your filesystem must always be in a stable and consistent state. We will simulate crashes by modifying dread() and dwrite() to occasionally stopping the execution of the program either before or after they complete. We will then ask your code to re-mount the disk image in the state it was left in at the time of the "crash."

int myformat(const char *filename, int size)
Given a pathname and a size (expressed in terms of number of blocks), myformat will "format" the disk image associated with that file. This involves initializing or resetting and overwriting any necessary meta data information. It's probably a good idea to include some sort of super-block information so your mount function is able to identify your own disk images (as opposed to, say, some garbage file we might ask it to mount).

int mymount(const char *filename)
Given a filename of a (supposedly) previously formatted disk image, mymount will use that file as the disk image to mount the application-level filesystem. Once a mount occurs, that file will be used as the disk for all further file system commands you will implement (until an myunmount occurs). Remember that this is an application level file system. From the point of view of a user of the application, it will appear like a structured file system. However, in reality the disk image is merely one contiguous file in the operating system.

This is also when you will need to check the integrity of your file system. For example, if a crash occurred during a file create after an inode was allocated but before a directory entry is updated, such an error should be found and fixed in mymount.

int myunmount()
Unmount is responsible for unmounting the file system. In addition to closing the disk image file descriptor, unmount will need to write out any necessary meta data that might be required the next time the file system is mounted. For instance, it might note that the filesystem was cleanly unmounted, speeding up the integrity check the next time the file system is mounted.

int mywrite(const char *path, const void *buf, int size)
The function mywrite will attempt to write 'size' bytes from memory address 'buf' into a file specified by an absolute 'path' (i.e., "/foo/bar"---all paths will start with the root directory, "/"). If the file does not yet exist, mywrite should create it. On error (e.g., the path does not exist) mywrite will return -1, otherwise mywrite should return the number of bytes written. Note size is not necessarily an integral number of blocks. Similar to myread, the actual implementation of mywrite will depend on how you decide to allocate file space.

int myread(const char *path, void *buf, int *size)
The function myread provides the ability to read data from an absolute path 'path,' which should specify an existing file. It will attempt to read 'size' bytes from the specified file on your filesystem into the memory address 'buf'. The return value is the amount of bytes actually read; if the file is smaller than size, myread will simply return the most amount of bytes it could read. On error, myread will return -1. The actual implementation of myread is dependent on how files are allocated.

int mydelete(const char *path)
Given an absolute path to a file in your file system, mydelete will delete that file. Depending on your implementation of the file system, the specifics of the delete might vary, but the basic steps your file system takes to delete a file will be the following:

  • Locate the directory entry for the file by resolving the path - this will probably involve reading each directory block in the path to get the location of the first data block (or location of data block with the file's inode) of the specified file.
  • Update the directory entry for the deleted file
  • Free any data blocks used by the file
Again, these steps are very general and the actual logistics of how to locate data blocks of a file, how to free data blocks and how to update directory entries are dependent on your implementation of directories and file allocation.

int mymkdir(const char *path, const char *dirname)
Given an absolute path to a directory (which may or may not end in '/'), mymkdir will create a new directory named dirname in that directory. The steps your implementation will do to make a new directory will be the following:

  • Resolve the directory path
  • Allocate and initialize a new directory block
  • Update the directory block for 'path' by adding an entry for the newly created directory.

You should return 0 on success or -1 on error (e.g., the path doesn't exist, or dirname already does).

char ** mygetdir(const char *path)
Given an absolute path to a directory, mygetdir will return all the files and directories in that directory. The steps you will take will be the following:

  • Resolve the directory entry from its absolute path to get the address of the directory block that corresponds to "path"
  • For each valid entry in the directory block, copy the file name into an array
  • Return the array of file name strings

You should return NULL on failure (e.g., the directory does not exist).

int myrename(const char *oldpath, const char *newpath)
The function rename will rename a file or directory named by the string 'oldpath' and rename it to the file name specified by 'newpath'.

As usual, return 0 on success, -1 on failure.

Some Example Approaches

In this project you are free in designing the details of the implementation of your file system. However, it is highly recommended that you model your file system after an existing implementation, as this way you can be sure your design is safe and you will also have a means of reference during implementation. The following are two methods of file allocation that are recommended. The first is the File Allocation Table (FAT), and the second is the Unix File System inode.

FAT is a relatively straightforward data structure used to maintain information about file allocation and was used by early versions of MS-DOS and OS/2. It involves having a fixed portion of disk (usually the beginning of disk) used as a table to maintain information about all the blocks in the system. The length of the table is the number of all the blocks in the file system. The directory entry for a given file points to the first block of the file. The corresponding entry in the FAT points to the next entry in the file and so on until a special end of file entry is reached. For example, assume a file has blocks allocated 3, 6 and 9. The directory entry would point to block 3. The 3rd entry in the FAT would contain 6 and the 6th entry would contain 9. The 9th entry would contain a special end of file character.

Another possible approach to maintaining file allocation information is to use the Unix inode format. You should be familiar with this structure from CSE120. Basically, the directory entry contains an inode number, and every inode is at a fixed location on disk. An inode then contains information about all the data blocks via direct and indirect pointers. In your filesystem, both inode number and block number can be represented as a 32-bit integer.

Another detail that must be considered in implementing your file system is the directory structure. The simplest method of implementing a directory structure would be a simple linked list. Each directory block will contain a number of directory entries - each entry will consist of a file name and inode number pair. The last entry in the directory block will contain a pointer to the address of the next directory block for that directory. For example, in our implementation we have a fixed block size of 512, a fixed filename size of 28 bytes, and 4 bytes can be used to store the inode number. Therefore, there can be 16 directory entries for each block. The first 15 will contain pairs, and the last will contain a pointer to the next directory block.

Turn In

You will turn in your projects using the turnin command. In order to do this you must first tar your project files and then submit the tar file.

Example:
% tar cvf project2.tar [files]
% turnin -c cs121w -p project2 project1.tar

spacer gif
spacer gif
spacer gif
spacer gif
9500 Gilman Drive, La Jolla, CA 92093-0114
spacer gif
About CSE | CSE People | Faculty & Research | Graduate Education | Undergraduate Education
Department Administration | Contact CSE | Help | Search | Site map | Home
snoeren@cs.ucsd.edu
Official web page of the University of California, San Diego
Copyright © 2002 Regents of the University of California. All rights reserved.
spacer gif