UCSD Main WebsiteUCSD Jacobs SchoolDepartment of Computer Science and Engineering
spacer gif
spacer gif
CSE 121
spacer gifspacer gif
spacer gif
spacer gifCourse Overview
spacer gifspacer gifStructure
spacer gifspacer gifGrading
spacer gifspacer gifCollaboration
spacer gifspacer gifUseful books
spacer gif
spacer gifspacer gifSchedule
spacer gif
spacer gifspacer gifReadings
spacer gif
spacer gifProjects (UCSD only)
spacer gif
spacer gifHomeworks
spacer gif
spacer gifWebBoard
spacer gif
spacer gif
spacer gif
Search
spacer gifspacer gifspacer gif
Advanced search  arrows gif

spacer gif
spacer gifspacer gif
spacer gif
spacer gif
spacer gif

spacer gif

spacer gif
spacer gifspacer gifspacer gif
spacer gif

Project 1: A Simple File System

Due: Tuesday, February 13th 11:59pm

Overview

So far in this class we have seen a number of methods for improving file system performance. We have talked about how file systems take a physical disk and create user-friendly abstractions such as directories and named files. Your job will now be to design and implement a basic application-level file system. You will be given a block interface to a raw "disk," and, using this interface, you will be required to design a simplified yet functional file system. For right now, we're just concerned with getting a file system operational. We're not going to worry about things like realistic data placement, caching, and the like. You are, however, going to be graded on how efficiently you use the disk. In other words, how large a file can be stored on your disk? What is the usable capacity of your file system if it's full of many 1-byte files? How many disk blocks do you need to read to access a small file? You should consider these factors when thinking about how to design your file system.

You will be using FUSE to implement this user level file system. FUSE is already installed on ieng6-203.ucsd.edu (you can get your account information for this machine at http://sdacs.ucsd.edu/~icc/) and you are welcome to use your own machine to do the project. (Of course, you need to install FUSE on your machine; it's available for Linux as well as FreeBSD. If your Linux Kernel is newer than 2.6.14, FUSE is already installed in your machine.) These C files will provide the interfaces to the functions you will be writing. The header file, cse121fs.h, defines the disk interface and any other include information you would like to add. cse121fs.c contains a skeleton implementation of the API; your job is to flesh out the implementation according to the specification below. The FUSE installation(/software/nonrdist/fuse) has an example directory that contains sample file systems; check it out for sample implementations. The FUSE source has some more sample file systems.

Here is a sample session using your file system:
after compiling your program into cse121fs then create a directory
~/project1$ mkdir mount_dir

Change the permisssions to allow FUSE to access the mount point
~/project1$ chmod 755 mount_dir

The following command will mount your file system on directory mount_dir
~/project1$ ./cse121fs mount_dir

now you can use your file system as a normal file system (so ls, cd, cat commands work)

The following command un mounts the file system
~/project1$ fusermount -u mount_dir
~/project1$
    

Simplifications and Assumptions

In order to keep this project manageable, we're going to make several simplifying assumptions. First, we are not mounting a physical disk; instead, we'll be using a fixed-size file on the normal file system. You will "mount" your filesystem by reading the disk image file provided. This disk image is simply a flat file which will eventually contain all the metadata blocks and data blocks in your file system. As a result, there is no need to worry about the physical characteristics of a disk (seek time, latency, disk layout) that some of the earlier papers we read addressed. Instead, assume disk blocks are arranged in a linear order---i.e., block 3 is next to block 4. When testing your project we will, however, use the number of disk accesses and the distance between subsequent block numbers as a rough metric of your file system's performance.

For the purposes of this project, you do not need to worry about the buffer cache; the OS will maintain the buffer cache for you and you don't need to do anything extra in your file system. You can assume all disk image reads and writes are synchronous (the OS actually caches data blocks and delays writes to disk, but for the purposes of this project let's ignore this detail). Also, all directory paths will be absolute and there is no notion of a working directory. We will not concern ourselves with things like file ownership, protection bits, time stamps, and the like. You do, however, need to maintain file size information somewhere. Finally, the block size is fixed to be 512 bytes and filename/directory names are fixed, 28-byte character strings.

Interface to Raw Disk

You will be provided with the following interface to the raw mounted disk from which you will construct and access your file system.

int dread(int fd, int blocknum, char *buf)
Given a blocknum, dread() will read that block into the specified buffer 'buf'. More specifically, dread will read BLOCKSIZE (512) bytes of data from the address logical block number blocknum. fd is the file descriptor corresponding to the currently mounted disk image (you will only need to manage one disk image at a time). Thus, reads from the disk image only happen at the granularity of a block, simulating a real disk. Also, dread() is synchronous and will wait until the entire block is read and copied into buf before returning.

int dwrite(int fd, int blocknum, const char *buf)
Given a block number, dwrite will write a block from buf to the disk image at location blocknum. dwrite() is guaranteed to be atomic and synchronous, so there is no need to worry about cases where blocks themselves are partially written. When implementing, be sure to keep in mind that at any time the file system might crash so be sure that at all times you are in an recoverable state. However you are guaranteed that crashes can only occur before or after a dwrite() call and not during.

Both functions will return the number of bytes written (BLOCKSIZE) on success and -1 on failure. Your code does not need to handle failure; it may simply exit with an error message (e.g., "your disk image crashed").

Formatting the Filesystem

Before mounting a file system for the first time, you need to initialize the raw disk (in our case disk file). You will write another program that writes the super block information onto the disk file before using this file as a raw disk (essentially you are "formatting" the disk partition to be used as a file system). What you write into the super block depends on how you design your file system, generally you would like to write some information about file system such as where the location of root directory or other meta data.

Filesystem API

The following is the API you are required to implement using the interface described above. When designing your file system, keep in mind that "crashes" may occur on any I/O operation and, like the papers read in class, your filesystem must always be in a stable and consistent state. We will simulate crashes by modifying dread() and dwrite() to occasionally stop the execution of your program either before or after they complete. We will then ask your code to re-mount the disk image in the state it was left in at the time of the "crash."

The cse121fs.c file defines the following interface:

int mymount(const char *filename)
This function expects the filename of he disk file that you are using as a disk; you may hard code the filename (defined in cse121fs.h as DISKFILE). This function should mount your application-level filesystem stored in the file; once a mount occurs, that file will be used as the disk for all further file system commands you will implement (until an xmp_unmount occurs). Basically, you need to open the disk file and store the file discriptor in a global variable that you will pass to dread and dwrite later. Remember that this is an application-level file system. From the point of view of a user of the application, it will appear like a structured file system. However, in reality the disk image is merely one contiguous file on the real file system.

This is also when you will need to check the integrity of your file system. For example, if a crash occurred during a file create after an inode was allocated but before a directory entry is updated, such an error should be found and fixed in xmp_mount.

The return value will passed in the private_data field of fuse_context to all file operations and as a parameter to the xmp_unmount() method.

int myunmount()
Unmount is responsible for unmounting the file system. In addition to closing the disk image file descriptor, unmount will need to write out any necessary meta data that might be required the next time the file system is mounted. For instance, it might note that the filesystem was cleanly unmounted, speeding up the integrity check the next time the file system is mounted.

int xmp_getattr(const char *path, struct stat *stbuf)
Given an absolute path to a file/directory (i.e., "/foo/bar"---all paths will start with the root directory, "/"), you need to return the file attributes that is similar stat system call. For this project, you don't need to fill in any attributes except file size, file size in blocks, and block size of the file system. cse121fs.c has more infomration on this. Depending on your implementation of the file system, the specifics of the getattr might vary, but the basic steps your file system takes to get file attributes will be the following:

  • Resolve the directory path -- this will probably involve reading each directory in the path to get the location of the file entry (or location of data block with the file's inode) of the specified file.
  • Fill the file size, file size in blocks, and block size information (note the block size is constant across all files).
You should return 0 on success or -1 on error. If the path doesn't exist, you should return -ENOENT (-2).

int xmp_create(const char *path, mode_t mode, dev_t rdev)
Given an absolute path to a file (for example "/a/b/myFile"), xmp_create will create a new file named "myFile" in the "/a/b" directory. Ignore the mode parameter as we are not implementing the permissions. Also ignore the rdev parameter. The steps your implementation may take to create a new file will likely include the following:

  • Resolve the directory path
  • Allocate and initialize a new directory entry
  • Update the directory entry for 'path' by adding an entry for the newly created file.
You should return 0 on success or -1 on error (e.g., the path doesn't exist, or file already exists).

int xmp_read(const char *path, char *buf, size_t size, off_t offset, struct fuse_file_info *fi)
The function xmp_read provides the ability to read data from an absolute path 'path,' which should specify an existing file. It will attempt to read 'size' bytes at offset 'offset' from the specified file on your filesystem into the memory address 'buf'. The return value is the amount of bytes actually read; if the file is smaller than size, myread will simply return the most amount of bytes it could read. On error, myread will return -1. The actual implementation of myread is dependent on how files are allocated. For time being ignore the fuse_file_info parameter. If you would like to use it, please see the open function in fuse.h file for more details.

int xmp_write(const char *path, const char *buf, size_t size, off_t offset, struct fuse_file_info *fi)
The function xmp_write will attempt to write 'size' bytes from memory address 'buf' into a file specified by an absolute 'path' (i.e., "/foo/bar"---all paths will start with the root directory, "/") at offset 'offset'. If the offset is greater than the file length, the file should be extended (with zeros) to be 'offset' long before appending buf to the end. On error (e.g., the path does not exist) xmp_write will return -1, otherwise xmp_write should return the number of bytes written. Note size is not necessarily an integral number of blocks. Similar to xmp_read, the actual implementation of xmp_write will depend on how you decide to allocate file space. For time being ignore the fuse_file_info parameter. If you would like to use please see the open function in fuse.h file for more detials.

int xmp_delete(const char *path)
This function deletes the last component of the path (e.g., "/a/b/c" you need to remove the file 'c' from the directory "/a/b"). Depending on your implementation of the file system, the specifics of the delete might vary, but the basic steps your file system takes to delete a file will be the following:

  • Locate the directory entry for the file by resolving the path - this will probably involve reading each directory block in the path to get the location of the first data block (or location of data block with the file's inode) of the specified file.
  • Update the directory entry for the deleted file
  • Free any data blocks used by the file
Again, these steps are very general and the actual logistics of how to locate data blocks of a file, how to free data blocks and how to update directory entries are dependent on your implementation of directories and file allocation. On error (e.g., the path does not exist) xmp_delete will return -1, otherwise xmp_delete should return 0.

int xmp_mkdir(const char *path, mode_t mode) Given an absolute path to a directory '.../dirname' (which may or may not end in '/'), xmp_mkdir will create a new directory named dirname in that directory. Ignore the mode parameter as we are not implementing the permissions. The steps your implementation will do to make a new directory will be the following(similar to the xmp_create):

  • Resolve the directory path
  • Allocate and initialize a new directory block
  • Update the directory block for 'path' by adding an entry for the newly created directory.
  • You also need to create appropriate entries for "." and ".." in the new directory

You should return 0 on success or -1 on error (e.g., the path doesn't exist, or dirname already does).

int xmp_readdir(const char *path, void *buf, fuse_fill_dir_t filler, off_t offset, struct fuse_file_info *fi)
Given an absolute path to a directory, xmp_readdir will return all the files and directories in that directory. The steps you will take will be the following:

  • Resolve the directory entry from its absolute path to get the address of the directory block that corresponds to "path"
  • For each valid entry in the directory block, copy the file name the array (buf) using the function filler. The filler function is already implemeted in FUSE and its sample usage is shown in fusexmp.c file and you should pass the zero to the offest paramter of filler function. Further details please see the fuse.h file.

You should return 0 on success or -1 on error (e.g., the directory does not exist).

int xmp_rename(const char *from, const char *to)
The function rename will rename a file or directory named by the string 'oldpath' and rename it to the file name specified by 'newpath'.

As usual, return 0 on success, -1 on failure.

Some Example Approaches

In this project you are free in designing the details of the implementation of your file system. However, it is highly recommended that you model your file system after an existing implementation, as this way you can be sure your design is safe and you will also have a means of reference during implementation. The following are two methods of file allocation that are recommended. The first is the File Allocation Table (FAT), and the second is the Unix File System inode.

FAT is a relatively straightforward data structure used to maintain information about file allocation and was used by early versions of MS-DOS and OS/2. It involves having a fixed portion of disk (usually the beginning of disk) used as a table to maintain information about all the blocks in the system. The length of the table is the number of all the blocks in the file system. The directory entry for a given file points to the first block of the file. The corresponding entry in the FAT points to the next entry in the file and so on until a special end of file entry is reached. For example, assume a file has blocks allocated 3, 6 and 9. The directory entry would point to block 3. The 3rd entry in the FAT would contain 6 and the 6th entry would contain 9. The 9th entry would contain a special end of file character.

Another possible approach to maintaining file allocation information is to use the Unix inode format. You should be familiar with this structure from CSE120. Basically, the directory entry contains an inode number, and every inode is at a fixed location on disk. An inode then contains information about all the data blocks via direct and indirect pointers. In your filesystem, both inode number and block number can be represented as a 32-bit integer.

Another detail that must be considered in implementing your file system is the directory structure. The simplest method of implementing a directory structure would be a simple linked list. Each directory block will contain a number of directory entries - each entry will consist of a file name and inode number pair. The last entry in the directory block will contain a pointer to the address of the next directory block for that directory. For example, in our implementation we have a fixed block size of 512, a fixed filename size of 28 bytes, and 4 bytes can be used to store the inode number. Therefore, there can be 16 directory entries for each block. The first 15 will contain pairs, and the last will contain a pointer to the next directory block.

Turn In

You will turn in your projects using the turnin command. In order to do this you must first tar your project files and then submit the tar file.

Example:
% tar cvf project1.tar [files]
% turnin -c cs121w -p project1 project1.tar

spacer gif
spacer gif
spacer gif
spacer gif
9500 Gilman Drive, La Jolla, CA 92093-0114
spacer gif
About CSE | CSE People | Faculty & Research | Graduate Education | Undergraduate Education
Department Administration | Contact CSE | Help | Search | Site map | Home
snoeren@cs.ucsd.edu
Official web page of the University of California, San Diego
Copyright © 2002 Regents of the University of California. All rights reserved.
spacer gif