Lab 4: A Simple Distributed File System
Distributed network file systems are commonplace today; we'll be
reading about three such systems, Harp [LGG+91], Frangipani [TML97], and Coda [KS92], in class, but there are many
others, such as Zebra, Locus, Ficus, and Deceit. Unlike
traditional network file systems like NFS [SGK+95], in which all clients connect
to the same server, a distributed file system spreads a single disk
image across multiple servers, allowing clients to connect to any
server. As you can imagine, careful placement of the disk image data
across the server group may lead to increased performance,
availability, resilience to failures, or some combination of all three.
Unfortunately, this comes at a cost---consistency must be maintained
across the servers so that all clients see the same disk image.
When you think about it, this synchronization is not unlike that
required for our chat server in the previous lab. In fact, if we
chose to replicate the disk image at each server, the requirements are
almost exactly the same: updates from any client must be broadcast to
every server, and all updates must be ordered in the same fashion at
each server. We'll stick to this simple model for now; you might find
it interesting to consider alternate data placement algorithms as part
of your term project.
Implementing a file server is tricky business. In order to allow you
to focus on the interesting distributed systems issues, and not UNIX
file system peculiarities, you'll be provided with a working
user-level server based on
the FUSE user-level file
system. This will allow you to mount your file system on any
machine with FUSE support, including all of the class machines and
most modern versions of Linux. Servers using FUSE need only implement
a set of functions similar to the NFSv3 RPCs; everything else is taken
care of for you. You won't need to know much about FUSE in order to
complete this assignment, but those of you interested in reading more
about it should check out the
SourceForge Web site for more information.
Your server will read in a configuration file which specifies the set
of peer servers. It will also be given a directory which it can use
to store the files it's sharing---you're welcome to use some in-memory
data structure if you'd like, but I think you'll find it easiest to
simply keep a mirror of your distributed file system on the local
file system. You need to make sure this directory has identical
content at each server (the simplest way to do that initially is to
make sure it's empty at startup).
We provide a sample server that will export the contents of the
scratch directory, but does not communicate with any other servers.
Hence, the versions of the file system exported at each server will
diverge as clients make modifications. Your task is to implement a
communication protocol between the servers that keeps the file systems
consistent by incorporating any client's changes at every server.
Basically, you'll need to modify various NFS RPC functions to
broadcast each request to all servers before completing.
Hopefully, you'll find the protocol you implemented for Lab 3 is at
least a partial solution.
You are responsible for the following requirements:
- You must implement strong consistency for all updates at the same
server. Namely, writes at any client must be observed by subsequent
reads at all other clients of that server. Note this only refers to
requests actually dispatched to the server; client writes are cached
locally and only written through periodically or at a close. The
server can obviously only affect the consistency of operations that
are not served entirely by the client's local cache.
- Clients must see updates at other servers if the updates occurred
more than two seconds in the past.
- The file system should continue to be available despite the failure
of one or more servers. Any clients connected to a dead server may
not be able to access the file system, but clients connected to live
servers must continue to be able to use the file system and see any
changes from other live clients. Open files (and those closed for
less than two seconds) at a failed server may become
inconsistent across servers.
- Your server must work both with servers running on the same
machine (with different scratch directories) and on different
machines.
You are allowed to assume the following (non-realistic) things:
- Every server will be provided with the location of all other
servers at startup. No servers other than those specified on the
command line will join the network.
- All servers will be on-line before any client requests arrive.
- A server may die, but will never return during the lifetime of the
system. In other words, you can assume fail-stop behavior.
- The maximum propagation delay between any two servers is 100 ms.
- The delay between any client and its server is negligible.
- The network never drops any packets.
Obviously, the last one is not likely to be true in any real network,
including the course virtual cluster. However, the chances of a packet
drop on the virtual network are extremely small. The behavior of your
servers in the face of packet loss is undefined.
We have provided an initial skeleton directory to get you started. It
is available as /cse223b/labs/lab4.tgz on
the course machines. You should copy this file to your working
directory. Unlike the previous labs, this code will likely only build
and run on the course cluster (unless, of course, you happen to have
access to another machine with FUSE installed). The following
sequence of commands should extract the files and build the initial
(non-distributed) executable, distfs:
% tar xzf lab4.tgz
% cd lab4
% gmake
There are several files in the tarball. The ones you'll be interested
in are client.C and peer.C. client.C implements each of the NFS RPCs
and peer.C implements the UDP receive callback (which you'll need to
extend to handle your peer communication protocol). You can pass
several command line arguments to distfs. The three you'll be
interested in are:
You should edit the configuration file as appropriate. The first
thing to do would be to change the export line to point to a directory
you've created. Second, you need to create a mount point for your
file system. In the example below, I use /tmp/mnt. You can then
start up distfs with the following command line:
% mkdir /tmp/mnt
% ./distfs -f ./sample_config /tmp/mnt
distfs: peer: trowel.ucsd.edu:7777
distfs: version 0.7.2, pid 69766
distfs: Listening for peers on port: 7777
distfs: Now exporting directory: /tmp/export-snoeren
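For reference, a configuration file consistent with the output above might look something like the following; the keywords and the second peer's hostname are only guesses, so check the provided sample_config for the actual syntax:

```
# directory to export (simplest if it starts out empty)
export /tmp/export-snoeren
# port to listen on for peer traffic
port 7777
# one line per peer server, host:port
peer trowel.ucsd.edu:7777
peer shovel.ucsd.edu:7777
```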
If everything is working properly, you should be able to access the
contents of your file system by typing the following:
% ls /tmp/mnt
This should list the contents of your filesystem (the directory you
specified in the distfs configuration file). You can use this
directory just like any other filesystem. Note that our sample code
does not communicate any changes you make to peer servers.
All code for this assignment must be written individually. You
are not allowed to look at anyone else's solutions or solutions to
similar assignments you may find for courses at UCSD or other
institutions. You may discuss the assignment with fellow students,
but all code you submit must be either yours alone or code that was
provided to you as part of the assignment.
The turnin procedure is the same as for the previous labs. When you're
ready to submit your code, you can execute the following command:
% gmake turnin
which will create a tarball and copy it to the turnin directory.