Lab 4: A Simple Distributed File System
Due: 12:00am (midnight), Sunday, May 4th, 2003
Distributed network file systems are commonplace today; we'll be
reading about three such systems, Harp [LGG+91], Frangipani [TML97], and Cedar [Hag87], in class, but there are many
others, like Coda, Zebra, Locus, Ficus, and Deceit. Unlike
traditional network file systems like NFS [SGK+85], in which all clients connect
to the same server, a distributed file system spreads a single disk
image across multiple servers, allowing clients to connect to any
server. As you can imagine, careful placement of the disk image data
across the server group may lead to increased performance,
availability, resilience to failures, or some combination of all three.
Unfortunately, this comes at a cost---consistency must be maintained
across the servers so that all clients see the same disk image.
When you think about it, this synchronization is not unlike that
required for our chat server in the previous lab. In fact, if we
chose to replicate the disk image at each server, the requirements are
almost exactly the same: updates from any client must be broadcast to
every server, and all updates must be ordered in the same fashion at
each server. We'll stick to this simple model for now; you might find
it interesting to consider alternate data placement algorithms as part
of your term project.
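To make the replicated model concrete, here is a rough sketch (it is not
part of the handout code) of a server applying updates in a single global
order. The Update struct, the sequence numbers, and apply_locally() are
hypothetical stand-ins for whatever your Lab 3 ordering protocol provides:

#include <map>
#include <string>

struct Update {
  unsigned seqno;        // position in the single global order of updates
  std::string nfs_op;    // an encoded NFS request (WRITE, CREATE, ...)
};

class Replica {
  unsigned next_seqno;                  // next update we are allowed to apply
  std::map<unsigned, Update> holdback;  // out-of-order updates wait here

  void apply_locally(const Update &u);  // perform the operation on our copy

public:
  Replica() : next_seqno(0) {}

  // Called whenever an update arrives, from any server (including our own).
  void deliver(const Update &u) {
    holdback[u.seqno] = u;
    // Apply everything we can, in strict sequence order.
    while (holdback.count(next_seqno)) {
      apply_locally(holdback[next_seqno]);
      holdback.erase(next_seqno);
      ++next_seqno;
    }
  }
};

Because every server applies the same updates in the same order, their
disk images stay identical even though clients talk to different servers.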
Implementing a file server is tricky business. In order to allow you
to focus on the interesting distributed systems issues, and not UNIX
file system peculiarities, you'll be provided with a working user-level
server based on the Self-certifying File System (SFS). This will
allow you to mount your file system on any machine with an SFS client,
including all of the Active Web FreeBSD machines. Servers using the
SFS framework need only implement the NFSv3 RPCs; everything else is
taken care of for you. You won't need to know much about SFS or the
loopback NFS toolkit it's based on, but those of you interested in
reading more about it should check out the original SFS
paper and the user-level file system toolkit paper.
Your server will read in a configuration file which specifies the set
of peer servers. It will also be given a directory which it can use
to store the files it's sharing---you're welcome to use some in-memory
data structure if you'd like, but I think you'll find it easiest to
simply keep a mirror of your distributed file system on the local
file system. You need to make sure this directory has identical
content at each server (the simplest way to do that initially is to
make sure it's empty at startup).
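If you go the mirror-directory route, incorporating a peer's write can
boil down to redoing the same operation on your local copy. Here is a
rough sketch; PeerUpdate and its fields are purely hypothetical, since
your own protocol will define what an update actually carries:

#include <fstream>
#include <string>

// Hypothetical description of a peer's write.
struct PeerUpdate {
  std::string relative_path;  // e.g. "junk"
  std::string contents;       // the new file contents
};

// Mirror the peer's write into our own scratch directory.
void mirror_update(const std::string &scratch_dir, const PeerUpdate &u) {
  std::ofstream out((scratch_dir + "/" + u.relative_path).c_str(),
                    std::ios::binary | std::ios::trunc);
  out << u.contents;
}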
We provide a sample server that will export the contents of the
scratch directory, but does not communicate with any other servers.
Hence, the versions of the file system exported at each server will
diverge as clients make modifications. Your task is to implement a
communication protocol between the servers that keeps the file systems
consistent by incorporating any client's changes at all servers.
Basically, you'll need to modify various NFS RPC functions to
broadcast the requests to all servers before completing.
Hopefully, you'll find the protocol you implemented for Lab 3 is at
least a partial solution.
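The general shape of such a modified handler might look like the sketch
below. This is not the real client.C code (which uses the SFS types and
asynchronous callbacks); PeerSet, WriteRequest, and the helper functions
are hypothetical stand-ins:

#include <string>
#include <vector>

struct WriteRequest { std::string path; std::string data; };

struct PeerSet {
  std::vector<std::string> addrs;          // peer locations from the config file
  void broadcast(const WriteRequest &r);   // send the update to every peer
};

void apply_write(const WriteRequest &r);   // update the local scratch directory
void nfs_reply_ok();                       // complete the RPC to the NFS client

void handle_write(PeerSet &peers, const WriteRequest &req) {
  peers.broadcast(req);  // 1. tell every other server about the update
  apply_write(req);      // 2. apply it to our own copy of the file system
  nfs_reply_ok();        // 3. only then let the client's WRITE complete
}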
You are responsible for the following requirements:
- You must implement strong consistency for all updates at the same
server. Namely, writes at any client must be observed by subsequent
reads at all other clients of that server. Note this only refers to
requests actually dispatched to the server; client writes are cached
locally and only written through periodically or on a close. The
server can obviously only affect the consistency of operations that
are not served entirely by the client's local cache.
- Clients must see updates at other servers if the updates occurred
more than two seconds in the past.
- The file system should continue to be available despite the failure
of one or more servers. Any clients connected to a dead server may
not be able to access the file system, but clients connected to live
servers must continue to be able to use the file system and see any
changes from other live clients. Open files (and those closed for
less than two seconds) at a failed server may become
inconsistent across servers.
- Your server must work both with servers running on the same
machine (with different scratch directories) and with servers running
on different machines.
You are allowed to assume the following (non-realistic) things:
- Every server will be provided with the location of all other
servers at startup. No servers other than those specified on the
command line will join the network.
- A server may die, but will never return during the lifetime of the
system. In other words, you can assume fail-stop behavior.
- The maximum propagation delay between any two servers is 100 ms.
- The delay between any client and its server is negligible.
- The network never drops any packets.
Obviously, the last assumption is not likely to be true in any real
network, including the Active Web cluster. However, the chances of a
packet drop on the Active Web LAN are extremely small. The behavior of
your servers in the face of packet loss is undefined.
We have provided an initial skeleton directory to get you started. It
is available as /home/classes/sp03/cse291d/labs/lab4.tgz on the
Active web machines. You should copy this file to your working
directory. Unlike the previous labs, this code will only build and
run on the Active Web cluster. The following sequence of commands
should extract the files and build the initial (non-distributed) executable:
% tar xzf lab4.tgz
% cd lab4
% gmake
There are several files in the tarball. The ones you'll be interested
in are client.C and peer.C. client.C implements each of the NFS RPCs
and peer.C implements the UDP receive callback (which you'll need to
extend to handle your peer communication protocol). sfsusrv is
implemented using the libasync library, which provides all sorts of
asynchronous I/O and event handling routines. You shouldn't need to
know how it works to use it, but if you're curious, a tutorial is
available online.
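To give a feel for what extending the receive path might involve, here
is a rough sketch of dispatching on an incoming peer datagram. It uses
plain POSIX recvfrom() rather than the libasync types, and the message
format (PeerMsg) and helpers are hypothetical; the actual callback
registration is already done for you in peer.C:

#include <sys/types.h>
#include <sys/socket.h>

enum PeerMsgType { PEER_UPDATE, PEER_ACK };

// Hypothetical wire format for peer messages; you will design your own.
struct PeerMsg {
  int type;           // one of PeerMsgType
  unsigned seqno;     // ordering information for the update
  char payload[1024]; // encoded NFS operation
};

void apply_update(const PeerMsg &m);                  // replay the operation locally
void send_ack(const sockaddr &from, unsigned seqno);  // acknowledge receipt

void peer_recv(int fd) {
  PeerMsg m;
  sockaddr from;
  socklen_t fromlen = sizeof(from);
  ssize_t n = recvfrom(fd, &m, sizeof(m), 0, &from, &fromlen);
  if (n < (ssize_t)(sizeof(m) - sizeof(m.payload)))
    return;                       // ignore runt packets
  switch (m.type) {
  case PEER_UPDATE:
    apply_update(m);              // incorporate the peer's change
    send_ack(from, m.seqno);      // tell the sender we have it
    break;
  case PEER_ACK:
    // record the ack; the sender can complete its client's RPC once
    // enough peers have acknowledged the update
    break;
  }
}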
Before you run sfsusrv, you need to generate a public/private key
pair for use by the server. Run the following command:
% /home/classes/sp03/cse291d/sfs/build/bin/sfskey gen -KP sfs_host_key
Creating new key: sfs_host_key#1 (Rabin)
Key Label: sfs_host_key        (press return)
Note: seeing the following warning messages is normal when using sfskey:
/var/sfs/sockets/agent.sock: No such file or directory
sfskey: sfscd not running, limiting sources of entropy
You can pass several command line arguments to sfsusrv. The one used in
the examples below is -f, which specifies the configuration file to read.
You should edit the configuration file as appropriate. The first
thing to do would be to change the export line to point to a directory
you've created. You can then start up sfsusrv with the following command:
% ./sfsusrv -f ./sample_config
sfsusrv: peer: trowel.ucsd.edu:7777
sfsusrv: version 0.7.2, pid 69766
sfsusrv: Listening for peers on port: 7777
sfsusrv: Now exporting directory: /tmp/export-snoeren
sfsusrv: serving /sfs/<hostname>%2022,nd4uufan993fj8xdew6658rhdgugidsz
The last line shows the path name that you can use to access your
server from an SFS client on any machine except the one running
your server. (For technical reasons, you cannot mount a
file system on the machine that is serving it.) Your path will be
different, as it depends both on the sfs_host_key you generated and
the hostname and port the server is running on. All of the Active Web
machines run an SFS client, so you could see the contents of your file
system by typing the following on any other Active Web machine:
% ls /sfs/<hostname>%2022,nd4uufan993fj8xdew6658rhdgugidsz
This should list the contents of your filesystem (the directory you
specified in the sfsusrv configuration file). You can use this
directory just like any other filesystem. Note that our sample code
does not communicate any changes you make to peer servers.
Understanding NFS RPCs
Once you've gotten sfsusrv running, you can use it to trace NFS
RPCs. Kill your existing sfsusrv with control-C, and start a new one
with the ASRV_TRACE environment variable set to 10:
% env ASRV_TRACE=10 ./sfsusrv -f ./sample_config
Now, on another machine, browse the exported file system. You'll
have to use the /sfs/... pathname printed out by the current
instance of sfsusrv, since the port number will change each time. As
you browse, the server will print out a complete trace of all NFS
requests it receives. (Large structures may be truncated; if this is
ever a problem, try higher values than 10.) You can capture the
output of this trace by piping the output of sfsusrv to a file. For
example, in the Bourne shell (bash), you could use the following command:
% env ASRV_TRACE=10 ./sfsusrv -f ./sample_config 2>&1 | tee nfs.trace
After setting up sfsusrv to trace NFS traffic, run the following
commands (substituting the correct self-certifying pathname):
% cd /sfs/...
% rm junk
rm: junk: No such file or directory
% echo hello > junk
% cat junk
% cat junk
Now stop sfsusrv, and look at the RPCs in the trace file. You can
see a summary of the RPCs using
grep serve nfs.trace,
though in order to see arguments and return values you'll have to look
at the whole file. Answering the following questions will help you
understand how NFS and sfsusrv interact.
- Which RPCs correspond to the creation of junk?
- Which to the first cat?
- Which to the second cat?
- Explain any differences between the RPCs caused by the two cat commands.
All code for this assignment must be written by a group of three or
four people. Every member of the group must submit identical
code. You are not allowed to look at anyone else's solutions (except
for your team members') or solutions to similar assignments you may
find for courses at UCSD or other institutions. You may discuss the
assignment with fellow students, but all code you submit must be
either yours alone, your team members', or code that was provided to you
as part of the assignment.
The turnin procedure is the same as for the previous labs. When you're
ready to submit your code, you can execute the following command:
% gmake turnin
which will create a file called lab4-turnin.tgz. You should copy the file into
your ~/291dturnin/ directory.
I will consider the first file in the 291dturnin directory after the deadline to be
your submission. Any lateness will be counted against each person
individually. I.e., if the assignment is 2 hours late, each person
will be docked 2 hours.