CSE 223B Labs

Lab 3

Welcome to Lab 3. The goal of this lab is to take the bin storage that we implemented in Lab 2 and make it fault-tolerant.

Lab 3 can be submitted in teams of up to 3 people.

Get Your Repo Up-to-date

Hopefully no changes have been made, but just in case, update your repository.

$ cd ~/gopath/src/trib
$ git pull origin master

This should be a painless update.

Note that we don't provide great unit tests to test fault tolerance (as it's hard to spawn and kill processes from within unit tests). Make sure you test this sufficiently using a testing mechanism of your own design.

System Scale and Failure Model

There could be up to 300 backends. Backends may join and leave at will, but you can assume that at any time there will be at least one backend online (so that your system is functional). Your design is required to be fault-tolerant where if there are at least three backends online at all times, there will be no data loss. You can assume that each backend join/leave event will have a time interval of at least 30 seconds in between, and this time duration will be enough for you to migrate storage.

There will be at least 1 and up to 10 keepers. Keepers may join and leave at will, but at any time there will be at least 1 keeper online. (Thus, if there is only one keeper, it will not go offline.) Also, you can assume that each keeper join/leave event will have a time interval of at least 1 minute in between. When a process 'leaves', assumee that the process is killed-- everything in that process will be lost, and it will not have an opportunity to clean up.

When keepers join, they join with the same Index as last time, although they've lost any other state they may have saved. Each keeper will receive a new Id in the KeeperConfig.

Initially, we will start at least one backend, and then at least one keeper. At that point, the keeper should send true to the Ready channel and a frontend should be able to issue BinStorage calls.

Consistency Model

To tolerate failures, you have to save the data of each key in multiple places. To keep things achievable, we have to slightly relax the consistency model, as follows.

Clock() and the key-value calls (Set(), Get() and Keys()) will keep the same semantics as before.

When concurrent ListAppend()s happen, calls to ListGet() might result in values that are currently being added, and may appear in arbitrary order. However, after all concurrent ListAppend()s return, ListGet() should always return the list with a consistent order.

Here is an example of an valid call and return sequence:

ListRemove() removes all matched values that are appended into the list in the past, and sets the n field properly. When (and only when) concurrent ListRemove() on the same key and value is called, it is okay to 'double count' elements being removed.

ListKeys() keeps the same semantics.

Entry Functions

The entry functions will remain exactly the same as they are in Lab 2. The only thing that will change is that there may be multiple keepers listed in the KeeperConfig.

Additional Assumptions


Building Hints

For the ease of debugging, you can maintain some log messages (by using log package, or by writing to a TCP socket or a log file). However, for the convenience of grading, please turn them off by default when you turn in your code.

Also, try to distribute yourselves evenly across the lab machines. If everyone uses vm143, it'll be unhappy.


Similar to in Lab 2, please include a readme file. See the description in Lab 2 for more details.

Turning In

If you are submitting as a team, please create a file called teammates under the root of triblab repo that lists the login ids of the members of your team, each on its own line.

Make sure that you have committed every piece of your code (including the teammates file) into the triblab repository. Then just type make turnin-lab3 under the root of your repository.

Happy Lab 3. :-)

Last updated: Thu Apr 23 14:58:38 -0700 2015 [validate xhtml]