CSE 223B Labs


Welcome to Lab3. The goal of this lab is to take the bin storage that we implemented in Lab2 and make it fault-tolerant.

Lab3 can be submitted in teams of up to 3 people.

Get Your Repo Up-to-date

$ cd ~/gopath/src/trib
$ git branch lab3
$ git checkout lab3
$ git pull /classes/cse223b/sp14/labs/trib lab3
$ cd ~/gopath/src/triblab
$ git branch lab3
$ git checkout lab3
$ git pull /classes/cse223b/sp14/labs/triblab lab3

There are not many changes, only a few small ones, so the merge should be painless.

The update does not come with more unit tests (because it is not easy to cleanly spawn and kill processes in unit tests). You need to test on your own with the bins-* tools.

System Scale and Failure Model

There could be up to 300 back-ends. Back-ends may join and leave at will, but you can assume that at any time there will be at least one back-end online (so that your system is functional). Your design must be fault-tolerant: as long as at least three back-ends are online at all times, there should be no data loss. You can assume that consecutive back-end join/leave events are separated by 30 seconds, and that this interval is long enough for you to migrate storage.
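One possible way to place replicas under this failure model is sketched below. It is only an illustration, not the required design: the function name replicasFor, the alive slice, the FNV hash, and the replication factor of 3 used in main are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// replicasFor returns the indices of up to n live back-ends that should hold
// the given bin. It hashes the bin name to a home position and walks forward
// around the ring of back-ends, skipping the ones that are offline.
// alive[i] reports whether back-end i is currently reachable (hypothetical
// bookkeeping that a keeper would have to maintain).
func replicasFor(bin string, alive []bool, n int) []int {
	if len(alive) == 0 {
		return nil
	}
	h := fnv.New32a()
	h.Write([]byte(bin))
	start := int(h.Sum32() % uint32(len(alive)))

	var picked []int
	for i := 0; i < len(alive) && len(picked) < n; i++ {
		idx := (start + i) % len(alive)
		if alive[idx] {
			picked = append(picked, idx)
		}
	}
	return picked
}

func main() {
	// Back-end 1 is offline; the other four are online.
	alive := []bool{true, false, true, true, true}
	// A replication factor of 3 is an assumption for this sketch.
	fmt.Println(replicasFor("alice", alive, 3))
}
```

Because the walk skips offline back-ends, a bin's replica set shifts to the next live back-end when one leaves. The 30-second gap between events is the window in which the data would have to be copied to that new replica.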

There will be at least 1 and up to 10 keepers. Keepers may join and leave at will, but at any time there will be at least 1 keeper online. (If it is the only keeper, then it never goes offline.) Also, you can assume that consecutive keeper join/leave events are separated by 1 minute. Here, "leave" means that the process of the back-end or the keeper is killed; everything in that process will be lost. Each time a keeper comes back at the same Index, all of its state is lost, but it will get a new Id field in the KeeperConfig structure.

At start-up, we will start at least one back-end, and then at least one keeper. After the keeper sends true to the Ready channel, a front-end may start and issue BinStorage calls.

Consistency Model

To tolerate failures, you have to save the data of each key in multiple places, and we will have a slightly relaxed consistency model.

Clock() and the key-value interface calls (Set(), Get() and Keys()) will keep the same semantics.

While concurrent ListAppend() calls are in progress, a caller of ListGet() might see the values that are currently being added appear in arbitrary order. However, after all the concurrent ListAppend() calls have successfully returned, ListGet() should always return the list in a consistent order.

Here is an example of a valid call and return sequence:

ListRemove() removes all matched values that were appended to the list in the past, and sets the n field properly. When (and only when) concurrent ListRemove() calls are made on the same key and value, it is okay to double count in n.

ListKeys() keeps the same semantics.
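One way to meet the ListAppend() ordering requirement across replicas is to stamp each appended value and sort on read. The sketch below is an illustration only: the entry type, the merge function, and the assumption that every append carries a unique logical clock value are not part of the lab's API, and ListRemove() is not handled here.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a hypothetical record stored for each ListAppend() call.
type entry struct {
	Clock uint64 // logical timestamp taken for the append (assumed unique)
	Value string
}

// merge combines the entries read from several replicas into one list.
// It drops repeated copies of the same (Clock, Value) pair and sorts by
// clock, then value, so that every reader derives the same order.
func merge(replicas ...[]entry) []string {
	seen := make(map[entry]bool)
	var all []entry
	for _, r := range replicas {
		for _, e := range r {
			if !seen[e] {
				seen[e] = true
				all = append(all, e)
			}
		}
	}
	sort.Slice(all, func(i, j int) bool {
		if all[i].Clock != all[j].Clock {
			return all[i].Clock < all[j].Clock
		}
		return all[i].Value < all[j].Value
	})
	out := make([]string, len(all))
	for i, e := range all {
		out[i] = e.Value
	}
	return out
}

func main() {
	// Replica a missed the append of "y"; replica b stored entries out of order.
	a := []entry{{1, "x"}, {3, "z"}}
	b := []entry{{3, "z"}, {2, "y"}, {1, "x"}}
	fmt.Println(merge(a, b)) // prints [x y z]
}
```

With this approach the order is fixed by the stamps rather than by arrival order at each replica, so readers agree once all the appends have returned.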

Entry Functions

The entry functions will remain exactly the same as they are in Lab2, except that the KeeperConfig might now have multiple keepers.

Additional Assumptions


Building Hints

For the ease of debugging, you can maintain some log messages (by using log package, or by writing to a TCP socket or a log file). However, for the convenience of grading, please turn them off by default when you turn in your code.

Also, try to use a machine other than c08-11 for testing and debugging; this will lower your probability of running into a port collision.

Turning In

If you are submitting as a team, please create a file called teammates under the root of the triblab repo that lists the login ids of your team members, one per line.

Make sure that you have committed every piece of your code (and the teammates file) into the triblab repository. Then just type make turnin-lab3 under the root of your repository. It will generate a turnin.zip that contains everything in your git repo, and will then copy the zip file to a place where only the lab instructors can read it.

Happy Lab3. :)

Last updated: Sat Apr 26 19:31:03 -0700 2014