For this assignment, you will extend TritonTransfer to provide a fault-tolerant metadata service. To do this, you will rely on read and write quorums to handle updates to file metadata.

Errata

  • 3/1: Added a section on filename versioning

  • 3/8: There is now a new GitHub invitation for this assignment

Redundant metadata servers

Instead of starting up a single metadata server, you will now start up three (3) metadata servers, using ports specified in the config file. Note that you will need to extend the configuration file with entries for the two additional metadata servers.
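
The exact layout depends on the configuration file format you used in the base project; as one illustrative sketch (the key names below are assumptions, not requirements), the extended file might simply list the block server port followed by one port per metadata server:

    block: 9000
    metadata1: 9001
    metadata2: 9002
    metadata3: 9003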

Failure model

During testing, we will randomly kill one of your metadata servers in a “fail-stop” way. That means the server will simply be killed, whether via “Control-C”, the UNIX kill command, or Python’s subprocess module. The result is that one of the metadata servers will stop functioning and, once it does, will not go back into service.
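
For concreteness, a fail-stop kill performed through Python’s subprocess module might look like the sketch below; the command line used to launch the servers is an assumption for illustration, not the actual test harness.

    import random
    import subprocess

    # Launch three metadata servers (the command line is illustrative only;
    # substitute however your servers are actually started).
    servers = [
        subprocess.Popen(["python", "metadata_server.py", "myconfig.txt", str(i)])
        for i in range(3)
    ]

    # Fail-stop: pick one server, kill it, and never restart it.
    victim = random.choice(servers)
    victim.kill()   # SIGKILL on UNIX; the process stops and never comes back
    victim.wait()   # reap the dead process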

The BlockServer will remain unaffected, and won’t be killed during testing.

Metadata replication strategy

To support fault-tolerant operation, you will need to replicate TritonTransfer state to all three (3) metadata servers. To do this, you’ll use a gossip protocol in which servers periodically exchange metadata amongst themselves.

To increase performance, you should employ read and write quorums so that only a subset of nodes needs to be read from or updated before the client can continue. With three servers total, the read quorum size should be 2 and the write quorum size should also be 2. The client should be able to proceed, knowing the write is committed to the metadata service, after only two replicas have been written.

When a client does a read operation, the blocklist it gets back should be the one that corresponds to the most recently committed write. In the background, servers not in the write quorum should receive the most recent metadata via a gossip protocol: every 5 seconds, a server should contact another server and pairwise exchange the most recent committed updates. This means that, for an N=3 cluster, every server should be up to date after 10 seconds, assuming no faults or failures.
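
As a minimal sketch of this anti-entropy loop (the rpc_exchange helper and the in-memory store layout are assumptions for illustration, not part of the provided code), each server could run something like the following in a background thread:

    import random
    import threading
    import time

    GOSSIP_INTERVAL = 5  # seconds

    def gossip_loop(local_store, peers, rpc_exchange):
        """Periodically merge committed metadata with one randomly chosen peer.

        local_store:  dict mapping filename -> (version, blocklist)
        peers:        addresses of the other metadata servers
        rpc_exchange: hypothetical RPC that sends our map and returns the peer's
        """
        while True:
            peer = random.choice(peers)
            try:
                remote_store = rpc_exchange(peer, local_store)
            except Exception:
                remote_store = {}  # peer may be down (fail-stop), so skip it
            # Keep whichever side has the higher version for each file.
            for fn, (version, blocklist) in remote_store.items():
                if fn not in local_store or version > local_store[fn][0]:
                    local_store[fn] = (version, blocklist)
            time.sleep(GOSSIP_INTERVAL)

    # Started from the server, for example:
    # threading.Thread(target=gossip_loop, args=(store, peers, exchange),
    #                  daemon=True).start()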

Write pipeline

The client picks a server that is UP (let’s call it server A). The client writes an update to A, including the version number. Server A picks another UP server (call it B) and pushes the write to server B. Once server B acknowledges back to A that its write succeeded, server A commits the write locally and returns a success response back to the client. Note that this protocol works because we’re assuming that failures do not occur during a client operation, only between client operations.
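
A minimal Python sketch of the server-side half of this pipeline is shown below; the push_to_peer RPC and the (version, blocklist) store layout are assumptions used for illustration.

    def handle_client_write(fn, version, blocklist, local_store, up_peers,
                            push_to_peer):
        """Server A's handler for a client upload, with a write quorum of 2.

        push_to_peer is a hypothetical RPC asking peer B to store the update;
        it returns True once B has committed the write.
        """
        current_version = local_store.get(fn, (0, None))[0]
        if version <= current_version:
            return "FILE_ALREADY_EXISTS"

        # Push to one other UP server (B) first; only after B acknowledges do
        # we commit locally, so two of the three replicas hold every
        # committed write.
        peer = up_peers[0]
        if not push_to_peer(peer, fn, version, blocklist):
            return "ERROR"  # not expected: failures occur only between operations

        local_store[fn] = (version, blocklist)  # commit locally
        return "OK"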

Read pipeline

The client contacts any UP server (let’s call it A) and does the read. Server A contacts another UP server (call it B) and does a read, comparing the version numbers between A and B. Server A returns the most up-to-date blocklist back to the client.
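
The corresponding read-side sketch (again using hypothetical helper names and the same assumed store layout) compares the two versions and returns the newer blocklist:

    def handle_client_read(fn, local_store, up_peers, read_from_peer):
        """Server A's handler for a client download, with a read quorum of 2.

        read_from_peer is a hypothetical RPC returning (version, blocklist)
        for fn as stored on peer B, or (0, None) if B has never seen fn.
        """
        local_version, local_blocklist = local_store.get(fn, (0, None))
        remote_version, remote_blocklist = read_from_peer(up_peers[0], fn)

        # With write and read quorums of 2 out of 3 servers, the two replicas
        # read must include at least one that holds the most recently
        # committed write, so the higher version wins.
        if remote_version > local_version:
            return remote_version, remote_blocklist
        return local_version, local_blocklist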

Versioning

Files are associated with a version, which is the file’s modification time in seconds. The metadata service should operate as follows (a short sketch of this check appears after the list):

  • Assume that a file with filename fn already exists on the metadata service with version number v1.
  • If a client tries to upload a file with name fn that has a version v0 less than or equal to v1, then the service should return a FILE_ALREADY_EXISTS error.
  • Otherwise, if the client tries to upload a file with name fn to the service, with a version v2 greater than v1, then the metadata service should overwrite/replace the blocklist associated with file fn with the new block list provided by the client.
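
The sketch below illustrates these rules; deriving the version from the local file’s modification time and the apply_upload helper are illustrative, assuming the same in-memory store as the earlier sketches.

    import os

    def file_version(path):
        """Version = the file's modification time, truncated to whole seconds."""
        return int(os.path.getmtime(path))

    def apply_upload(fn, version, blocklist, store):
        """Reject stale or duplicate uploads; otherwise replace the blocklist."""
        existing_version = store.get(fn, (0, None))[0]
        if version <= existing_version:
            return "FILE_ALREADY_EXISTS"
        store[fn] = (version, blocklist)
        return "OK"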

Updates to the thrift file

To add this quorum support, you’ll need to modify or add to the Thrift files provided.

Testing

We will test your code using your client. The testing strategy is given in the Project 2 write-up.

Due date

This assignment is due the last day of class, as indicated on the course schedule. Accept the new GitHub invitation for this assignment (see Errata) and commit your code to that repository.