ModifyFile/DeleteFile should only be applied when the given version number is exactly one higher than the version stored in the MetadataStore.
The Client should only ever invoke operations on the leader. The leader can handle ReadFile requests by itself, and it can also reject invalid ModifyFile and DeleteFile operations on its own. For example, if a client attempts a ModifyFile operation with missing blocks, or with the wrong version number, the leader can simply reject the request. Only when a request is valid (all blocks are present and the version number is correct) does the leader need to invoke 2PC on the followers.
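The leader's validation step above can be sketched as follows. This is a minimal illustration, assuming a simple in-memory metadata map and a block-existence callback; the names (`metadata`, `has_block`) are illustrative, not from the starter code:

```python
def validate_modify(metadata, filename, new_version, hashlist, has_block):
    """Return (ok, missing_blocks) for a proposed ModifyFile.

    The update is valid only if new_version is exactly one higher than
    the stored version, and every block in the hashlist already exists.
    """
    current_version, _ = metadata.get(filename, (0, []))
    if new_version != current_version + 1:
        return False, []                  # wrong version: reject outright
    missing = [h for h in hashlist if not has_block(h)]
    if missing:
        return False, missing             # client must upload these first
    return True, []                       # valid: safe to start 2PC
```

Only when this returns `(True, [])` would the leader proceed to run 2PC with its followers.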
Note that the followers (and the leader) must always implement ReadFile, even while crashed. ReadFile should return the most up-to-date committed state of the file system; it should not include logged (but not yet committed) updates.
For Project 2, we wanted to clarify the comments on the ReadFile() RPC in the provided protobuf file, as we’ve received several questions and heard that some people are confused about its functionality.
Per the text in the protobuf file, the client must supply the “filename” argument. If this is not provided, the server should return that the file does not exist by setting the version number to 0.
Every MetadataStore should correctly respond to a ReadFile RPC at any time, even if it is a replica or is in a crashed state. A crashed replica should only respond to three RPCs: Restore(), ReadFile(), and isCrashed(). The comments for the Crash() RPC are incorrect, as they state that a crashed MetadataStore should only respond to Restore().
Thus, ReadFile() will always return a response, and will return the version number and blocklist for the file (with the version number set to 0 if the file doesn’t exist).
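The ReadFile() behavior described above amounts to the following sketch. This assumes a plain dict of committed state (`committed_files`); the real implementation would answer through the gRPC stub, but the semantics are the same:

```python
def read_file(committed_files, filename):
    """Always answer, even on a crashed replica.

    Returns only committed state; version is 0 when the filename is
    missing from the request or the file does not exist.
    """
    if not filename or filename not in committed_files:
        return {"version": 0, "blocklist": []}
    version, blocklist = committed_files[filename]
    return {"version": version, "blocklist": list(blocklist)}
```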
Python demo code
If you are working in a Python environment for Project 2, we have attached the Python version of the BlockStore demo that was part of the video:
I removed the part of the project involving Client optimizations (download and upload). This feature is now optional and not part of the final grade.
The project spec’s grading rubric has been updated.
Some notes on implementing part 2:
When a client sends a command to the leader, the leader is going to log that command in its local log, then issue a two-phase commit operation to its followers. When a majority of those followers approve of the update, the leader can commit the transaction locally, and then respond back to the client. After the leader responds back to the client, it is going to need to tell the followers that the transaction was committed. It is fine to immediately call into them with the updated commit index.
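The commit path described above can be sketched as follows. This is a hedged, in-memory illustration; the follower objects with `vote()` and `notify_commit()` methods stand in for the real RPC stubs:

```python
def leader_commit(log, followers, command):
    """Log locally, run the voting phase, commit on a majority,
    then tell the followers the transaction was committed."""
    log.append(command)
    index = len(log) - 1
    votes = sum(1 for f in followers if f.vote(index, command))
    # The leader itself counts toward the majority of the whole cluster.
    if votes + 1 > (len(followers) + 1) // 2:
        for f in followers:
            f.notify_commit(index)   # crashed followers simply ignore this
        return True                  # committed; now respond to the client
    log.pop()                        # majority not reached: abort
    return False
```

With four followers (a five-node cluster), two approving followers plus the leader form a majority, so the commit succeeds even if the other two are crashed.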
Now, what happens if a follower is in a crashed state? The leader should attempt to bring it up to date every 500ms, meaning that every half second the leader should call into the follower with updated information.
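One way to structure this periodic catch-up is sketched below. The follower stub methods `is_crashed()` and `append_entries()` are illustrative names, not part of the provided .proto file:

```python
import time

def catch_up_once(follower, log, commit_index):
    """One attempt: push the committed prefix of the log to a follower."""
    if follower.is_crashed():
        return False                 # still down; try again next interval
    follower.append_entries(log[:commit_index + 1], commit_index)
    return True

def catch_up_loop(follower, get_state, interval=0.5, rounds=None):
    """Retry every `interval` seconds (500ms, per the text above).
    `rounds=None` loops forever; a number bounds it for testing."""
    while rounds is None or rounds > 0:
        log, commit_index = get_state()
        catch_up_once(follower, log, commit_index)
        time.sleep(interval)
        if rounds is not None:
            rounds -= 1
```

You would typically run this loop on a background thread per follower, so a follower that wakes up from a crash receives the latest committed state within about half a second.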
Some hints on testing part 2
To test part 2, we are (in part) going to do the following:
- Start up your servers
- Update a number of files
- We will then “crash” one or more of the followers (but never more than half)
- We’ll then continue to update files
- Your service should continue to work while the followers are crashed, so that as far as the client is concerned, nothing appears to have failed
- During the time that one or more of your followers is crashed, we’ll call its ReadFile() API to ensure that its state is not being updated. In other words, we’ll ensure that it is falling behind the rest of the system
- We’ll then “uncrash” the follower(s) and wait (e.g., 5 seconds). Then we’ll check to make sure that those followers have “caught up” to the rest of the system and have the updated information
- This may happen multiple times.
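The steps above can be mirrored with a toy in-memory model (not the real gRPC services) to check your logic before wiring up the RPCs; all class and method names here are illustrative:

```python
class ToyFollower:
    def __init__(self):
        self.crashed = False
        self.files = {}                       # filename -> (version, blocklist)
    def apply(self, filename, version, blocks):
        if not self.crashed:                  # crashed followers drop updates
            self.files[filename] = (version, blocks)
    def read_file(self, filename):            # always answers, even crashed
        return self.files.get(filename, (0, []))

class ToyLeader:
    def __init__(self, followers):
        self.followers = followers
        self.files = {}
    def modify(self, filename, version, blocks):
        cur, _ = self.files.get(filename, (0, []))
        if version != cur + 1:
            return False                      # wrong version: reject
        self.files[filename] = (version, blocks)
        for f in self.followers:
            f.apply(filename, version, blocks)
        return True
    def restore(self, follower):              # catch the follower up
        follower.crashed = False
        follower.files = dict(self.files)
```

This toy model lets you verify the crash/fall-behind/catch-up sequence: update a file, crash a follower, update again, confirm the crashed follower's ReadFile() is stale, restore it, and confirm it matches the leader.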
I hope that this testing strategy can help you exercise your code.
Storing replicated logs
You do not need to write the logs out to disk; it is fine to keep them in memory. This is true for both the followers and the leader.
I’ve put together a four-part tutorial on how to get started with SurfStore, including how to implement the BlockStore service. It is located at this link
The link to accept the GitHub classroom repo is here: https://classroom.github.com/g/KNN3awwz
The starter code (for Java) is now available here: https://github.com/gmporter/cse124-project2. We’ll be aiming to post a python version soon.
Submission instructions will come a bit later.
You are going to implement a BlockStore service, one or more Metadata services, and the Client. We are going to have our own version of each of these services that we’ll use to test your project. For this reason, make sure to stick to the .proto file we provide, so that our services can inter-operate with yours. You may extend these RPC calls and messages (and in fact you will need to for part 2), but please don’t change the interfaces provided.
Please start early!
Project 2 is due on Friday Dec 8 at 5pm.