In this project, you will create a system that implements the RAFT Consensus Algorithm to provide a basic key-value storage service to clients. More importantly, you will be running your implementation at scale and measuring its performance. At the end of the project, we’ll expect a 5-6 page technical report containing a brief explanation of your experimental methodology and all of the results of the experiments you performed. We will also be meeting with your group to check your progress, during which we will be expecting you to show a brief demo of your implementation.
We will be providing AWS credits that will allow you to run your system at a larger scale than your local machine allows. The demo should show your system running on AWS at scale and also exhibit how your system performs under a number of unexpected conditions (network partitions, node failures, etc.).
We expect the technical report to be a professionally presentable document that you would feel comfortable submitting to an academic conference. The purpose of the report is not only to show off how you measured your system, but to sufficiently describe and interpret your results in a meaningful way.
For this project, we expect you to work in groups of 2-3 students. Groups should be established well before the first project checkpoint (see below). If you find yourself without a group after the first week or so of class, please email us so that we can find a group for you. We do not recommend tackling this project alone; the effort required to both implement RAFT and measure it sufficiently is non-trivial.
The project has three checkpoints, which are meant to help ensure that everyone is on track to complete the RAFT implementation with enough time to take sufficient measurements for the technical report. For the first two checkpoints, we’ll expect a short writeup of your progress along with access to your currently committed code. If your group wishes, we can also schedule a meeting to review the checkpoint and discuss any issues your group may be having.
We will be reviewing your code at each checkpoint to ensure progress. We recommend hosting your code in a GitHub repository; this will provide us and your group with a manageable interface to check out your code. GitHub has recently started giving unlimited private repositories to all users, too!
If you’d like to propose an alternative project with a more direct research focus, feel free to let us know! We’re absolutely open to supporting a new research idea in the distributed systems space. An alternative project can replace both the technical report and the research paper, provided the idea around the project is sufficiently defined to permit 10 weeks of work on it.
Doing an alternative project will require you to have at least a basic understanding of distributed systems research in order to examine what new ideas your group can bring into that space. Alternative project proposals should be approached as a serious research option with the hope of publishing at a major technical conference; at the bare minimum, we will request that all alternative projects be uploaded to arXiv.
You may use any programming language that’s compatible with gRPC, which we’re requiring for this project. You can even use multiple languages for different components, if you’d like. Feel free to use other languages for helper scripts/setup/build/etc. as well. Do consider your choice of programming language when reporting on results.
You are not restricted in how you deploy your service. Feel free to use anything you’d like that would help you deploy it. Just be sure to account for any additional effects this might have when measuring your system’s performance.
With that said, we do only have a limited amount of AWS credits available. If you start spending a ton of money on expensive VMs, you may end up running out of credits.
Additional libraries are fine, so long as they’re not implementing a component of RAFT for you (leader election, log storage, etc.). JSON/YAML libraries to read configuration files are fine, as is anything to manage AWS commands, run testing frameworks, or gather/plot runtime statistics. Feel free to ask us directly if you’re not sure.
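As an illustration of the kind of helper code that falls on the permitted side of this line, here is a minimal sketch of reading a cluster configuration file with Python's standard `json` module. The file layout (a top-level `nodes` list with `id`, `host`, and `port` fields) and the names `NodeConfig`, `parse_cluster_config`, and `majority` are assumptions made up for this example, not a format the project requires.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeConfig:
    """One server in the cluster (field names are hypothetical)."""
    node_id: str
    host: str
    port: int


def parse_cluster_config(text: str) -> list[NodeConfig]:
    """Parse a JSON document of the assumed form {"nodes": [{"id", "host", "port"}, ...]}."""
    raw = json.loads(text)
    nodes = [NodeConfig(n["id"], n["host"], int(n["port"])) for n in raw["nodes"]]
    ids = [n.node_id for n in nodes]
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate node id in cluster config")
    return nodes


def majority(cluster_size: int) -> int:
    """Smallest number of servers that forms a strict majority of the cluster."""
    return cluster_size // 2 + 1
```

A configuration like this only tells each server who its peers are; everything RAFT-specific (elections, log replication, persistence) would still be your own code.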
Your servers should not span multiple regions. All of your servers should be run across multiple VMs in the same region. Not only is running across regions more difficult, it will also produce strange performance measurements that will be difficult to verify and explain.
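Since measurements that you can verify and explain are central to the report, it may help to summarize every experiment in the same way. The following is a minimal sketch, assuming a closed-loop client that issues one request at a time; `measure`, `percentile`, and the `op` callable (a stand-in for whatever client call your system exposes) are hypothetical names, and the nearest-rank percentile is just one reasonable convention.

```python
import math
import time
from typing import Callable


def percentile(sorted_samples: list[float], p: float) -> float:
    """Nearest-rank percentile of an ascending-sorted, non-empty list (0 < p <= 100)."""
    if not sorted_samples:
        raise ValueError("no samples")
    rank = max(1, math.ceil(p * len(sorted_samples) / 100))
    return sorted_samples[rank - 1]


def measure(op: Callable[[], None], count: int) -> dict[str, float]:
    """Run `op` `count` times back to back and summarize latency and throughput."""
    latencies = []
    start = time.perf_counter()
    for _ in range(count):
        t0 = time.perf_counter()
        op()  # hypothetical: e.g. one put or get against the cluster
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    latencies.sort()
    return {
        "throughput_ops_per_s": count / elapsed,
        "p50_s": percentile(latencies, 50),
        "p99_s": percentile(latencies, 99),
        "max_s": latencies[-1],
    }
```

Reporting tail latencies (p99, max) alongside the median tends to make unusual effects, such as those caused by leader changes or slow links, easier to spot than an average alone would.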