Birrell and Nelson
Read the original paper on RPCs:
Andrew D. Birrell and Bruce Jay Nelson. Implementing remote procedure calls. ACM Trans. Comput. Syst. 2, 1 (February 1984), 39-59. DOI=http://dx.doi.org/10.1145/2080.357392
- Write a brief summary of this paper, including your opinion of the strengths and weaknesses of the design. Is the design presented relevant for today’s distributed systems? Why or why not? If you are in a group of two, please submit two (different) write-ups. You can submit both write-ups in the same submission. There is a file called papersummary.txt. Please add your writeup(s) to this file.
As you read the paper, keep the following questions in mind. Note that you don’t need to answer them in your write-up.
- How clients rendezvous with servers, and how client implementations locate and bind to RPC implementations on servers
- How data types on the client/server are translated to the corresponding implementation on the remote endpoint
- The transport protocol used to communicate between the client and the server. How does their design handle network failures, packet reorder, and packet losses?
- How does their design handle failures of the client or server?
- What are the security implications of adopting an RPC design?
If you’d like some online training on Python, UCSD has partnered with Pluralsight to offer an optional online course that teaches the fundamentals of Python. Visit https://app.pluralsight.com/paths/skills/python and click the “Log in” link on the upper-right corner of the page. Under the login area, you’ll find a link that says “Sign in with company or school”. Click that, and log in with your UCSD single sign-on credentials. You should then have access to the course.
In this project you are going to create a cloud-based file storage service called SurfStore. SurfStore is a networked file storage application that supports four basic commands:
- Create a file
- Read the contents of a file
- Change the contents of a file
- Delete a file
Multiple clients can concurrently connect to the SurfStore service to access a common, shared set of files. Clients accessing SurfStore “see” a consistent set of updates to files, but SurfStore does not offer any guarantees about operations across files, meaning that it does not support multi-file transactions (such as atomic move).
The SurfStore service is composed of the following two sub-services:
BlockStore: The content of each file in SurfStore is divided up into chunks, or blocks, each of which has a unique identifier. The BlockStore service stores these blocks, and when given an identifier, retrieves and returns the appropriate block.
MetadataStore: The MetadataStore service holds the mapping of filenames/paths to blocks.
Additionally, you will need to implement a client that can support the four basic commands listed above.
We now describe the service in more detail.
Blocks, hashes, and hashlists
A file in SurfStore is broken into an ordered sequence of one or more blocks. Each block is of uniform size (4KB), except for the last block in the file, which may be smaller than 4KB (but must be at least 1 byte large). As an example, consider the following file:
The file ‘MyFile.mp4’ is 14,437 bytes long, and the block size is 4KB. The file is broken into blocks b0, b1, b2, and b3 (which is only 2,149 bytes long). For each block, a hash value is generated using the SHA-256 hash function. So for MyFile.mp4, those hashes will be denoted as [h0, h1, h2, h3] in the same order as the blocks. This set of hash values, in order, represents the file, and is referred to as the hashlist. Note that if you are given a block, you can compute its hash by applying the SHA-256 hash function to the block. This also means that if you change data in a block the hash value will change as a result. To update a file, you change a subset of the bytes in the file, and recompute the hashlist. Depending on the modification, at least one, but perhaps all, of the hash values in the hashlist will change.
Generating SHA-256 values in Python
As an example of converting some bytes using the SHA-256 hash value (encoded using base-64) in Python:
$ python3 Python 3.5.2 (default, Nov 23 2017, 16:37:01) [GCC 5.4.0 20160609] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import hashlib >>> hashlib.sha256(b'\x00\x01\x02').hexdigest() 'ae4b3280e56e2faf83f414a6e3dabe9d5fbe18976544c05fed121accb85b53fc' >>>
Files and filenames
Files in SurfStore are denoted by filenames, which are represented as strings. For example “MyDog.jpg”, “WinterVacation.mp4”, and “Expenses.txt” are all example of filenames. SurfStore doesn’t have a concept of a directory or directory heirarchy–filenames are just strings. For this reason, filenames can only be compared for equality or inequality, and there are no “cd” or “mkdir” commands. Filenames are case sensitive, meaning that “Myfile.jpg” is different than “myfile.jpg”.
Note that when uploading files, you should use the string that follows the last
/ as filename.
/ is not present, the entire string will be the filename. For e.g -
|Upload File Path||Filename|
Each file/filename is associated with a version, which is a monotonically increasing non-negative integer. The version is incremented any time the file is created, modified, or deleted. The purpose of the version is so that clients can detect when they have an out-of-date view of the file hierarchy.
For example, imagine that Client 1 wants to update a spreadsheet file that tracks conference room reservations. Ideally, they would perform the following actions:
However, another client might be concurrently modifying this file as well. In reality, the order of operations might be:
As you can see, Client 1 overwrote the change that client 2 made without realizing it. We can solve this problem with file versions. Every time a file is modified, its version number is incremented. SurfStore only records modifications to files if the version is one larger than the currently recorded version. Let’s see what would happen in the two-client case:
To delete a file, the MetadataStore service records a versioned “tombstone” update. This update simply indicates that the file has been deleted. In this way, deletion events also require version numbers, which prevents race conditions that can occur when one client deletes a file concurrently with another client deleting that file. Note that this means that a deleted file must be recreated before it can be read by a client again. If a client tries to delete a file that has already been deleted, that is fine–just handle the version numbers of these tombstone updates appropriately.
Also note that there is a difference between a deleted file and a file that has not been created so far. Trying to read both types of file will lead to a “File Not Found” error but a deleted file has a positive value for the version number, whereas a file that does not exist has a version = 0.
SurfStore consists of three types of processes: client processes, one or more BlockStore process, and a single Metadata process. These processes never fail, never crash, and never lose connectivity with each other. If you want your service to handle these cases, I highly recommend taking CSE 223B!
A client is a program that interacts with SurfStore. It is used to create, modify, read, and delete files. Your client will call the various file modification/creation/deletion RPC calls. We will be testing your service with our own client, and your client with instrumented versions of our service.
The client implements the following functionalities (the config files are described in a later section):
- upload : Reads the local file, creates a set of hashed blocks and uploads them onto the MetadataStore (and potentially the BlockStore if they were not already present there).
- Example 1:
$ client myconfig.txt upload /path/to/file.dat
- Example 2:
$ client myconfig.txt upload ../relative_dir/myfile.txt
- Example 1:
- download : Downloads a file from SurfStore. Ensures not to download unnecessary blocks.
- Example 1:
$ client myconfig.txt download filename.txt /path/to/where/to/store/it
- Example 2:
$ client myconfig.txt download filename.txt .
- Example 1:
- delete : Signals the MetadataStore to delete a file.
$ client myconfig.txt delete filename.txt
We will be testing your client by calling these functions. Make sure that your client matches the same syntax as described above.
$ client /etc/myconfig.txt upload /home/aturing/myvacation.mp4 $ mkdir /home/aturing/downloads $ client /etc/myconfig.txt download myvacation.mp4 /home/aturing/downloads $ diff /home/aturing/myvacation.mp4 /home/aturing/downloads/myvacation.mp4 (indicates the files are identical) $ cp /home/aturing/springbreak.mp4 /home/aturing/myvacation.mp4 $ client /etc/myconfig.txt upload /home/aturing/myvacation.mp4 $ client /etc/myconfig.txt delete myvacation.mp4 $ client /etc/myconfig.txt download myvacation.mp4 /tmp (indicates an error because the file isn't there)
The BlockStore service is an in-memory data store that stores blocks of data, indexed by the hash value. Thus it is a key-value store. It supports basic get() and put() operations. It does not need to support deleting blocks of data–we just let unused blocks remain in the store. The BlockStore service only knows about blocks–it doesn’t know anything about how blocks relate to files.
The service implements the following API:
- StoreBlock(h,b): Stores block b in the key-value store, indexed by hash value h
- b = GetBlock(h): Retrieves a block indexed by hash value h
- True/False = HasBlock(h): Signals whether block indexed by h exists in the BlockStore service
The MetadataStore process maintains the mapping of filenames to hashlists. All metadata is stored in memory, and no database systems or files will be used to maintain the data. When we test your project, we will always start from a “clean slate” in which there are no files in the system.
The service implements the following API:
- (v,hl) = ReadFile(f): Reads the file with filename f, returning the most up-to-date version number v, and the corresponding hashlist hl. If the file does not exist, v will be 0.
- ModifyFile(f,v,hl): Modifies file f so that it now contains the contents refered to by the hashlist hl. The version provided, v, must be exactly one larger than the current version that the MetadataStore maintains.
- DeleteFile(f,v): Deletes file f. Like ModifyFile(), the provided version number v must be one bigger than the most up-date-date version.
To create a file that has never existed, use the ModifyFile() API call with v set to 1. To create a file that was previously deleted, use ModifyFile with a version number that is one larger than the “tombstone” update.
Basic operating theory
When a client wants to create a new file, it first contacts the MetadataStore to see if the file already exists (or existed in the past, and has since been deleted). If so, it notes the previous version number.
The client then reads its local copy of the file and splits it into blocks, as described above. It then computes the hash values of each of the blocks to form a hashlist. It then contacts the MetadataStore leader and invokes the ModifyFile() API, passing it the filename, updated version number, and hashlist.
Clients are also responsible for uploading the blocks of the file to the BlockStore service. A naive implementation would do this after uploading the hashlist via ModifyFile().
But this leads to a potential race condition: Consider the scenario -
- Client 1 calls the metadata store to modify a file and provides a hashlist
- The metadata store saves the mapping for this file and hashlist after checking that the version is correct and returns the list of missing blocks to Client 1
- Client 1 starts uploading the missing blocks to the block store.
- It is possible that another client, Client 2 tries to read the same file and gets the updated hashlist.
- When client 2 contacts the block store, Client 1 might still be uploading the blocks or it might have crashed before it could upload the blocks. Do you see the problem here?
To prevent these issues, the protocol you’re going to use works as follows. When the client does a ModifyFile() operation, the MetadataStore is going to query the BlockStore(s) for each of the hash values in the hashlist, to see which, if any, of the blocks are already in the BlockStore. If any blocks are missing from the BlockStore, the MetadataStore will reply back to the client with a list of missing blocks. The MetadataStore will not create or modify the filename to hashlist mapping if any blocks are not present in the BlockStore. Only when all the blocks are in the BlockStore will the MetadataStore signal a success return value to the client’s ModifyFile() operation, and from then on the new file version is available to any clients that want to download it.
As an example:
To download a file, the client invokes the ReadFile() API call on the MetadataStore, passing in the filename. The MetadataStore simply returns the version and hashlist to the client. The client then downloads the blocks from the BlockStore to form the complete file. As an example:
When a file is deleted, we create tombstone update in the metadata store, but the blocks in blockstore are not removed. This has the advantage that a client that is downloading the blocks when another client deletes the file is not affected by the delete operation.
For this project, you will use a configuration file describing the cluster details, with the following format:
B: 1 metadata: <host>:<port> block1: <host>:<port>
The above deployment has a single BlockStore server.
B: 2 metadata: <host>:<port> block1: <host>:<port> block2: <host>:<port>
This example has two BlockStore servers.
The initial line
Bdefines the number of BlockStore servers.
The ‘metadata’ line specifies the host and port number of your metadata server.
The ‘block’ line specifies the host and port numbers of your blockstore server(s). Note the ‘1’, ‘2’, etc after the word block to indicate the ports for the different instances of the service. Your service must support up to 9 block store instances.
This config file will be available to the client and servers when they are started. This configuration file helps the server or client know the cluster information and also how many blockstore servers are present in the service. The configuration file we provide will always be valid.
The client will perform one or more operations before exiting.
Note that before downloading any blocks from the BlockStore, the client should check whether any of those blocks exist locally first. It only needs to check whether that block already is present in the destination directory. For example, if the client is downloading a file to the /home/aturing/downloads directory, then the block protocol only needs to look for local blocks in /home/aturing/downloads, not everywhere else (and not in any subdirectories either, just that one directory).
You can print out log and status information to
stdout, please either print:
OK: if the operation succeeded
Not Found: if the client tried to download a file that was not found on the server, or tried to upload a file which does not exist on the local machine.
Note that there is no type of “old version” message, because if the client tries to update the file to version v and another client updates the metadata store first, then the original client should simply re-request the most up-to-date version number and try again.
If your deployment has more than one blockstore server, you may be wondering how the client and the metadata store choose locations for data blocks. In this assignment, you are going to choose a simple, static assignment of blocks to blockstore servers using the following algorithm:
def findServer(h): return int(h,16) % numBlockStores
Where h is a hex-encoded SHA-256 hash value.
Where h is a hex-encoded SHA-1 hash value (The value that you generated using the function
Starting rpyc server and connecting to it
- Starting a server is like running any other python command -
- The server should be implemented as a class that inherits the
rpyc.Serviceclass and then implements the rpc methods.
- The class should then provide a method to start the rpc server. You can use the ThreadedServer class provided -
if __name__ == '__main__': from rpyc.utils.server import ThreadedServer server = ThreadedServer(Server(), port=5000) server.start()
To connect to a server from a client use the following syntax -
import rpyc conn = rpyc.connect('localhost', 5000)
For more information take a look at the Van Steen textbook and the rpyc documentation. >>>>>>> 0e83decc8fa09b3ad3792ef0ffdd1d5154d28a9f
- BlockStore functionality (20%)
- Storing, retrieving, and testing for the existence of blocks
- Client functionality (40%)
- Correctly uploading and downloading files to the store
- Correctly implementing the optimization where unnecessary blocks are not downloaded or uploaded to the BlockStore
- Metadata store: Part 1 (40%)
- Properly supporting versions across concurrent clients
- Properly supporting reading and modifying files. In particular, this includes the “missing blocks” protocol in which the Metadata store calls into the Block Store and returns the correct set of missing blocks back to the client. When duplicate files, or files that contain duplicate blocks, are uploaded, the client should not re-upload blocks that already exist.
- Properly supporting delete file
- Properly supporting the case where a new file is created with the same name as a previously deleted file
The starter code is available via this GitHub invitation.
Submitting your work
Log into gradescope.com and upload your solution as a zipped folder of your repository or upload a submission using the github submission method. If you use the github method, make sure you submit after you push a commit as the changes will not automatically reflect on gradescope. Don’t forget to add your paper summary to papersummary.txt.
Friday Nov 9, 5pm
This assignment is worth 10 points