The goal of project 5 is to build a simple "DropBox"-like file sharing system that relies on RPC.
General learning objectives:
Specific learning objectives:
This project should be done individually. The due date is listed on the course syllabus.
In this project, you will be creating a simplified version of DropBox, called TritonTransfer.
TritonTransfer consists of a command-line client and a command-line server. The client can upload files to or download files from the server, and the server stores the file(s). To upload a file, the client is invoked as follows:
$ tt-client <server_name> <server_port> upload <filename>
Return: The client will print either "OK" (if the file was successfully uploaded), or "ERROR" (if there was some kind of error uploading the file).
To download a file, the client is invoked similarly:
$ tt-client <server_name> <server_port> download <filename> <download_dir>
$ tt-server <server_port> <file_dir>
We will only invoke your client and server with valid arguments. All directories will exist, be readable/writable, the ports/arguments will be valid, etc.
Your server does not need to support subdirectories. For example, if you invoke:
$ tt-client localhost 9001 upload /var/lib/mydir/myfile.txt
Then you should be able to download that file via:
$ tt-client localhost 9001 download myfile.txt /tmp
And the file will be downloaded and stored in /tmp.
NEW: We will run your client against your server. We will not use our own client or our own server.
Any uploaded or downloaded file is broken into fixed-size blocks of 16 KB (except for the last block, which can be smaller than 16 KB). A hash value is computed over each block using SHA-256. We say that a file f consists of blocks <b1,b2,...,bn>, with hashes <h1,h2,...,hn>. As we will now see, the file is transferred in these fixed-size blocks.
If you use Python, you can generate hashes of data using the hashlib.sha256() function: https://docs.python.org/2/library/hashlib.html#module-hashlib
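As a concrete illustration, here is a minimal sketch of the block-splitting and hashing step in Python (the split_and_hash helper name is ours, not part of the spec):

```python
import hashlib

BLOCK_SIZE = 16 * 1024  # 16 KB, per the spec

def split_and_hash(data):
    """Split a byte string into 16 KB blocks (last block may be smaller)
    and compute each block's SHA-256 hash."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    hashes = [hashlib.sha256(b).hexdigest() for b in blocks]
    return blocks, hashes
```

A file of 16 KB + 1 byte, for example, would yield two blocks: one full 16 KB block and one 1-byte block.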
Blocks should only be transferred if they aren't already present on the client/server. Blocks are identified by their hashes. So, for example, if two files:
file1: h1,h2,h3,h4
and
file2: h5,h6,h1,h8
are to be transferred, only 7 blocks should be moved, not 8. For downloading, you only need to look at other files in the download directory for common hashes (to optimize the download).
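This deduplication amounts to an ordered set-difference over hashes; a minimal sketch (blocks_to_transfer is a hypothetical helper, not part of the spec):

```python
def blocks_to_transfer(files):
    """Given a list of files, each represented as its list of block hashes,
    return the distinct hashes in first-seen order -- each unique block is
    transferred only once, no matter how many files contain it."""
    seen = set()
    needed = []
    for hashes in files:
        for h in hashes:
            if h not in seen:
                seen.add(h)
                needed.append(h)
    return needed
```

With file1 = [h1,h2,h3,h4] and file2 = [h5,h6,h1,h8] as above, this yields 7 hashes, since h1 is shared.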
For example, to upload a file catpicture.jpg whose blocks have hashes <h1,h2,h3,h4>, the client would invoke the following RPC call on the server:
ret = uploadFile("catpicture.jpg", [h1,h2,h3,h4]);
Return value: uploadFile returns a list containing the hashes of blocks it still needs, if any. If the returned list is empty, the file has been successfully uploaded. If the list is not empty, then additional blocks must be uploaded, using the next RPC call.
To upload a block b, the client invokes an uploadBlock RPC call, which takes two arguments:
Return value: uploadBlock returns 'OK' if the block was stored successfully, or 'ERROR' if there was an error. Errors could occur if the block is longer than 16KB, or if the hash value doesn't match the hash of the actual block itself.
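Putting uploadFile and uploadBlock together, the client-side upload loop might look like the following sketch. Here server stands in for a Thrift client stub; the exact stub interface and argument order are assumptions for illustration, not mandated signatures:

```python
def upload(server, filename, blocks, hashes):
    """Sketch of the client-side upload: ask the server which blocks it is
    missing, then push only those blocks. 'server' is a hypothetical Thrift
    client stub exposing uploadFile and uploadBlock as described above."""
    missing = server.uploadFile(filename, hashes)
    by_hash = dict(zip(hashes, blocks))
    for h in missing:
        if server.uploadBlock(h, by_hash[h]) != "OK":
            return "ERROR"
    # An empty 'missing' list, or all blocks accepted, means success.
    return "OK"
```

Note that if the server already holds every block, uploadFile returns an empty list and no uploadBlock calls are made at all.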
Continuing the example, to download catpicture.jpg the client would invoke the following RPC call on the server:
ret = downloadFile("catpicture.jpg");
Return value: downloadFile returns a list containing the hashes of blocks making up the file.
To download a block b, the client invokes a downloadBlock RPC call, which takes one argument:
Return value: downloadBlock returns the contents of the block if it is stored on the server, or 'ERROR' if the block does not exist on the server.
Note (NEW): you can adapt your protocol/API to support either a block or an error condition. For example, you can return a pair, where the first element of the pair is either OK or ERROR, and the value is either NULL or the block contents. Or you can use an exception. That is up to you.
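The download side then mirrors the upload loop. The sketch below assumes the pair convention described in the note above (a status plus an optional payload); server is again a hypothetical stub, not a mandated interface:

```python
import os

def download(server, filename, download_dir):
    """Sketch of the client-side download: fetch the file's hash list, pull
    each block, and reassemble the file in download_dir. 'server' is a
    hypothetical stub exposing downloadFile and downloadBlock; downloadBlock
    is assumed to return an ('OK'/'ERROR', payload) pair."""
    hashes = server.downloadFile(filename)
    chunks = []
    for h in hashes:
        status, block = server.downloadBlock(h)
        if status != "OK":
            return "ERROR"
        chunks.append(block)
    with open(os.path.join(download_dir, filename), "wb") as f:
        f.write(b"".join(chunks))
    return "OK"
```

A fuller implementation would first scan other files in the download directory for blocks with matching hashes, as required above, and only call downloadBlock for the rest.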
You may be asking why we associate hashes with blocks. The reason is that if the server already has a popular file, the client need not transfer any data. If many users upload the same large file, for example a popular movie, then only the first user needs to transfer the blocks.
In this project, you will define your RPC service, your RPC APIs, and any RPC data types inside of a Thrift IDL file (called TritonTransfer.thrift). You can use any language you want for your client and server, assuming that Thrift supports the language.
You will submit this project to your CSE 124 GitHub account, including your client code, your server code, and your thrift IDL. You should run thrift on your IDL to generate the stubs for your language, and commit those stubs to your repository.
<github_id>_cse124/
|-- project
    |-- proj5
        |-- thrift
            |-- TritonTransfer.thrift
        |-- client
            |-- Makefile
            |-- gen-py (or gen-java, gen-cpp, etc)
            |-- (various source code files)
        |-- server
            |-- Makefile
            |-- gen-py (or gen-java, gen-cpp, etc)
            |-- (various source code files)
We should be able to check out your code, go into the client directory and type 'make' and have your client built. We should then be able to go into the server directory, type 'make', and have your server built.
The goal of project 6 is to extend TritonTransfer to support a peer-to-peer delivery mode.
General learning objectives:
Specific learning objectives:
This project should be done individually. The due date is listed on the course syllabus.
In this extension to project 5, there are now two kinds of servers: one metadata server, and one or more block servers. The metadata server keeps a list of all the hashes that make up a file. Block servers store and serve out blocks of data. Clients issue uploadFile and downloadFile calls to the metadata server, but issue uploadBlock and downloadBlock calls to the block servers. NEW: You may not send data blocks through the metadata server--all transfers of data blocks must go directly between the clients and the block servers. Unlike in project 5, this peer-to-peer version does not need to store any persistent data--all file metadata and data blocks are to be kept in memory.
$ tt-md-server <server_port>
$ tt-block-server <server_port> <metadata_server> <metadata_port>
In this project, you are going to start up three block servers (they can run on separate machines, or all on the same machine on different TCP ports). Each block of data should be stored on two block servers, so that if one block server were to fail, you would not lose any data.
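One possible placement policy (an assumption for illustration, not something the spec mandates) is to derive both replica locations deterministically from the block hash, for example:

```python
def replica_servers(block_hash, num_servers=3, replicas=2):
    """Map a block's SHA-256 hex hash to two distinct block servers out of
    three, so the block survives any single server failure. This is one
    simple policy; consistent hashing or a metadata-server-chosen placement
    would work equally well."""
    first = int(block_hash, 16) % num_servers
    second = (first + 1) % num_servers
    return [first, second]
```

Because placement is a pure function of the hash, any client can recompute where a block lives without asking the metadata server.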
You have several choices for how to locate blocks of data:
We will only invoke your client and server with valid arguments. All directories will exist, be readable/writable, the ports/arguments will be valid, etc.
Your server does not need to support subdirectories. For example, if you invoke:
$ tt-client localhost 9001 upload /var/lib/mydir/myfile.txt
Then you should be able to download that file via:
$ tt-client localhost 9001 download myfile.txt /tmp
And the file will be downloaded and stored in /tmp.
In this project, you will define your RPC service, your RPC APIs, and any RPC data types inside of a Thrift IDL file (called TritonTransfer.thrift). You can use any language you want for your client and server, assuming that Thrift supports the language.
We will carry out a failure test in which we upload a file to TritonTransfer-p2p, then kill one of the three block servers. Your client should be able to download the file, even with one of the block servers out of commission.
You will submit this project to your CSE 124 GitHub account, including your client code, your server code, and your thrift IDL. You should run thrift on your IDL to generate the stubs for your language, and commit those stubs to your repository.
<github_id>_cse124/
|-- project
    |-- proj6
        |-- thrift
            |-- TritonTransfer.thrift
        |-- client
            |-- Makefile
            |-- gen-py (or gen-java, gen-cpp, etc)
            |-- (various source code files)
        |-- server
            |-- Makefile
            |-- gen-py (or gen-java, gen-cpp, etc)
            |-- (various source code files)
We should be able to check out your code, go into the client directory and type 'make' and have your client built. We should then be able to go into the server directory, type 'make', and have your server built.
Note that you may certainly reuse code between projects 5 and 6, but please commit code separately.