CSE 124 : Winter 2016 : Projects 5 and 6

Note: Project 6 builds upon project 5--they are designed to be done together, and the due dates have been updated appropriately.

Project 5: Client-server TritonTransfer with RPCs

Project overview

The goal of project 5 is to build a simple "DropBox"-like file sharing system that relies on RPC.

General learning objectives:

Specific learning objectives:

This project should be done individually. The due date is listed on the course syllabus.

TritonTransfer: Project description

In this project, you will be creating a simplified version of DropBox, called TritonTransfer.

TritonTransfer consists of a command-line client and a command-line server. The client can upload files to, or download files from, the server, and the server stores the files.

Client interface

To upload a file, the client is invoked as follows:
$ tt-client <server_name> <server_port> upload <filename>
  1. server_name: The hostname/IP of the server
  2. server_port: The port on the server
  3. upload: A static token representing an upload operation
  4. filename: The file to upload (either an absolute or relative path)

Return: The client will print either "OK" (if the file was successfully uploaded), or "ERROR" (if there was some kind of error uploading the file).

To download a file, the client is invoked similarly:

$ tt-client <server_name> <server_port> download <filename> <download_dir>
  1. server_name: The hostname/IP of the server
  2. server_port: The port on the server
  3. download: A static token representing a download operation
  4. filename: The file to download
  5. download_dir: The directory to store the downloaded file within
Return: The client will print either "OK" (if the file was successfully downloaded), or "ERROR" (if there was some kind of error downloading the file).

Note (NEW): If we call your tt-client multiple times to download data, we will always call it with the same download directory.
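Because the two client operations differ only in their trailing positional arguments, the command line can be parsed by position. The sketch below is one possible approach; the ClientArgs type and parse_client_args function are hypothetical names, and it relies on the guarantee (see Implementation Notes) that arguments are always valid.

```python
from typing import List, NamedTuple, Optional


class ClientArgs(NamedTuple):
    """Parsed tt-client command line (hypothetical helper type)."""
    server_name: str
    server_port: int
    operation: str               # "upload" or "download"
    filename: str
    download_dir: Optional[str]  # only set for downloads


def parse_client_args(argv: List[str]) -> ClientArgs:
    """Parse argv (without the program name) into a ClientArgs.

    Arguments are assumed valid, so the operation token alone decides
    whether a fifth argument (download_dir) is present.
    """
    server_name, port, operation, filename = argv[:4]
    if operation == "upload":
        return ClientArgs(server_name, int(port), operation, filename, None)
    if operation == "download":
        return ClientArgs(server_name, int(port), operation, filename, argv[4])
    raise ValueError("unknown operation: " + operation)
```

In a real client this would be called as parse_client_args(sys.argv[1:]).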

Server interface

Your server is invoked as follows:
$ tt-server <server_port> <file_dir>
  1. server_port: The port on the server
  2. file_dir: The location on the server used to store uploaded files, and from which to serve out downloaded files to clients. NEW: you can optionally store file metadata and blocks in memory. This only applies to the server. If you choose to store file metadata and blocks in memory, you should ignore the file_dir parameter.

Implementation Notes

We will only invoke your client and server with valid arguments. All directories will exist and be readable/writable, the ports and other arguments will be valid, etc.

Your server does not need to support subdirectories. For example, if you invoke:

$ tt-client localhost 9001 upload /var/lib/mydir/myfile.txt

Then you should be able to download that file via:

$ tt-client localhost 9001 download myfile.txt /tmp

The file will then be downloaded and stored in /tmp.
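Since the server has no subdirectories, only the last path component of an uploaded file needs to be kept as its name on the server. A minimal sketch using the standard os.path module; server_name_for and local_path_for are hypothetical helper names.

```python
import os.path


def server_name_for(upload_path: str) -> str:
    """Name under which an uploaded file is stored on the server.

    Only the final path component is kept, so
    /var/lib/mydir/myfile.txt is stored as myfile.txt.
    """
    return os.path.basename(upload_path)


def local_path_for(filename: str, download_dir: str) -> str:
    """Path at which the client writes a downloaded file."""
    return os.path.join(download_dir, filename)
```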

NEW: We will test your client only against your server. We will not use our own client or our own server.

Blocks and hashes

Any uploaded or downloaded file is broken into fixed-size blocks of 16 KB (except for the last block, which can be smaller than 16 KB). A hash value is computed over each block using SHA-256. We say that a file f consists of blocks <b1,b2,...,bn>, with hashes <h1,h2,...,hn>. As we will now see, the file is transferred in these fixed-size blocks.

If you use Python, you can generate hashes of data using the hashlib.sha256() function: https://docs.python.org/2/library/hashlib.html#module-hashlib
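Splitting a file into blocks and hashing each one might look like the sketch below. It assumes "16 KB" means 16 * 1024 bytes and that hashes are exchanged as hex strings (hexdigest()); the spec fixes neither, so you could equally send the raw digest() bytes. The function names are hypothetical.

```python
import hashlib
from typing import List, Tuple

# Assumption: "16 KB" is taken to mean 16 * 1024 bytes.
BLOCK_SIZE = 16 * 1024


def split_into_blocks(data: bytes) -> List[bytes]:
    """Cut data into BLOCK_SIZE chunks; only the last one may be shorter."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


def block_hash(block: bytes) -> str:
    """Hex-encoded SHA-256 digest that identifies a block."""
    return hashlib.sha256(block).hexdigest()


def hash_file(path: str) -> Tuple[List[str], List[bytes]]:
    """Return (hashlist, blocks) for the file at path."""
    with open(path, "rb") as f:
        blocks = split_into_blocks(f.read())
    return [block_hash(b) for b in blocks], blocks
```

For example, a 40,000-byte file yields blocks of 16,384, 16,384, and 7,232 bytes.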

Blocks should only be transferred if they aren't already at the receiver (the client or the server). Blocks are identified by their hash values. So, for example, if two files:

file1: h1,h2,h3,h4

and

file2: h5,h6,h1,h8

are to be transferred, only 7 blocks should be moved, not 8. For downloading, you only need to look at other files in the download directory for common hashes (to optimize the download).
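The deduplication rule amounts to transferring each distinct hash at most once, skipping hashes the receiver already holds. A sketch with a hypothetical blocks_to_transfer function; for a download, already_present would be the hashes of blocks found in files already in the download directory.

```python
from typing import Iterable, List, Set


def blocks_to_transfer(hashlists: Iterable[List[str]],
                       already_present: Set[str]) -> List[str]:
    """Distinct hashes that still have to move, in first-seen order.

    already_present holds the hashes the receiver already has.
    """
    needed: List[str] = []
    seen = set(already_present)
    for hashlist in hashlists:
        for h in hashlist:
            if h not in seen:
                seen.add(h)
                needed.append(h)
    return needed
```

With file1 = [h1,h2,h3,h4] and file2 = [h5,h6,h1,h8] and nothing present at the receiver, this returns the seven hashes h1 through h6 and h8.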

TritonTransfer upload protocol

To upload a file f, the client invokes an uploadFile RPC call, which takes two arguments:
  1. filename: the name of the file being uploaded
  2. hashlist: a list of block hashes making up the file

For example, if file1 above is named catpicture.jpg, the client would invoke the following RPC call on the server:

ret = uploadFile("catpicture.jpg", [h1,h2,h3,h4]);

Return value: uploadFile returns a list containing the hashes of blocks the server still needs, if any. If the returned list is empty, the file has been successfully uploaded. If the list is not empty, the missing blocks must be uploaded using the uploadBlock RPC call described next.
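The server side of uploadFile can be sketched as below. This is a minimal sketch under two assumptions: blocks and metadata are kept in memory (which the NEW note on the server interface allows), and the client calls uploadFile a second time after sending the missing blocks, so the server records the filename only once every block is present. The names file_table, block_store, and upload_file are hypothetical.

```python
from typing import Dict, List

# Hypothetical in-memory server state.
file_table: Dict[str, List[str]] = {}   # filename -> hashlist
block_store: Dict[str, bytes] = {}      # hash -> block contents


def upload_file(filename: str, hashlist: List[str]) -> List[str]:
    """Handler sketch for uploadFile: return the hashes still needed."""
    # dict.fromkeys removes duplicate hashes while keeping their order.
    missing = [h for h in dict.fromkeys(hashlist) if h not in block_store]
    if not missing:
        # Every block is present, so the file becomes downloadable.
        file_table[filename] = list(hashlist)
    return missing
```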

To upload a block b, the client invokes an uploadBlock RPC call, which takes two arguments:

  1. hash: the hash value identifying the block
  2. block: the byte array making up the block

Return value: uploadBlock returns 'OK' if the block was stored successfully, or 'ERROR' if there was an error. Errors could occur if the block is longer than 16KB, or if the hash value doesn't match the hash of the actual block itself.
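Both error conditions can be checked before a block is stored. A sketch of a hypothetical upload_block handler, again assuming 16 KB = 16,384 bytes, hex-encoded hashes, and an in-memory block_store.

```python
import hashlib
from typing import Dict

BLOCK_SIZE = 16 * 1024                  # assumption: 16 KB = 16384 bytes
block_store: Dict[str, bytes] = {}      # hypothetical hash -> block map


def upload_block(hash_value: str, block: bytes) -> str:
    """Handler sketch for uploadBlock: validate the block, then store it."""
    if len(block) > BLOCK_SIZE:
        return "ERROR"                  # block is larger than 16 KB
    if hashlib.sha256(block).hexdigest() != hash_value:
        return "ERROR"                  # contents do not match the claimed hash
    block_store[hash_value] = block
    return "OK"
```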

Upload protocol example

TritonTransfer download protocol

To download a file f, the client invokes a downloadFile RPC call, which takes one argument:
  1. filename: the name of the file being downloaded

Continuing the example, the client would download catpicture.jpg by invoking the following RPC call on the server:

ret = downloadFile("catpicture.jpg");

Return value: downloadFile returns a list containing the hashes of blocks making up the file.

To download a block b, the client invokes a downloadBlock RPC call, which takes one argument:

  1. hash: the hash value identifying the block

Return value: downloadBlock returns the contents of the block if it is stored on the server, or 'ERROR' if the block does not exist on the server.

Note (NEW): you can adapt your protocol/API to support either a block or an error condition. For example, you can return a pair, where the first element of the pair is either OK or ERROR, and the second element is either NULL or the block contents. Or you can use an exception. That is up to you.
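The download side, using the (status, block) pair style from the note above, might be sketched as follows. All names are hypothetical; download_file returns [] for an unknown file, which a real protocol may want to distinguish from an empty file. The client-side reassemble function fetches only blocks it does not already hold locally (for example, blocks taken from other files in the download directory).

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical in-memory server state.
file_table: Dict[str, List[str]] = {}   # filename -> hashlist
block_store: Dict[str, bytes] = {}      # hash -> block contents


def download_file(filename: str) -> List[str]:
    """Handler sketch for downloadFile: the file's hashlist ([] if unknown)."""
    return file_table.get(filename, [])


def download_block(hash_value: str) -> Tuple[str, Optional[bytes]]:
    """Handler sketch for downloadBlock returning a (status, block) pair."""
    block = block_store.get(hash_value)
    return ("OK", block) if block is not None else ("ERROR", None)


def reassemble(hashlist: List[str],
               local_blocks: Dict[str, bytes],
               fetch: Callable[[str], Tuple[str, Optional[bytes]]]) -> Optional[bytes]:
    """Client side: rebuild a file, fetching only blocks not held locally."""
    parts = []
    for h in hashlist:
        if h not in local_blocks:
            status, block = fetch(h)
            if status != "OK" or block is None:
                return None             # the client would print "ERROR"
            local_blocks[h] = block
        parts.append(local_blocks[h])
    return b"".join(parts)
```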

Download protocol example

Why these hashes?

You may be asking why we associate hashes with blocks. The reason is that if the server already has a popular file, the client need not transfer any data. If many users upload the same large file, for example a popular movie, then only the first user needs to transfer the blocks. For example:

Additional information

Grading

In this project, you will define your RPC service, your RPC APIs, and any RPC data types inside of a Thrift IDL file (called TritonTransfer.thrift). You can use any language you want for your client and server, assuming that Thrift supports the language.

Project 5 Submission guidelines

You will submit this project to your CSE 124 GitHub account, including your client code, your server code, and your thrift IDL. You should run thrift on your IDL to generate the stubs for your language, and commit those stubs to your repository.

<github_id>_cse124/
|-- project
    |-- proj5
        |-- thrift
            |-- TritonTransfer.thrift
        |-- client
            |-- Makefile
            |-- gen-py (or gen-java, gen-cpp, etc)
            |-- (various source code files)
        |-- server
            |-- Makefile
            |-- gen-py (or gen-java, gen-cpp, etc)
            |-- (various source code files)

We should be able to check out your code, go into the client directory and type 'make' and have your client built. We should then be able to go into the server directory, type 'make', and have your server built.

Revisions

  1. Mar 2: Various revisions noted by NEW
  2. Feb 29: When looking for common hashes to download, you only need to look in the download directory
  3. Feb 28: Clarified that blocks are only to be transferred if they aren't already at the receiver.
  4. Feb 24: Clarified point about subdirectories
  5. Feb 24: Initial version

Project 6: Peer-to-peer TritonTransfer with RPCs

Project overview

The goal of project 6 is to extend TritonTransfer to support a peer-to-peer delivery mode.

General learning objectives:

Specific learning objectives:

This project should be done individually. The due date is listed on the course syllabus.

TritonTransfer-p2p: Project description

In this extension to project 5, there are now two kinds of servers: one metadata server and one or more block servers. The metadata server keeps the list of hashes that make up each file. Block servers store and serve out blocks of data. Clients issue uploadFile and downloadFile calls to the metadata server, but issue uploadBlock and downloadBlock calls to the block servers. NEW: You may not send data blocks through the metadata server; all transfers of data blocks must go between the clients and the block servers. Unlike in project 5, this peer-to-peer version does not need to store any persistent data: all file metadata and data blocks are to be kept in memory.

Client interface

The client interface is the same as in project 5.

Metadata server interface

The metadata server is invoked as follows:
$ tt-md-server <server_port>
  1. server_port: The port on the server

Block server interface

The block server is invoked as follows:
$ tt-block-server <server_port> <metadata_server> <metadata_port>
  1. server_port: The port to listen for incoming connections
  2. metadata_server: The hostname/ip address of the metadata server.
  3. metadata_port: The port of the metadata server.

Locating blocks

In this project, you are going to start up three block servers (these can run on separate machines, or all on the same machine on different TCP ports). Each block of data should be stored on two block servers. That way, if one block server were to fail, you would not lose any data.
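One possible placement scheme (an assumption, not one the spec prescribes) derives the replica set deterministically from the block's hash, so the metadata server, or a client that knows the list of block servers, can compute where each block lives without extra bookkeeping. With three servers and two copies, each block lands on two distinct servers, so any single failure leaves one copy. The function name and server addresses are hypothetical, and hashes are assumed to be hex strings.

```python
from typing import List


def replica_servers(block_hash: str, servers: List[str], copies: int = 2) -> List[str]:
    """Pick `copies` distinct block servers for a block.

    Starts at index int(hash) mod N and takes the next servers
    round-robin; requires copies <= len(servers).
    """
    start = int(block_hash, 16) % len(servers)
    return [servers[(start + i) % len(servers)] for i in range(copies)]
```

For example, with servers ["bs0", "bs1", "bs2"], a block whose hash is 01 would be placed on bs1 and bs2.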

You have several choices for how to locate blocks of data:

Implementation Notes

We will only invoke your client and server with valid arguments. All directories will exist and be readable/writable, the ports and other arguments will be valid, etc.

Your server does not need to support subdirectories. For example, if you invoke:

$ tt-client localhost 9001 upload /var/lib/mydir/myfile.txt

Then you should be able to download that file via:

$ tt-client localhost 9001 download myfile.txt /tmp

The file will then be downloaded and stored in /tmp.

Grading

In this project, you will define your RPC service, your RPC APIs, and any RPC data types inside of a Thrift IDL file (called TritonTransfer.thrift). You can use any language you want for your client and server, assuming that Thrift supports the language.

We will carry out a failure test in which we upload a file to TritonTransfer-p2p, then kill one of the three block servers. Your client should still be able to download the file, even with one of the block servers out of commission.
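To survive this test, the client can try each replica of a block in turn and move on when a block server is unreachable. A sketch with hypothetical names; it assumes the fetch wrapper raises ConnectionError when a server is down (with Thrift's Python library you would catch its TTransportException instead).

```python
from typing import Callable, List, Optional, Tuple

# fetch(server, hash) -> (status, block); raises ConnectionError if the server is down.
Fetch = Callable[[str, str], Tuple[str, Optional[bytes]]]


def fetch_with_failover(block_hash: str, replicas: List[str],
                        fetch: Fetch) -> Optional[bytes]:
    """Return the first good copy of a block from its replicas, or None."""
    for server in replicas:
        try:
            status, block = fetch(server, block_hash)
        except ConnectionError:
            continue                    # this block server is unreachable; try the next
        if status == "OK" and block is not None:
            return block
    return None
```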

Project 6 Submission guidelines

You will submit this project to your CSE 124 GitHub account, including your client code, your server code, and your thrift IDL. You should run thrift on your IDL to generate the stubs for your language, and commit those stubs to your repository.

<github_id>_cse124/
|-- project
    |-- proj6
        |-- thrift
            |-- TritonTransfer.thrift
        |-- client
            |-- Makefile
            |-- gen-py (or gen-java, gen-cpp, etc)
            |-- (various source code files)
        |-- server
            |-- Makefile
            |-- gen-py (or gen-java, gen-cpp, etc)
            |-- (various source code files)

We should be able to check out your code, go into the client directory and type 'make' and have your client built. We should then be able to go into the server directory, type 'make', and have your server built.

Note that you may certainly reuse code between projects 5 and 6, but please commit code separately.

Revisions

  1. Mar 4: Clarified that you can't send data blocks via the metadata server.
  2. Feb 25: Initial version