CSE 124 Project 2
2017 November 7: Project 2: SurfStore

The link to accept the GitHub invitation is located here: https://classroom.github.com/g/KNN3awwz.

Overview

In this project you are going to create a cloud-based file storage service called SurfStore. SurfStore is a networked file storage application that supports four basic commands:

  • Create a file
  • Read the contents of a file
  • Change the contents of a file
  • Delete a file

Multiple clients can concurrently connect to the SurfStore service to access a common, shared set of files. Clients accessing SurfStore “see” a consistent set of updates to files, but SurfStore does not offer any guarantees about operations across files, meaning that it does not support multi-file transactions (such as atomic move).

The SurfStore service is composed of the following two sub-services:

  • BlockStore: The content of each file in SurfStore is divided up into chunks, or blocks, each of which has a unique identifier. The BlockStore service stores these blocks, and when given an identifier, retrieves and returns the appropriate block.

  • MetadataStore: The MetadataStore service holds the mapping of filenames/paths to blocks.

Additionally, you will need to implement a client that can support the four basic commands listed above.

The project is structured into two parts:

  • Part 1: You’ll implement both the BlockStore and MetadataStore services, and the SurfStore client. The metadata service you implement will simply keep its data in memory, with no replication or fault tolerance. We’ll refer to this version as a centralized implementation of SurfStore.

  • Part 2: Next, you’ll create a version of the MetadataStore service as a set of distributed processes that implement fault tolerance. This distributed implementation will use a replicated log (replicated state machine) plus 2-phase commit to ensure that the MetadataStore service can survive, and continue operating, even if one of its processes fails, and that failed processes can recover, rejoin the distributed system, and get back up to date.

Logistics

  • This project can be done individually or in a group of two.

SurfStore Specification

We now describe the service in more detail.

Basic concepts

Blocks, hashes, and hashlists

A file in SurfStore is broken into an ordered sequence of one or more blocks. Each block is of uniform size (4KB), except for the last block in the file, which may be smaller than 4KB (but must be at least 1 byte large). As an example, consider the following file:

The file ‘MyFile.mp4’ is 14,437 bytes long, and the block size is 4KB (4,096 bytes). The file is therefore broken into blocks b0, b1, b2, and b3 (the last of which is only 2,149 bytes long). For each block, a hash value is generated using the SHA-256 hash function; for MyFile.mp4, those hashes are denoted [h0, h1, h2, h3], in the same order as the blocks. This ordered sequence of hash values represents the file, and is referred to as the hashlist. Note that given a block, you can compute its hash by applying the SHA-256 hash function to it, which also means that changing the data in a block changes its hash value.

To update a file, you change a subset of the bytes in the file and recompute the hashlist. Depending on the modification, at least one, but perhaps all, of the hash values in the hashlist will change. How the file is modified is outside the scope of SurfStore; you only need to handle updating the file with the new hashlist.
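As a concrete illustration, here is one way a client might split a file into blocks in Python (a sketch only; BLOCK_SIZE and get_blocks are illustrative names, not part of the required API):

BLOCK_SIZE = 4096  # 4KB

def get_blocks(path):
    """Read a file as an ordered list of 4KB blocks; the last may be smaller."""
    blocks = []
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:  # end of file
                break
            blocks.append(block)
    return blocks

The next two sections show how to compute the per-block SHA-256 hashes in Java and Python.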

Generating SHA-256 hash values in Java

As an example of converting the string “foobar” to a SHA-256 hash value (encoded using base-64) in Java:

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class ShaTest {

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("Usage: ShaTest <string>");
            System.exit(1);
        }

        String text = args[0];
        System.out.println("Input: " + text);

        MessageDigest digest = null;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            e.printStackTrace();
            System.exit(2);
        }
        byte[] hash = digest.digest(text.getBytes(StandardCharsets.UTF_8));
        String encoded = Base64.getEncoder().encodeToString(hash);

        System.out.println("Output: " + encoded);
    }

}

Note that you’ll actually be hashing 4KB binary blocks, and so you will want to simply pass an array of bytes directly to the digest.digest() function.

Generating SHA-256 values in Python

As an example of converting the string “foobar” to a SHA-256 hash value (encoded using base-64) in Python:

import base64
import hashlib

def sha256(s):
    m = hashlib.sha256()
    m.update(str.encode(s))
    return base64.b64encode(m.digest())

print(sha256("foobar"))

Running this script produces:

$ python testhash.py
b'w6uP8Tcg6K2QR905Rms8iXTlksL6OD1KOWBxTK7wxPI='

Note that you’ll want to simply pass a binary array directly to m.update() in order to calculate the hash value.
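Putting the pieces together, a client could compute a file’s hashlist as follows (a sketch reusing the illustrative get_blocks helper from above):

import base64
import hashlib

def block_hash(block):
    # SHA-256 over the raw block bytes, encoded in base-64
    return base64.b64encode(hashlib.sha256(block).digest()).decode("ascii")

def hashlist(path):
    # The ordered hashes of the file's blocks represent the file.
    return [block_hash(b) for b in get_blocks(path)]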

Files and filenames

Files in SurfStore are denoted by filenames, which are represented as strings. For example, “/Myfile.mp4”, “/Documents/My Videos/BeachVacation.mp4”, and “/Conferences/Expenses.txt” are all examples of filenames. Although filenames can contain the slash character (“/”), SurfStore doesn’t have any concept of a directory or directory hierarchy; filenames are just strings. For this reason, filenames can only be compared for equality or inequality, and there are no “cd” or “mkdir” commands.

File versions

Each file/filename is associated with a version, which is a monotonically increasing non-negative integer. The version is incremented any time the file is created, modified, or deleted. The purpose of the version is so that clients can detect when their view of a file is out of date.

For example, imagine that Client 1 wants to update a spreadsheet file that tracks conference room reservations. Ideally, it would read the current file from SurfStore, add its reservation, and write the updated file back.

However, another client might be concurrently modifying this file as well. In reality, the order of operations might be: Client 1 reads the file, Client 2 reads the file, Client 2 writes back its update, and then Client 1 writes back its own update.

As you can see, Client 1 overwrote the change that Client 2 made without realizing it. We can solve this problem with file versions. Every time a file is modified, its version number is incremented. SurfStore only records modifications to files if the version is larger than the currently recorded version. In the two-client case above, both clients would read the file at version v and attempt to write version v+1; whichever write arrives first is accepted, the other is rejected, and the losing client must re-read the (now updated) file and retry with version v+2.

To delete a file, the MetadataStore service simply notes that the file is deleted. Deletion events therefore also require version numbers, which prevents race conditions when one client deletes a file concurrently with another client modifying or deleting it. In SurfStore, we represent a deleted file as a file whose hashlist contains the single hash value “0”. Note that this means the file must be recreated before it can be read by a client again.

Processes

SurfStore consists of three types of processes: client processes, a BlockStore process, and one or more Metadata processes. Note that one (and only one) of the Metadata processes is specially designated as the “leader”, meaning that all client requests should go through that server and that server only. The leader never fails, never crashes, and never loses connectivity with the clients.

Client

A client is a program that interacts with SurfStore. It is used to create, modify, read, and delete files. Your client will call the various file modification/creation/deletion RPC calls. We will be testing your service with our own client.

BlockStore

The BlockStore service is an in-memory data store that stores blocks of data, indexed by their hash values; it is thus a key-value store. It supports basic get() and put() operations. It does not need to support deleting blocks of data; unused blocks simply remain in the store. The BlockStore service only knows about blocks; it doesn’t know anything about how blocks relate to files.

The service implements the following API:

  • StoreBlock(h,b): Stores block b in the key-value store, indexed by hash value h
  • b = GetBlock(h): Retrieves a block indexed by hash value h
  • True/False = HasBlock(h): Signals whether block indexed by h exists in the BlockStore service
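A centralized BlockStore can be little more than an in-memory dictionary keyed by hash value. A minimal sketch in Python, with the gRPC plumbing omitted and method names chosen to mirror the API above:

class BlockStore:
    def __init__(self):
        self.blocks = {}  # hash value -> block bytes

    def store_block(self, h, b):
        self.blocks[h] = b

    def get_block(self, h):
        return self.blocks.get(h)  # None if no block with hash h is stored

    def has_block(self, h):
        return h in self.blocks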

MetadataStore

The MetadataStore process maintains the mapping of filenames to hashlists. All metadata is stored in memory, and no database systems or files will be used to maintain the data. When we test your project, we will always start from a “clean slate” in which there are no files in the system.

The service implements the following API:

  • (v,hl) = ReadFile(f): Reads the file with filename f, returning the most up-to-date version number v, and the corresponding hashlist hl. If the file does not exist, v will be 0.
  • ModifyFile(f,v,hl): Modifies file f so that it now contains the contents referred to by the hashlist hl. The version provided, v, must be exactly one larger than the current version that the MetadataStore maintains.
  • DeleteFile(f,v): Deletes file f. Like ModifyFile(), the provided version number v must be one larger than the most up-to-date version.
  • IsLeader(): Returns true if this Metadata service is the leader, otherwise returns false.

To create a file that has never existed, use the ModifyFile() API call with v set to 1. To create a file that was previously deleted when it was at version v, use ModifyFile with a version number of v+1.
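To make the version rules concrete, here is a sketch of the centralized MetadataStore’s bookkeeping (in-memory only; the check that all blocks are present in the BlockStore, described below, is omitted, and all names are illustrative):

TOMBSTONE = ["0"]  # hashlist representing a deleted file

class MetadataStore:
    def __init__(self):
        self.files = {}  # filename -> (version, hashlist)

    def read_file(self, f):
        return self.files.get(f, (0, []))  # version 0 if f never existed

    def modify_file(self, f, v, hl):
        cur_v, _ = self.files.get(f, (0, []))
        if v != cur_v + 1:
            return False  # stale client: it must re-read and retry
        self.files[f] = (v, hl)
        return True

    def delete_file(self, f, v):
        # Deletion is just a versioned modification to the tombstone hashlist.
        return self.modify_file(f, v, TOMBSTONE)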

Basic operating theory

When a client wants to create a new file, it first contacts the MetadataStore leader to see if the file already exists (or existed in the past and has since been deleted). If so, it notes the previous version number; otherwise the file starts with a default version of 0.

The client then reads its local copy of the file and splits it into blocks, as described above. It computes the hash value of each block to form the hashlist, and then contacts the MetadataStore leader, invoking the ModifyFile() API with the filename, updated version number, and hashlist.

Clients are also responsible for uploading the blocks of the file to the BlockStore service. A naive implementation would do this after uploading the hashlist via ModifyFile(). But this leads to a potential race condition: another client could try to download the file, using the hashlist the first client gave to ModifyFile(), before the first client has finished uploading all of the blocks to the BlockStore. Moreover, how can SurfStore guarantee that a client actually did upload the necessary blocks for the file, and didn’t crash along the way?

To prevent these issues, the protocol you’re going to use works as follows. When the client does a ModifyFile() operation, the MetadataStore leader is going to query the BlockStore for each of the hash values in the hashlist, to see which, if any, of the blocks are already in the BlockStore. If any blocks are missing from the BlockStore, the MetadataStore will reply back to the client with a list of missing blocks. The MetadataStore will not create the filename to hashlist mapping if any blocks are not present in the BlockStore. Only when all the blocks are in the BlockStore will the MetadataStore signal a success return value to the client’s ModifyFile() operation, and from then on the new file version is available to any clients that want to download it.
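A sketch of the resulting client-side upload loop, assuming hypothetical RPC stubs in which ModifyFile() returns a success flag plus the list of missing block hashes:

def upload(filename, version, blocks, metadata, blockstore):
    hl = [block_hash(b) for b in blocks]
    by_hash = dict(zip(hl, blocks))  # duplicate blocks collapse to one entry
    while True:
        result = metadata.modify_file(filename, version, hl)
        if result.ok:
            return True   # mapping recorded; the new version is now visible
        if not result.missing_blocks:
            return False  # version conflict: caller must re-read and retry
        for h in result.missing_blocks:
            blockstore.store_block(h, by_hash[h])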


To download a file, the client invokes the ReadFile() API call on the MetadataStore, passing in the filename. The MetadataStore simply returns the version and hashlist to the client. The client then downloads the blocks from the BlockStore to form the complete file.
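A sketch of the download path, again with hypothetical stubs; local_blocks is a hash-to-block map the client maintains, as described in the next section:

def download(filename, metadata, blockstore, local_blocks):
    version, hl = metadata.read_file(filename)
    if version == 0 or hl == ["0"]:
        return None  # the file never existed, or is currently deleted
    data = b""
    for h in hl:
        if h not in local_blocks:  # fetch only blocks we don't already have
            local_blocks[h] = blockstore.get_block(h)
        data += local_blocks[h]
    return data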

Optimizing the SurfStore client

There are two ways for the client to optimize the upload/download protocols described above, and we require you to implement both for this project.

Optimizing client downloads

Many files in cloud storage systems have regions that are exactly the same across different files. For example, two video files may have overlapping regions that are exactly the same, or a user might have two copies of the same mp3 song file under different names. The client must make sure to transfer only blocks that it doesn’t already have from the BlockStore. To implement this functionality, you are going to have the client scan all the files in the directory specified on the command line when it starts up. The client will build an index of the blocks of each file in the directory, and consult this index before uploading files to or downloading files from the service (a sketch follows below).

In particular, if you download two files with identical contents (but different names), the client should only download the blocks of the file the first time, not both times.
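One way to build that index at startup (a sketch; get_blocks and block_hash are the illustrative helpers from earlier):

import os

def scan_base_dir(base_dir):
    """Index every block already present in the base directory, keyed by hash."""
    local_blocks = {}
    for name in os.listdir(base_dir):
        path = os.path.join(base_dir, name)
        if os.path.isfile(path):
            for block in get_blocks(path):
                local_blocks[block_hash(block)] = block
    return local_blocks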

Optimizing client uploads

Similar to the above, some files may contain the same block more than once in their hashlists, or share a block with a different file. You must ensure that your client uploads each block only once to the BlockStore, and also does not upload any blocks that were already present before the client started.
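One way to honor both rules is to collapse duplicate hashes and consult HasBlock() before uploading, as in this sketch (blockstore is again a hypothetical stub):

def upload_needed_blocks(hashlist, by_hash, blockstore):
    for h in set(hashlist):              # set() collapses repeated blocks
        if not blockstore.has_block(h):  # skip blocks that are already stored
            blockstore.store_block(h, by_hash[h])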

Distributed SurfStore

In the second part of this project, you are going to replicate the MetadataStore service to make it fault tolerant. There will be a single leader that all read, delete, and modify operations will go through, as before. That leader never fails and never goes off the network. The leader will turn client requests into replicated state machine operations that will be distributed to the other replicas using the 2-phase commit protocol. For this project, there will be one leader and two additional replicas for a total of three Metadata servers.
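At a high level, the leader’s commit path for each metadata operation might look like the following sketch. This is illustrative only: the actual RPCs, vote handling, and whether a crashed follower blocks commits or the leader proceeds with a majority are part of your design.

class Leader:
    def __init__(self, followers):
        self.followers = followers  # hypothetical RPC stubs for the replicas
        self.log = []               # replicated log of committed operations

    def replicate(self, op):
        # Phase 1 (prepare): followers tentatively log op and vote. A
        # "crashed" follower answers every message with a failure (a no vote).
        yes_votes = 1 + sum(1 for f in self.followers if f.prepare(op))
        majority = (len(self.followers) + 1) // 2 + 1
        if yes_votes < majority:
            for f in self.followers:
                f.abort(op)  # discard the tentative log entry
            return False
        # Phase 2 (commit): followers append op to their logs and apply it.
        self.log.append(op)
        for f in self.followers:
            f.commit(op)
        return True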

Initiating failures

In a real system, processes might fail for a number of reasons, including crashes, disk or memory failures, network partitions, software bugs, etc. In general, such failures are highly non-deterministic.

To aid your testing of your own implementation, and to aid our testing and grading of your project, we are going to extend the MetadataService API with calls that enable us to manually “fail” and “recover” processes. In this way, we can explore a number of failure scenarios.

Thus, each MetadataStore instance will need to support these APIs:

  • Crash(): This API call signals to the process that it should enter an emulated failure state. A failed process should not instigate any new messages, and should reply to any incoming messages with a failure response.
  • Recover(): This API call signals to the process that it has “recovered” and is now back online. It can now process and instigate RPC calls, and must begin to “catch up” so that it has the most up-to-date information about the current state of the Metadata service.

Note that it is not valid to call Crash() or Recover() on the leader.
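One simple way to emulate failure is a flag consulted by every RPC handler, as in this sketch (method names are illustrative, and the catch-up logic is entirely up to your design):

class MetadataServer:
    def __init__(self):
        self.crashed = False

    def crash(self):
        self.crashed = True

    def recover(self):
        self.crashed = False
        self.catch_up()  # e.g., fetch missed log entries from the leader

    def prepare(self, op):
        if self.crashed:
            return False  # a failed process answers every message with a failure
        # ... normal vote logic ...
        return True

    def catch_up(self):
        pass  # up to your design: replay missed replicated-log entries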

Implementation details

Language restrictions

For Project 2, you must complete all parts in either Python or Java.

Configuration file

For this project, you will use a configuration file describing the cluster details, with the following format:

configCentralized.txt

    M: 1
    L: 1
    metadata1: <port>
    block: <port>

configDistributed.txt

    M: 3
    L: 2
    metadata1: <port>
    metadata2: <port>
    metadata3: <port>
    block: <port>

  • The initial line M defines the number of Metadata servers.

  • The second line L denotes which of the Metadata servers is the leader. In the centralized example, metadata1 is the leader. In the distributed example, metadata2 is the leader.

  • The ‘metadata1’ line specifies the port number of your metadata server. Note the ‘1’, ‘2’, etc. after the word metadata, which indicate the ports for the different instances of the service.

  • ‘block’ denotes the port number of your BlockStore.

This config file will be available to the client and servers when they are started. This configuration file helps the server or client know the cluster information and also how many metadata servers are present in the service. Note that because you’re going to run the client, the BlockStore, and the Metadata server all on the same machine, you will need to use unique ports. The configuration file we provide will always be valid and will not contain any errors or problems.
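A sketch of parsing this format in Python (every value, including M and L, is an integer):

def parse_config(path):
    conf = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            key, value = line.split(":", 1)
            conf[key.strip()] = int(value.strip())
    return conf

For configDistributed.txt above, this yields a dict such as {'M': 3, 'L': 2, 'metadata1': ..., 'block': ...}, from which each component can look up its own port.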

Client

When a client boots up, it is provided the config file and a base directory as command line arguments.

  • The base directory may be empty or already have some files in it. The client should scan the directory for files when it starts and process the blocks of each file, storing the list of blocks already present at the client. This is important for the client optimizations described above.

  • The client will perform one or more operations before exiting.

SurfStore gRPC API

For this project, you will be using gRPC to implement the SurfStore API. We have provided you with a SurfStoreBasic.proto file that defines the basic API calls outlined above. You may add new services, RPCs, and Message types to this file, but do not delete or modify any of the existing RPCs or Messages.

  • gRPC is an RPC framework that will automatically generate stub code for API calls you specify. It’s up to you to implement them. We recommend looking at the gRPC sample included with the project in order to learn how to use the generated stub code.

  • You’ll need to implement more RPCs in part 2 of the project yourself in order to implement replicated state machines and 2-phase commit.

  • Be careful to not make any assumptions or shortcuts when using the API. Your implementation should work with any individual component (MetadataStore, BlockStore, and Client) swapped out with another student’s version. Feel free to try this out by logging into the same ieng6 machine and using the same configuration file! Of course, we don’t expect different implementations of MetadataStore to work together in part 2, since the API for that is completely up to you.

  • The provided starter code has separate “start” scripts for each component (e.g., the BlockStore, Metadata1, Metadata2, etc). Make sure that we can start these components individually. In other words, please do not have your code start all services in a single Python or Java program, since we’d like to try different combinations of servers. For example, we might want to test your client and your Metadata servers with our BlockStore. Or we might try your Metadata servers and BlockStore with our client.

The starter code is available via the GitHub Classroom invitation linked at the top of this page.