PA 1: Building a web server

2019/01/13

Overview

In this project, you are going to build a simple web server that implements TritonHTTP, a subset of the HTTP/1.1 protocol specification. TritonHTTP is defined here.

Project details

Basic web server functionality

At a high level, a web server listens for connections on a socket (bound to a specific address and port on a host machine). Clients connect to this socket and use the TritonHTTP protocol to retrieve files from the server.

Mapping relative URLs to absolute file paths

Clients request files using a Uniform Resource Locator (URL), such as /images/crypto/enigma.jpg. A key point to keep in mind when building your web server is that the server must translate that relative URL into an absolute file name on the local filesystem. For example, you might decide to keep all the files for your server in ~aturing/cse101/server/www-files/, which we call the document root. When your server gets a request for the enigma.jpg file above, it prepends the document root to the requested path to get the absolute file name ~aturing/cse101/server/www-files/images/crypto/enigma.jpg.

You need to ensure that malformed or malicious URLs cannot “escape” your document root to access other files. For example, if a client submits the URL /images/../../../../.ssh/id_dsa, they should not be able to download the ~aturing/.ssh/id_dsa file. If a client uses one or more .. directories in such a way that the server would “escape” the document root, you should return a 404 Not Found error to the client. Take a look at the realpath() library function for help in dealing with document roots: it resolves .. components and symbolic links into a canonical absolute path.
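A minimal sketch of the document-root check, assuming a POSIX system. ResolveUnderRoot and Canonical are hypothetical names, not part of the starter code. Both the document root and the requested path are canonicalized with realpath(), and the result is accepted only if it lies inside the root. An empty return value stands for "respond with 404 Not Found". Percent-decoding and query strings are not handled here.

```cpp
#include <cassert>
#include <stdlib.h>  // realpath (POSIX), free
#include <string>

// Returns the canonical absolute form of `path` (no "..", ".", or symlinks),
// or "" if it cannot be resolved, e.g. because it does not exist.
static std::string Canonical(const std::string& path) {
  char* resolved = realpath(path.c_str(), nullptr);  // POSIX.1-2008: mallocs result
  if (resolved == nullptr) return "";
  std::string out(resolved);
  free(resolved);
  return out;
}

// Hypothetical helper: maps a URL path such as "/images/crypto/enigma.jpg" to an
// absolute file name under `doc_root`. Returns "" if the target does not exist or
// lies outside the document root; the caller then responds with 404 Not Found.
std::string ResolveUnderRoot(const std::string& doc_root, const std::string& url_path) {
  std::string root = Canonical(doc_root);
  if (root.empty() || url_path.empty() || url_path[0] != '/') return "";
  std::string target = Canonical(root + url_path);
  if (target.empty()) return "";
  if (target == root) return target;  // request for the root directory itself
  // Compare against "root/" so that "/srv/www-files-old" is not mistaken for a
  // path inside "/srv/www-files".
  std::string prefix = (root == "/") ? root : root + "/";
  if (target.compare(0, prefix.size(), prefix) != 0) return "";
  return target;
}
```

Because realpath() also follows symbolic links, a link inside the document root that points outside it is rejected as well; that is a deliberate, conservative choice in this sketch.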

Supporting MIME types

Your web server needs to tell the web browser what type of content it is serving, so that the client can render it properly. For example, HTML files (.html or .htm) need to be processed by the client’s markup engine, whereas an image (.jpg, .jpeg, or .png) needs to be displayed by an image viewer. The mapping of file extensions to content types is handled by a MIME map. The starter code will contain a file with the mappings that you must support. The TritonHTTP spec describes what to do if a web browser requests a file whose extension is not listed in the provided mime.types file.
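One way to build the MIME map is to load it into a hash map keyed by extension. This sketch assumes each non-empty line of mime.types has the form `<extension> <content-type>` (for example `.html text/html`); check the actual starter file and adjust the parser if its layout differs. LoadMimeTypes and ContentTypeFor are hypothetical names, and the fallback type is passed in by the caller so you can supply whatever the TritonHTTP spec prescribes.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>

using MimeMap = std::unordered_map<std::string, std::string>;

// Parses lines of the assumed form "<extension> <content-type>", e.g. ".html text/html".
// Blank lines, incomplete lines, and lines starting with '#' are skipped.
MimeMap LoadMimeTypes(std::istream& in) {
  MimeMap map;
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    std::string ext, type;
    if (!(fields >> ext >> type) || ext[0] == '#') continue;
    map[ext] = type;
  }
  return map;
}

// Looks up the content type of `path` by its extension (including the dot).
// Returns `fallback` if there is no extension or it is not in the map.
// Matching is case-sensitive in this sketch.
std::string ContentTypeFor(const MimeMap& map, const std::string& path,
                           const std::string& fallback) {
  std::string::size_type slash = path.find_last_of('/');
  std::string::size_type dot = path.find_last_of('.');
  // A dot before the last slash belongs to a directory name, not the file.
  if (dot == std::string::npos || (slash != std::string::npos && dot < slash)) {
    return fallback;
  }
  auto it = map.find(path.substr(dot));
  return it == map.end() ? fallback : it->second;
}
```

In the server you would call LoadMimeTypes once at startup on a std::ifstream opened from the configured mime.types path, then call ContentTypeFor for each response.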

Program structure

At a high level, your program will be structured as follows.

Initialize

We will provide you with starter code that handles command-line arguments, reads in a configuration file, and starts your program. Note that the document root, port number, and location of the mime.types file are provided in a configuration file. Please use this configuration file rather than hard-coding file paths or ports, as we will be testing your code against our own config file. Also, do not assume that the files to be served are in the same directory as the web server. The configuration file may specify paths as either absolute or relative, with or without a trailing forward slash: e.g., “/var/home/htdocs”, “/var/home/htdocs/”, or “../../htdocs/”.
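Because configured paths may be relative and may or may not end in a slash, one option is to canonicalize the document root once at startup. This sketch (NormalizeDocRoot is a hypothetical name) uses realpath(), so trailing slashes are dropped and a relative path is resolved against the process's current working directory. That resolution rule is an assumption worth checking against how the server is launched. A missing directory is reported immediately instead of surfacing later as a puzzling 404.

```cpp
#include <cassert>
#include <stdlib.h>    // realpath (POSIX), free
#include <sys/stat.h>  // stat, S_ISDIR
#include <string>

// Hypothetical helper: turns "/var/home/htdocs/", "/var/home/htdocs", or
// "../../htdocs/" into one canonical absolute path with no trailing slash.
// Relative paths are resolved against the current working directory.
// Returns "" if the path does not exist or is not a directory.
std::string NormalizeDocRoot(const std::string& configured) {
  char* resolved = realpath(configured.c_str(), nullptr);  // mallocs result
  if (resolved == nullptr) return "";
  std::string root(resolved);
  free(resolved);
  struct stat st;
  if (stat(root.c_str(), &st) != 0 || !S_ISDIR(st.st_mode)) return "";
  return root;
}
```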

Setup server socket and threading

Create a TCP server socket, and arrange for a thread to be spawned (or a thread to be retrieved from a thread pool) whenever a new connection comes in. If you use a thread pool, pre-spawn five threads. Because of how we’re going to build on your web server for PA 2, please don’t use fork-based concurrency; use threads instead.
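A minimal sketch of the listening socket and a thread-per-connection accept loop, using POSIX sockets and std::thread (IPv4 only). CreateListenSocket, AcceptLoop, and HandleClient are hypothetical names; HandleClient is a placeholder for your request-parsing and response logic.

```cpp
#include <cassert>
#include <cstdint>
#include <netinet/in.h>  // sockaddr_in, htons, htonl, INADDR_ANY
#include <sys/socket.h>  // socket, setsockopt, bind, listen, accept
#include <thread>
#include <unistd.h>      // close

// Creates an IPv4 TCP socket listening on all interfaces at `port`.
// Returns the descriptor, or -1 on failure.
int CreateListenSocket(uint16_t port, int backlog = 16) {
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return -1;
  int yes = 1;
  // Let the server restart immediately instead of failing with "address in use"
  // while old connections linger in TIME_WAIT.
  setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));
  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_ANY);
  addr.sin_port = htons(port);
  if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
      listen(fd, backlog) < 0) {
    close(fd);
    return -1;
  }
  return fd;
}

// Placeholder for your TritonHTTP logic: read requests from `client_fd`,
// send responses, and close the connection when done.
void HandleClient(int client_fd) {
  close(client_fd);
}

// Accepts connections forever and hands each one to a new detached thread.
void AcceptLoop(int listen_fd) {
  for (;;) {
    int client_fd = accept(listen_fd, nullptr, nullptr);
    if (client_fd < 0) continue;  // e.g. interrupted by a signal; consider logging
    std::thread(HandleClient, client_fd).detach();
  }
}
```

With the port read from the configuration file, the server would call AcceptLoop(CreateListenSocket(port)) after checking that the descriptor is not -1. A thread-pool variant would instead push each accepted descriptor onto a mutex-protected queue that five pre-spawned worker threads pop from.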

Executable

Your server binary should be called httpd and should take a single argument: the path to the configuration file (which may be stored anywhere on the filesystem, so do not assume it is in the current directory).

For example:

$ ./httpd /home/aturing/myconfig.ini
$ ./httpd ~aturing/myconfig.ini
$ ./httpd ../configs/myconfig.ini

Makefile

If you want to add additional files to your project, add the associated object file to the SERVEROBJS line of the Makefile:

SERVEROBJS= server-main.o logger.o HttpdServer.o My-New-File.o

Serving static files

A straightforward way to serve static files is to open the file in binary mode and then loop: read a fixed-size chunk of the file (e.g., 1 MB) from disk, send that chunk to the client, and repeat until you’ve sent the entire file.
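The read-and-send loop might look like the following sketch. SendFileChunked and WriteAll are hypothetical names. A single write() on a socket may accept fewer bytes than requested, so each chunk is written in an inner loop until it has been sent completely. The 1 MB buffer lives on the heap rather than the stack.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fstream>
#include <string>
#include <unistd.h>  // write, read, pipe, close
#include <vector>

// Writes all `len` bytes of `buf` to `fd`, retrying after short writes.
bool WriteAll(int fd, const char* buf, size_t len) {
  while (len > 0) {
    ssize_t n = write(fd, buf, len);
    if (n < 0) {
      if (errno == EINTR) continue;  // interrupted by a signal: retry
      return false;                  // e.g. the client closed the connection
    }
    buf += n;
    len -= static_cast<size_t>(n);
  }
  return true;
}

// Streams the file at `path` to `out_fd` in chunks of `chunk` bytes.
// Returns false if the file cannot be opened or a write fails.
bool SendFileChunked(int out_fd, const std::string& path, size_t chunk = 1 << 20) {
  std::ifstream in(path, std::ios::binary);
  if (!in) return false;
  std::vector<char> buf(chunk);
  while (in) {
    in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
    std::streamsize got = in.gcount();  // the last chunk may be shorter
    if (got > 0 && !WriteAll(out_fd, buf.data(), static_cast<size_t>(got))) {
      return false;
    }
  }
  return in.eof();  // stopped because the whole file was read
}
```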

An alternative approach is to use the sendfile() call, which copies up to a given number of bytes from one open file descriptor to another. Recall that the operating system represents TCP sockets internally as file descriptors. This means that you can open the static file on disk and then call sendfile() to copy its contents directly to the socket.
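A sketch of the sendfile() approach, assuming Linux, where <sys/sendfile.h> declares `ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count)`; BSD and macOS provide a sendfile() with a different signature. SendWholeFile is a hypothetical name. Because sendfile() may copy fewer bytes than requested, it is called in a loop until the whole file has been sent.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>         // open
#include <string>
#include <sys/sendfile.h>  // sendfile (Linux-specific)
#include <sys/socket.h>    // the out_fd is normally a connected socket
#include <sys/stat.h>      // fstat
#include <unistd.h>        // close

// Linux-only sketch: copies the entire file at `path` to `out_fd`.
bool SendWholeFile(int out_fd, const std::string& path) {
  int in_fd = open(path.c_str(), O_RDONLY);
  if (in_fd < 0) return false;
  struct stat st;
  if (fstat(in_fd, &st) < 0) {
    close(in_fd);
    return false;
  }
  off_t offset = 0;
  bool ok = true;
  while (offset < st.st_size) {
    // sendfile advances `offset` by the number of bytes it copied.
    ssize_t n = sendfile(out_fd, in_fd, &offset,
                         static_cast<size_t>(st.st_size - offset));
    if (n < 0 && errno == EINTR) continue;
    if (n <= 0) {  // error, or the file shrank while we were sending it
      ok = false;
      break;
    }
  }
  close(in_fd);
  return ok;
}
```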

Logging

Because your web server is going to be multithreaded, it is not a good idea to add printf() statements to your code, since the output of those statements from different threads will be intermixed and hard to read. Instead, I’ve included a thread-safe logging library, spdlog, that emits log entries to stderr.

From anywhere in your code, you can get a handle to this logger object via:

auto log = logger();
log->info("This is a simple message");
log->info("This message has an argument: {}", 5);
log->error("Errors are more serious than informational messages");

Note that each {} placeholder in the log string is replaced, in order, by the corresponding argument that follows the string.

More information on the logging library is available on its webpage: https://github.com/gabime/spdlog.

Configuration files

I’ve included the code for the INIReader configuration file handler. It’s a pretty straightforward way to handle a simple config file. As an example:

#include <iostream>
#include "INIReader.h"

int main() {

    INIReader reader("test.ini");

    if (reader.ParseError() != 0) {
        std::cout << "Can't load 'test.ini'\n";
        return 1;
    }
    std::cout << "Config loaded from 'test.ini': version="
              << reader.GetInteger("protocol", "version", -1) << ", name="
              << reader.Get("user", "name", "UNKNOWN") << ", email="
              << reader.Get("user", "email", "UNKNOWN") << ", pi="
              << reader.GetReal("user", "pi", -1) << ", active="
              << reader.GetBoolean("user", "active", true) << "\n";
    return 0;

}

More information (if you need it) is available at its home page, located at https://github.com/jtilly/inih.

Grading

Basic functionality for 200 (OK) responses (40 pts)

Basic functionality for non-200 (error) responses (30 pts):

Concurrency (10 pts):

Pipelining (20 pts):

Autograder

Gradescope will run an autograder with its own htdocs directory filled with text and binary files, including subdirectories. Gradescope will only provide you with a very basic sanity check that your code compiles and starts without crashing: it is your responsibility to ensure that your code precisely follows the TritonHTTP spec. The final autograder will include test cases not included in the version provided to you before the deadline.

Tips

You can use the curl command to generate well-formed requests to your server. If you use the -v argument, you can see the headers that are sent to your server, and your server’s responses:

$ curl -v http://localhost:8080/subdir1/index.html

Starter code

To get a copy of the starter code, please use this invitation.

Submitting your work

Log into gradescope.com and upload your code. This assignment is to be done in groups of 1 (solo), 2, or 3 students.

Due date/time

Monday Feb 4, 5pm