The goal of this project is to build a simple web client that can send requests to (and process responses from) an HTTP web server. What differentiates your client from existing ones (such as curl or wget) is that your client will check that the HTTP protocol is strictly enforced. In other words, your client will test the server to make sure it is implementing HTTP correctly. This tool will come in handy when you build your own server during project 2.
General learning objectives:
Specific learning objectives:
This project should be done individually. The due date is listed on the course syllabus.
At a high level, a web client connects to a server socket on a web server, and uses a simple text-based protocol to retrieve files or other information from the server. For example, you might try the following command from a UNIX machine:
$ telnet www.ucsd.edu 80
GET /index.html HTTP/1.1\n
Host: www.ucsd.edu\n
\n
Press Enter twice after the "Host" header to send the blank line that ends the request. The server will then return (on the command line) its response headers followed by the HTML for the front page of the UCSD website:
HTTP/1.1 200 OK
Date: Thu, 07 Jan 2016 04:09:11 GMT
Server: Apache/2
Last-Modified: Wed, 06 Jan 2016 18:00:13 GMT
ETag: "6a46-528ae20302540"
Accept-Ranges: bytes
Content-Length: 27206
Content-Type: text/html; charset=UTF-8

<!DOCTYPE html>
<html lang="en">
(rest of html web page follows...)
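The first line of the response is the status line, and checking its shape is one of the tests your client has to perform. The following is a minimal sketch, assuming the line has already been read from the socket and its trailing line ending has been stripped. The name `parse_status_line` is hypothetical and not part of the assignment.

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper: checks that a status line has the form
 * "HTTP/1.1 <three digits> <reason phrase>".
 * Returns the numeric status code, or -1 if the line is malformed.
 */
int parse_status_line(const char *line)
{
    const char *prefix = "HTTP/1.1 ";
    size_t prefix_len = strlen(prefix);

    /* The line must start with the protocol version and one space. */
    if (strncmp(line, prefix, prefix_len) != 0)
        return -1;

    const char *p = line + prefix_len;

    /* The status code must be exactly three decimal digits. */
    for (int i = 0; i < 3; i++) {
        if (!isdigit((unsigned char)p[i]))
            return -1;
    }

    /* A single space separates the code from the reason phrase. */
    if (p[3] != ' ')
        return -1;

    return (p[0] - '0') * 100 + (p[1] - '0') * 10 + (p[2] - '0');
}
```

A return value of -1 here would correspond to the Bad_server_status condition in the table of error codes.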
For this assignment, you will need to support a (pretty small) subset of the HTTP 1.1 protocol to interact with existing web servers. Your client will need to be able to request HTML files as well as in-line images (jpg and png).
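Since only HTML, jpg, and png content is requested, the expected Content-Type can be derived from the path. The sketch below assumes the server is supposed to use the standard media types `text/html`, `image/jpeg`, and `image/png`, and that the extension alone identifies the content. The function name `expected_content_type` is hypothetical.

```c
#include <string.h>

/*
 * Hypothetical helper: maps the extension of a requested path to the
 * media type the server would be expected to report for it.
 * Returns NULL for extensions outside the three supported kinds.
 */
const char *expected_content_type(const char *path)
{
    /* Look at the text after the last '.' in the path. */
    const char *dot = strrchr(path, '.');
    if (dot == NULL)
        return NULL;

    if (strcmp(dot, ".html") == 0)
        return "text/html";
    if (strcmp(dot, ".jpg") == 0)
        return "image/jpeg";
    if (strcmp(dot, ".png") == 0)
        return "image/png";

    return NULL;
}
```

When comparing against the header the server actually sent, remember that the value may carry parameters, as in `text/html; charset=UTF-8` in the example response, so a prefix comparison is probably more appropriate than an exact match.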
At a high level, your web client will be structured something like the following:
Initialize:
Take an HTTP URL as a command-line argument,
including the TCP port number (80 is assumed if not provided).
$ ./http-client http://www.ucsd.edu:80/index.html
Connect and send:
Create a TCP connection to the web server and issue a well-formed HTTP
request for the appropriate content.
Receive:
Receive the HTTP response headers and content (e.g., HTML page or image).
Write the contents to a local file with the appropriate name (e.g.,
index.html or foo.jpg)
Close:
When all the content is requested and received, close the connection.
Error reporting:
Your program should print a '0' to stdout if the web server followed
the HTTP spec correctly, or else your program should print
a positive, non-zero error code that signals the type of error
that you found (see below). Your program may print whatever
messages you want to stderr (we won't check those).
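The "Initialize" step above can be sketched as follows. This is one possible approach, not a required design: the `struct http_url` layout, the buffer sizes, and the name `parse_http_url` are all assumptions made for illustration. The port is kept as a string because `getaddrinfo` accepts the service in that form.

```c
#include <string.h>

/* Hypothetical container for the pieces of an http:// URL. */
struct http_url {
    char host[256];
    char port[6];     /* room for "65535" plus the terminator */
    char path[1024];
};

/*
 * Splits "http://host[:port][/path]" into its parts.
 * The port defaults to "80" and the path defaults to "/".
 * Returns 0 on success, -1 if the URL is not well formed.
 */
int parse_http_url(const char *url, struct http_url *out)
{
    const char *scheme = "http://";
    size_t scheme_len = strlen(scheme);

    if (strncmp(url, scheme, scheme_len) != 0)
        return -1;

    const char *authority = url + scheme_len;
    size_t auth_len = strcspn(authority, "/");   /* host[:port] ends at first '/' */
    size_t host_len = strcspn(authority, ":/");  /* host ends at ':' or '/' */
    const char *path = authority + auth_len;

    if (host_len == 0 || host_len >= sizeof out->host)
        return -1;
    memcpy(out->host, authority, host_len);
    out->host[host_len] = '\0';

    if (authority[host_len] == ':') {
        const char *port = authority + host_len + 1;
        size_t port_len = auth_len - host_len - 1;

        /* The port must be a non-empty run of digits that fits the buffer. */
        if (port_len == 0 || port_len >= sizeof out->port)
            return -1;
        if (strspn(port, "0123456789") < port_len)
            return -1;
        memcpy(out->port, port, port_len);
        out->port[port_len] = '\0';
    } else {
        strcpy(out->port, "80");
    }

    /* A URL with no path component refers to the root document. */
    if (*path == '\0')
        path = "/";
    if (strlen(path) >= sizeof out->path)
        return -1;
    strcpy(out->path, path);

    return 0;
}
```

For the example command line above, this yields host `www.ucsd.edu`, port `80`, and path `/index.html`.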
Your client does not need to support concurrency or multithreading. You may write your web client in either C or C++, but you must build it in a Unix-like environment using the sockets API we've been using in class (e.g., no HTTP libraries).
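Without an HTTP library, the request has to be formatted by hand before it is written to the socket. A minimal sketch, assuming a plain GET with only the Host header, as in the telnet example; the function name `build_get_request` is hypothetical. Note that the HTTP/1.1 specification terminates each line with CRLF (`\r\n`), and the final empty line marks the end of the request headers.

```c
#include <stdio.h>

/*
 * Hypothetical helper: formats a GET request for `path` on `host`
 * into `buf`. Returns the number of bytes to send (excluding the
 * terminating NUL), or -1 if the buffer is too small.
 */
int build_get_request(char *buf, size_t size,
                      const char *host, const char *path)
{
    int n = snprintf(buf, size,
                     "GET %s HTTP/1.1\r\n"
                     "Host: %s\r\n"
                     "\r\n",
                     path, host);

    /* snprintf reports the length it needed; detect truncation. */
    if (n < 0 || (size_t)n >= size)
        return -1;

    return n;
}
```

The returned length is what you would pass to `send` or `write`, looping until every byte has been written, since a single call may send only part of the buffer.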
Your client must support:
Your client program will only need to support this subset of HTTP/1.1.
There are a variety of resources online that can be of help:
The course staff will provide 10 instances of a web server, each on a different TCP port (details to be provided shortly). For example, we may run server instances on ports 8000, 8001, 8002, ..., 8009. You are to run your program against each port and determine whether the web server instance running on that port (1) implements the HTTP spec correctly, or (2) does not implement HTTP correctly. In the latter case, you must figure out what the error condition is. We will 'label' these ports with the associated errors so that you can check that your solution works (for these 10 cases at least). When we grade your submission, we will test against separate ports that you will not have access to.
You may request the following documents from the server. We are providing these for you so that you can check to make sure that what the server returns is actually the correct content.
These files are available here.
The possible error conditions (and their associated numerical codes to be printed to stdout) are:
Code | Name | Description |
---|---|---|
0 | OK | The server works correctly |
1 | Bad_socket | The server socket is not set up or is not accepting connections. |
2 | Premature_close | The server closed the connection before processing the full request(s). |
3 | Bad_server_status | The server's status line is malformed, or returns a response code not indicated in the above project spec. |
4 | Bad_response_headers | The response headers are malformed. |
5 | Bad_response_body | There isn't a response body, or the blank line between the response headers and the body isn't there, or the response body does not have the correct data in it. For example, not all of the HTML page or image was returned, or some of the content was corrupted and doesn't match the reference file. |
6 | Wrong_content_length | The Content-Length field indicates a length that is not correct. |
7 | Wrong_content_type | The Content-Type field doesn't match the content. |
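One way to keep these codes consistent throughout a client is to define them once as an enumeration. The sketch below mirrors the table above; the `RES_` prefix and the names `result_name` and `report_result` are hypothetical, and printing a newline after the code is an assumption, since the assignment only says to print the number to stdout.

```c
#include <stdio.h>

/* Numeric values follow the table of error conditions. */
enum result_code {
    RES_OK = 0,
    RES_BAD_SOCKET = 1,
    RES_PREMATURE_CLOSE = 2,
    RES_BAD_SERVER_STATUS = 3,
    RES_BAD_RESPONSE_HEADERS = 4,
    RES_BAD_RESPONSE_BODY = 5,
    RES_WRONG_CONTENT_LENGTH = 6,
    RES_WRONG_CONTENT_TYPE = 7
};

/* Returns the table's name for a code, or "Unknown" if out of range. */
const char *result_name(enum result_code rc)
{
    static const char *const names[] = {
        "OK", "Bad_socket", "Premature_close", "Bad_server_status",
        "Bad_response_headers", "Bad_response_body",
        "Wrong_content_length", "Wrong_content_type"
    };

    if ((int)rc < 0 || (int)rc > RES_WRONG_CONTENT_TYPE)
        return "Unknown";
    return names[rc];
}

/* Prints the numeric code to stdout and a readable name to stderr. */
void report_result(enum result_code rc)
{
    printf("%d\n", (int)rc);                   /* the output that is checked */
    fprintf(stderr, "%s\n", result_name(rc));  /* free-form diagnostics */
}
```

Keeping the diagnostic text on stderr means it cannot interfere with the single number expected on stdout.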
Port | Code | Name |
---|---|---|
8000 | 0 | OK |
8001 | 1 | Bad_socket |
8002 | 2 | Premature_close |
8003 | 3 | Bad_server_status |
8004 | 4 | Bad_response_headers |
8005 | 5 | Bad_response_body |
8006 | 6 | Wrong_content_length |
8007 | 7 | Wrong_content_type |
8008 | 0 | OK |
8009 | 0 | OK |
You will submit your first project to your CSE 124 GitHub account. You should include a Makefile that builds your project and produces a binary executable called 'http-client'. You don't have to commit your binary, just the Makefile and associated source files.
<github_id>_cse124/
|-- project
    |-- proj1
        |-- http-client
        |-- Makefile

2 directories, 2 files