PA 1 framing and parsing tips

2019/01/21

To get started on your web server, your server needs to read in one or more requests. But how do you do that?

Consider a simplified version of this problem: you’ve got an unknown number of HTTP requests, each of which is variable size, separated by the four-byte delimiter <CR><LF><CR><LF>. Note that in C++, these bytes are specified as "\r\n\r\n". However, all you have is a recv() call, which returns one or more bytes at a time with no regard for where one request ends and the next begins.

The problem

Imagine the client sends four requests:

Request1\r\n\r\nRequest2\r\n\r\nRequest3\r\n\r\nRequest4

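A few scenarios make this problem non-trivial. Because TCP delivers a stream of bytes rather than discrete messages, a single recv() call might return only part of a request, exactly one request, several requests at once, or a chunk that ends in the middle of a delimiter.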
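To make these boundary problems concrete, here is a small C++ sketch that cuts the example stream into fixed-size pieces, roughly the way successive recv() calls might. The helper name simulate_recv_chunks and the chunk size are made up for illustration; a real recv() returns pieces of unpredictable size.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of any library): cut a byte stream into
// pieces of at most `max_bytes`, mimicking the arbitrary boundaries that
// successive recv() calls can produce.
std::vector<std::string> simulate_recv_chunks(const std::string& stream,
                                              std::size_t max_bytes) {
    std::vector<std::string> chunks;
    if (max_bytes == 0) {
        return chunks;  // a zero-size chunk would loop forever
    }
    for (std::size_t pos = 0; pos < stream.size(); pos += max_bytes) {
        // substr() clamps the length at the end of the string.
        chunks.push_back(stream.substr(pos, max_bytes));
    }
    return chunks;
}
```

With the four-request stream above and a chunk size of 10, the first piece is "Request1\r\n" and the second is "\r\nRequest2", so the delimiter itself is split across two reads. Any approach that looks for the delimiter only within the bytes of a single read will miss it.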

The solution: framing

One simple way to separate messages based on the delimiter is to (1) read bytes from the socket into (2) a dynamically resizing array/buffer, and then (3) look for request(s) in that buffer.

So in pseudocode that would look something like:

B = ByteBuffer();

loop forever:
  read available data from socket;
  if the client is done, close connection and exit;
  else:
    B.append(those bytes you just read)

    while (B has at least one full TritonHTTP message in it):
      M = retrieve and remove a message from B
      response = process(M)
      send response back to the client
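The pseudocode above could be sketched in C++ roughly as follows. This is one possible shape, not the required design: it assumes POSIX sockets and uses a std::string as the byte buffer. The names handle_connection, send_all, Handler, and kDelimiter are invented for this sketch, and the request handler is passed in as a parameter so the loop knows nothing about HTTP itself.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical names for this sketch: Handler, kDelimiter, send_all,
// handle_connection. None of them come from the assignment starter code.
using Handler = std::function<std::string(const std::string&)>;

const std::string kDelimiter = "\r\n\r\n";

// send() may accept only part of the data, so keep going until all is sent.
bool send_all(int fd, const std::string& data) {
    std::size_t sent = 0;
    while (sent < data.size()) {
        ssize_t n = send(fd, data.data() + sent, data.size() - sent, 0);
        if (n <= 0) {
            return false;
        }
        sent += static_cast<std::size_t>(n);
    }
    return true;
}

void handle_connection(int client_fd, const Handler& process) {
    std::string buffer;  // plays the role of B in the pseudocode
    char chunk[4096];

    for (;;) {
        ssize_t n = recv(client_fd, chunk, sizeof(chunk), 0);
        if (n <= 0) {
            break;  // 0 means the client closed; negative means an error
        }
        buffer.append(chunk, static_cast<std::size_t>(n));

        // Drain every complete message currently sitting in the buffer.
        std::size_t end;
        while ((end = buffer.find(kDelimiter)) != std::string::npos) {
            // This sketch hands over the message without its delimiter.
            std::string message = buffer.substr(0, end);
            buffer.erase(0, end + kDelimiter.size());
            if (!send_all(client_fd, process(message))) {
                close(client_fd);
                return;
            }
        }
        // Any bytes left in the buffer are an incomplete message; they stay
        // there until a later recv() completes them.
    }
    close(client_fd);
}
```

Because the buffer persists across recv() calls, a delimiter that arrives split over two reads is still found once the second half is appended. Error handling here is deliberately minimal (for example, it does not retry on EINTR), so treat it as a starting point.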

Implications

Code similar to the above should handle the cases listed at the top of this post. One question to ask yourself: how do you determine whether the buffer holds at least one full request? What are you looking for in that code?

This implementation involves making a second copy of the incoming request (once into the buffer, then again to process the message). That isn’t ideal in terms of performance, but it is a great way to get started in the world of handling network protocols.

Testing your code

I highly recommend separating the code that interfaces with network sockets from the code that actually implements your web server. For example, in the above pseudocode, the process(M) method/function takes a string (which is the request) and returns a string (which is the response, though this response might also have binary data in it).

Notice that with this approach, you can test the process(M) method/function entirely without touching the network. In fact, you could develop a set of unit tests that feed different pre-determined requests into the process method to test that code separately from your network code.
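As a rough illustration, such tests could look like the C++ sketch below. The process function here is only a stand-in with made-up behavior (it answers 200 to anything starting with "GET " and 400 otherwise). It is not the TritonHTTP specification; swap in your real implementation and the responses you expect from it.

```cpp
#include <cassert>
#include <string>

// Stand-in for the real request handler. Its behavior is an assumption made
// purely so the tests below have something to check.
std::string process(const std::string& request) {
    // rfind(prefix, 0) == 0 is a pre-C++20 way to test "starts with".
    if (request.rfind("GET ", 0) == 0) {
        return "HTTP/1.1 200 OK\r\n\r\n";
    }
    return "HTTP/1.1 400 Bad Request\r\n\r\n";
}

// Feed pre-determined requests straight into process(); no sockets involved.
void test_process() {
    assert(process("GET /index.html HTTP/1.1\r\nHost: localhost") ==
           "HTTP/1.1 200 OK\r\n\r\n");
    assert(process("garbage") == "HTTP/1.1 400 Bad Request\r\n\r\n");
    assert(process("") == "HTTP/1.1 400 Bad Request\r\n\r\n");
}
```

Calling test_process() from a small test program runs every case in well under a second, which makes it practical to re-run after each change to the request-handling code.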