Assignment 3 100pts¶
Deadline: Wednesday, February 5, 2020 by 13:59:59
Early Turn-In: Sunday, February 2, 2020 by 23:59:59 (10% bonus)
The goal of this assignment is to gain hands-on experience exploiting side channels. This assignment will demonstrate how side channels can be just as dangerous as the control flow vulnerabilities you exploited in Assignment 2.
You will be provided a two skeleton files (
timehack.c) that you will use to exploit side channels present in a target library (
sysapp.c) to obtain a protected key. You may consult any online references you wish. If you use any code in your answer that you did not write yourselves, you must document that fact. Failure to do so will be considered a violation of the academic integrity policy.
Just like previous assignments, for this assignment you will be provided with a VirtualBox VM containing all the files you need. The VM is named
pa3box, and you can download the VM image here. As in previous assignments, import the
pa3box.vbox file into VirtualBox via the
Add menu item; don't create a brand-new VirtualBox VM or import the
pa3box-data.vmdk file on its own: if you do this, you won't have the correct configuration.
The VM is configured with two users:
student, with password
root, with password
hackallthethings. We recommend you use the VM via
scp, just like previous assignments. Please note that SSH is disabled for
root, so you can only SSH in as the
student user. You can still log in as
su or by logging into the VM directly.
Starter files are included inside the
student user's home directory within the VM image.
There are two parts to this assignment, each with its own subdirectory,
timehack. Each subdirectory contains exploit starter code (
timehack.c), which imports a library (
sysapp.c, with the same contents across the two subdirectories) with password-checking functionality vulnerable to side-channel attacks. You should not modify
timehack.c. Each subdirectory contains a
Makefile for building your exploits.
You will be writing two exploits, each of which takes advantage of a side channel, to obtain the password in
memhack.c, you will exploit a memory-based side channel, and in
timehack.c you will exploit a timing-based side channel. See Section 2 for specific details. Once both of your exploits can determine the password in
sysapp.c and call the
hack_system() function successfully, the assignment is complete. Additionally, for each exploit, provide a brief description of how it works in its corresponding
Submitting Your Solutions¶
timehack.c to the corresponding assignments on Gradescope, and submit a combined writeup to the writeup assignment on Gradescope.
Memory-Based Side Channel¶
We recommend you start with the memory-based side channel because it is deterministic and doesn't have problems with noise. Look at the
check_pass() function in
sysapp.c and note two things:
- The password string is passed by reference
- The memory it points to is checked against the reference password one character at a time.
Now look in
memhack.c and note how a buffer is allocated such that the page starting at page start is protected (i.e., accessing it will caused a segmentation fault, or SEGV) and the previous 32 characters are allocated. Now look at the demonstration function
demonstrate_signals(), which shows how referencing any memory in the protected page will produce a fault as well as how to catch that fault in your program. You do not need to use this function; it is merely there to show you how to use signals to capture whether a memory reference touched a page or not.
Now you will want to create a framework (in
memhack.c) to call
check_pass() with different inputs and catching any resulting faults so you can determine if part of the password is correct. We suggest a loop over the maximum password size (32 characters) where for the first guess you store the password such that its first character is one byte before page start and then iterate between possible choices for the first character (when you get it right you will get a page fault). Repeat this to guess the entire password. Note that all ASCII symbols from ASCII 33 (
!) to ASCII 126 (
∼) can be used in the password. Other hints:
- You are already given a page protected buffer with enough memory to crack the password. All you need to do is use it appropriately for each guess you make.
demonstrate_signalsfunction handles all the segfaults for you. You can re-use almost all of it in your code.
Timing-Based Side Channel¶
Unlike the memory-based side channel, the timing-based side channel will deal with noise. Go back and look at
check_pass(). An artificial delay has been added when each character is checked to make your life easier (it’s possible to do the assignment without it but it would require a much more careful methodology). The execution time of
check_pass() depends on how many characters you guess correctly.
timehack.c and find a macro there for a function called
rdtsc() which invokes the processors cycle counter (a clock that increments by one for each processor cycle that passes). In general, treat
rdtsc() as a free running timer that returns a
long. Insert a call to
rdtsc() before the call to
check_pass() and afterwards. Print the difference between these values to see how long (in cycles) the password checking ran. Run the program a few times. Now change the guess string so the first character is correct. Run again and see how the time difference changes.
Now automate this entire process, in the style of the original approach in
memhack.c. Note that unlike the
memhack attack, the
timehack problem will have to deal with noise. Depending on things like what other programs are running, the status of the cache, the contents of the branch target buffer, etc... there can be significant variability in the amount of time each check takes. This will matter in practice. You will want to run a lot of trials before you reach your conclusion about each character. Other hints:
- Be careful in using
printf’s. These can blow out the instruction cache and data caches and perturb your results (i.e. overwhelm the timing effects you are trying to detect).
- Be careful in averaging across trials. If your process is descheduled in the middle of a measurement, the time cost of that individual trial will be so large that it overwhelms everything else. Instead, the median is your friend. However, feel free to experiment if something does not work for you.
- If time is not continuing to increase as you progress through characters, then you probably made a bad guess earlier. Backtrack.
rdtsc()will wrap around at some point. You may need to handle this outlier if it is causing issues.
- Debugging advice: make a big array to hold your timing measurements and print them at the end.
- Be sure to test a bunch of different passwords. We will when we grade.
Do not write a solution that simply checks all passwords exhaustively. You will not get credit for this. The timing side-channel should be exploitable in linear time (we will stop programs that are running for excessive periods) and it will basically feel instantaneous for passwords of 8 characters or less (note we will not test passwords over 12 characters).