cse240a Project 1: Cache Simulator

Part 1: Building a cache simulator

Due: Noon, October 30

Introduction:

For this project, you will be implementing a basic cache simulator in C/C++. It will take in several parameters that describe the desired cache (e.g. size, associativity, etc) along with a trace file describing data memory accesses for a particular program. The simulator will output several statistics (e.g. hit rate, total running time of the program, etc).

Configuration File:

The parameter your cache simulator will is one specifying a configuration file that contains all the necessary parameters to setup your cache and to calculate the stats that need to be output. The layout of the configuration file has a specific format that must be adhered to. It contains 6 lines with the following order:

Line size: Specifies the line (AKA block) size for the cache in bytes. This should always be a non-negative power of 2 (i.e. 1,2,4,8,etc).
Associativity: Specifies the associativity of the cache. A value of "1" implies a direct-mapped cache, while a "0" value implies fully-associative. Should always be a non-negative power of 2.
Data size: Specifies the total size of the data in the cache. This does not include the size of any overhead (such as tag size). It should be specified in KB and be a non-negative power of 2. For example, a value of "64" means a 64KB cache.
Replacement Policy: Specifies the replacement policy to be used. Should be either "0" for random replacement or "1" for LRU. No other values are valid.
Miss penalty: Specifies the number of cycles penalized on a cache miss. May be any positive integer.
Write allocate: Specifies the cache policy on store misses. A value of "0" means write-around. A value of "1" means write-allocate. Any other value is invalid.

An example config file is provided here. It specifies a 16KB direct-mapped cache with 8 byte blocks, an LRU replacement policy, 100 cycle miss penalty, and following a write-allocate policy.

NOTE: The office configs that you need to use for your report can be downloaded here. You will need to include the results from all these configs in your report.

Trace File:

Your simulator will need to take a second command-line parameter that specifies the trace file that will be used to compute the output statistics. The trace file will specify all the data memory accesses that occur in the sample program. Each line in the trace file will specify a new memory reference. Each line in the trace cache will therefore have the following three fields:

Access Type: A single character indicating whether the access is a load ('l') or a store ('s').
Address: A 32-bit integer (in unsigned hexidecimal format) specifying the memory address that is being accessed. For example, "0xff32e100" specifies that memory address 4281524480 is accessed.
Instructions since last memory access: Indicates the number of instructions of any type that executed between since the last memory access (i.e. the one on the previous line in the trace). For example, if the 5th and 10th instructions in the program's execution are loads, and there are no memory operations between them, then the trace line for with the second load has "4" for this field.

Fields on the same line are separated by a single space. The trace files that you will need to use for your turnin can be found here.

Simulator Output:

Your simulator should write to an output file file the ".out" extension, whose prefix is based on the name of the input trace file. For example, if you input "gcc.trace" then the output file should be named "gcc.trace.out". Your simulator will be calculating several statistics that will go into the output file. Much like the configuration file, the output file should contain one field per line. It should contain the following 5 lines:

Total Hit Rate: The percentage of memory ops (i.e. lines in the trace file) that were hits. This number should be truncated to 4 decimal places of precision.
Load Hit Rate: The percentage of loads that were hits. Set to 4 decimal places of precision.
Store Hit Rate: The percentage of stores that were hits.
Total Run Time: The total run time of the program, assuming that the last memory access was the last instruction of the program. The unit for this should be cycles.
Average Memory Access Latency: The average number of cycles needed to complete a memory access. This number should be to 4 decimal places of precision.

Compilation and Execution Environment:

Your simulator should be written in either C or C++. It should compile and run on a Red Hat Enterprise Linux machine using gcc (if written in C) or g++ (for C++) version 3.4.6. CSE grad students have access to this type of environment by using one of the computers in the APE lab. More info on accessing the APE lab computers can be found here. For those who cannot access these computers (likely CSE undergrads and ECE students), you can request access by filling out the form located here.

For those of you unfamiliar with developing in the linux environment, you can check out this basic tutorial on compiling using gcc (or g++). For those of you that are rusty with your C++, I would recommend this C++ reference site. Of course, the webboard is also a good place for more specific questions.

Deliverable

Report, Part 1:

For the first part of this project, you will be evaluating several different cache configurations (found here) and their performance on the given trace files. You will need to generate graphs comparing the performance of each cache configuration on the given trace files. Specifically you will need to generate the following 2 graphs:

Total Hit Rate: A bar chart showing the total hit rate for each configuration on each input trace.
Average Memory Access Latency: A bar chart comparing this statistic on each cache configuration for each if the trace files

A note on graphs: Graphs are supposed to make large amounts of information comprehensible very quickly. This means they must clear, uncluttered, easy to decipher, and well labeled. Perhaps the most important and most frequently broken rule for graphs is "Label your axes." Your graphs MUST have properly labeled axes, or they will receive no credit.

These two graphs should be placed in a PDF file named "lastname-firstname-part1_results.pdf", where the lastname and firstname are... well... you know. For example, if Sat were to turn it in, the file would be named "Garcia-Sat-part1_results.pdf".

Executable and Source Code:

You should also turn in the source code and an executable for your cache simulator. See the note above about the compilation and execution environment. If you have any special compilation/execution notes, please include them in a readme. Also, remember that your simulator should take only two parameters: the config file and the trace file (in that order). Make sure that all your source code files have your name in them (at the top) along with your UCSD PID and e-mail address. Sat would also appreciate it if you included a credit card number so he can buy something nice for his wife *wink*.

Tarball:

Please place your report, executable, and source code in a gzipped tarball using the following naming convention: "lastname-firstname-cse240a-fa07-proj1-part1.tar.gz" where the lastname/firstname bit is the same as above. E-mail this file to Sat on or before noon on the due date.

Due: Noon, October 30

Part 2: Is Something Wrong with Patterson's Thumb?

Due: Noon, October 30

In your textbook, it is said that as a rule-of-thumb halving the size of a cache while doubling the associativity will result in similar performance. Yet, even within the textbook there are graphs that seem to show this to be false. Your job, as a cache simulation expert will be to test whether this rule-of-thumb is true. Along with the traces given in part 1, you will be generating your own traces from real-world programs to test this hypothesis.

Note: This project requires some knowledge of Unix commands. You may freely discuss issues with using tar and the pin tool on the discussion board, including sharing the specific command lines that get things done. You must generate your own trace files, however, and the analysis of your data must be your work alone. For those of you who use Windows, you may remotely login to the Linux APE machines using a free program named PuTTY. You can download PuTTY here.

Generating Traces with Pin

As part of your assignment, you will need to generate your own memory trace files. Luckily, there is a handy tool called Pin that is perfectly suited to handle this task. Pin is a dynamic binary instrumentation tool that is free to use. While you won't be expected to know much about Pin, you can download it as well as find the manual here. You should use the Rev. 14297 release (9/6/2007). If you are using the APE computers, you should download gcc 3.4 version, located here.

In order to get you started, Sat has written a pintool that will generate trace files that use the same format that your cache simulator accepts as input. The pintool, called memtracer, is available here. Usage is simple. After downloading and untarring Pin, there will be a directory named "Bin" where the pin executable will reside. Although you can place the memtracer pintool from any directory, it is probably easiest if you copy it to that "Bin" directory. From within the "Bin" directory, you can then run the following command:

./pin -t memtracer [-skip s] [-length l] -- /path/to/program

The skip and length field are optional and are used to skip the first s instruction and to run for only l instructions. For example, if you wanted to instrument the "ls" program (which resides in /bin/ls on most Linux systems) but wanted to skip the first 100 instructions and only instrument 500 instructions you would run the command: ./pin -t memtracer -skip 100 -length 500 -- /bin/ls

Please note that you will likely want to use the skip option when generating traces. This will allow you to skip over the loading of shared libraries and other things that are not part of the functionality of the program you are running. Including this startup process in your trace file will skew the behavior of your cache.

After running the memtracer tool, you will be left with a file named "mem.trace" that you can then give to your cache simulator. While Pin can instrument any binary executable file (including Firefox... with some finessing), it adds a lot of overhead so be patient when attempting to instrument large programs. The memtracer tool limits the number of memory accesses that it logs to 2M so you don't need to worry about accidentally filling up your hard disk with the trace file on a large program.

While you are free to use any program that you wish, we suggest the following (which should be available on almost any Linux system you use): grep, djpeg, cjpeg, ps2pdf, gcc, gunzip, gzip, bzip2, bunzip2, tar, md5sum, perl, m4, cpp, sort, diff, ppmdither, java, javac, latex, python, uuencode, enscript
You can find out find out more about using these programs by looking at their man files (e.g. man djpeg).

Deliverable

Report, Part 2

For this part of the project, you will be reporting and analyzing the results from your testing of Patterson's rule-of-thumb. In terms of reporting the results, you will be required to produce one graph that will detail the results on the trace files from part 1 and your own Pin-generated trace file. You will be required to generate one trace files on your own based one one program you select. Your graphs will be bar charts comparing the total hit rate for multiple cache configurations. While there is no list of configurations you should use, you should design your set of configurations such that the results are enough to sufficiently prove or disprove the hypothesis. Having too few or choosing an unconvincing set of configurations will not garner full credit. On the other hand, reporting too many results may also be confusing and result in the loss of points. Do not forget to label your graphs if you want credit.

As well as reporting the results via graph, you should provide some analysis of the results. The analysis should intepret the results and offer some explanation of why the rule-of-thumb did or did not hold. You should limit the length of the analysis to roughly two paragraphs (no more than one page).

Due: Noon, October 30