Due: 2:00 pm, March 4
In this project you will implement a hardware prefetcher. You will be given a basic cache simulator with an interface to a prefetcher. Your task will be to implement the prefetcher interface with a prefetching algorithm of your own choice. The effectiveness of your prefetcher will be tested against a baseline prefetcher. You will also compete against your fellow classmates for amazing awards and prizes! Notice: The project should be done individually.
Your prefetcher will work in the context of a well-defined memory hierarchy. The memory system is already implemented (in C++), and can be downloaded here: proj1-source.tar.gz. The Data Cache has the following stats:
Your main task in this project will be to implement a prefetcher using the given prefetching interface. The system controller provides information about all loads and stores that are issued by the CPU. The information that is provided includes the effective memory address, PC of the memory instruction, and whether the instruction was a load or a store. You may use this information in any way that you see fit. During all cycles where the CPU is not issuing a request to the L2 cache, the system controller will query the prefetcher for any memory requests that it may have. While your prefetcher may have many requests queued internally, the system will service a maximum of 1 per cycle. After the prefetching request has been satisfied (either from the L2 or main memory), it will be placed in the Data Cache.
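Because the system services at most one prefetch request per cycle, a prefetcher that generates several candidates at once needs somewhere to hold them. The following is a minimal sketch of a bounded FIFO for pending addresses. The class name PendingQueue, the depth of 8, and the 32-bit address type are all assumptions made for illustration; none of them come from prefetcher.h.

```cpp
#include <cassert>
#include <cstddef>
#include <stdint.h>

// Hypothetical fixed-capacity FIFO of pending prefetch addresses.
// Names, depth, and address width are illustrative assumptions,
// not part of the provided prefetcher interface.
class PendingQueue {
public:
    static const std::size_t kCapacity = 8;  // assumed depth

    PendingQueue() : head_(0), count_(0) {}

    bool empty() const { return count_ == 0; }
    std::size_t size() const { return count_; }

    // Queue an address. When the queue is full, drop the oldest entry
    // so that recent predictions win over stale ones.
    void push(uint32_t addr) {
        if (count_ == kCapacity) {
            head_ = (head_ + 1) % kCapacity;
            --count_;
        }
        buf_[(head_ + count_) % kCapacity] = addr;
        ++count_;
    }

    // Hand out exactly one address, mirroring the one-per-cycle limit.
    uint32_t pop() {
        assert(!empty());
        uint32_t addr = buf_[head_];
        head_ = (head_ + 1) % kCapacity;
        --count_;
        return addr;
    }

private:
    uint32_t buf_[kCapacity];  // counts toward the prefetcher's state
    std::size_t head_;
    std::size_t count_;
};
```

Dropping the oldest entry on overflow is only one possible policy; rejecting new entries when full is equally simple and may suit some algorithms better.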
The file prefetcher.h pre-defines four functions that must be implemented:
While you are free to examine all parts of the provided memory system, the only modifications you should make are to the prefetching interface contained in the files prefetcher.h and prefetcher.C. These two files are the only source code you will submit.
To aid in your understanding of the prefetcher interface, we have provided a sample prefetcher implementation. This simple prefetcher waits for misses on the D-cache and then tries to prefetch the next block in memory. You can download the sample here: sample-pf.tar.gz.
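The next-block idea described above can be sketched as follows. This is not the code in sample-pf.tar.gz; it is an independent illustration. The 16-byte block size is a placeholder (the real value is set by the Data Cache parameters of the provided simulator), and the names NextBlockSketch, onAccess, hasRequest, and takeRequest are hypothetical.

```cpp
#include <cassert>
#include <stdint.h>

// Assumed block size in bytes; must be a power of two for the mask below.
// Placeholder only: use the Data Cache block size of the real simulator.
static const uint32_t kBlockBytes = 16;

// Address of the first byte of the block following the one that holds addr.
inline uint32_t nextBlockAddr(uint32_t addr) {
    return (addr & ~(kBlockBytes - 1)) + kBlockBytes;
}

// Hypothetical one-entry next-block prefetcher.
struct NextBlockSketch {
    bool ready;        // state: whether a request is pending
    uint32_t pending;  // state: the address to prefetch

    NextBlockSketch() : ready(false), pending(0) {}

    // Called for every CPU access; only a miss triggers a prefetch.
    void onAccess(uint32_t addr, bool hit) {
        if (!hit) {
            pending = nextBlockAddr(addr);
            ready = true;
        }
    }

    bool hasRequest() const { return ready; }

    // Returns the pending address and clears it.
    uint32_t takeRequest() {
        assert(ready);
        ready = false;
        return pending;
    }
};
```

With only one pending slot, a second miss before the first request is serviced overwrites it, which keeps the state tiny at the cost of lost prefetches.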
In addition to the constraint that only a single request can be serviced per cycle, you will have one further constraint: the amount of state saved in the prefetcher. The amount of state saved in the prefetcher may not exceed 4KB. Your source code must clearly indicate which variables are used as state. Furthermore, you will need to provide a detailed accounting in your project report of how much state is kept.
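One way to keep the state accounting honest is to gather every stateful variable into a single struct and let the compiler check its size. The sketch below assumes that 4KB means 4096 bytes and that addresses are 32 bits wide; the struct layouts and names (StrideEntry, PrefetcherState) are hypothetical. Note that sizeof measures C++ storage, including any padding, so it is a conservative stand-in for a bit-level hardware count rather than a replacement for the accounting in your report.

```cpp
#include <cassert>
#include <cstddef>
#include <stdint.h>

// Hypothetical table entry for a PC-indexed stride predictor.
struct StrideEntry {
    uint32_t pc;        // PC of the memory instruction (tag)
    uint32_t lastAddr;  // last effective address seen for this PC
    int32_t  stride;    // last observed address delta
};                      // 12 bytes

// All prefetcher state in one place so it can be measured as a unit.
struct PrefetcherState {
    StrideEntry table[256];  // 256 * 12 = 3072 bytes
    uint32_t    queue[64];   //  64 *  4 =  256 bytes
    uint32_t    head;        //             4 bytes
    uint32_t    count;       //             4 bytes
};                           // total:   3336 bytes

static const std::size_t kBudgetBytes = 4 * 1024;  // assumed meaning of 4KB

// Compile-time check that also works on pre-C++11 compilers:
// the array size becomes -1 (an error) if the budget is exceeded.
typedef char StateFitsBudget[(sizeof(PrefetcherState) <= kBudgetBytes) ? 1 : -1];

// Size in bytes, convenient for printing in the report's accounting.
inline std::size_t stateBytes() {
    assert(sizeof(PrefetcherState) <= kBudgetBytes);
    return sizeof(PrefetcherState);
}
```

On a C++11 or later compiler the typedef trick could be replaced with static_assert; the typedef form is used here because the target compiler named for this project predates C++11.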
The memory hierarchy will be simulated using trace files generated by the Pin binary instrumentation tool. Each line in the trace file refers to a memory access and includes the following four pieces of information:
The memory system provided will output several statistics about the performance of the system. They will help you understand how your prefetcher is performing and why. The stats include:
These stats will be placed in the file mem.trace.out, where mem.trace was the input file used.
Average Memory Access Time will be used to compare your prefetcher with the baseline and with your colleagues' prefetchers. For your report, you should test your prefetcher on the traces available here: proj1-traces.tar.gz. However, the TA will test your prefetcher with a different set of memory traces for the prefetching competition, so please design your prefetcher to work well in the general case. In addition, if the TA cannot reproduce the experimental results in your report, your project will be considered a failure.
You may wish to extend the simulator to collect more stats. This is not required, and your prefetcher should not depend on these modifications.
While you will not be required to generate trace files for this project, you may wish to generate them to more thoroughly test your prefetcher's performance. A Pintool, named memtracer, that will produce trace files of the format required for this project is available here: memtracer2.tar.gz (Thanks to Sat, the 240A TA of 2007 Fall). Pin is a dynamic binary instrumentation tool that is free to use. While you won't be expected to know much about Pin, you can download it as well as find the manual here. You should use the Rev. 23100 release (12/03/2008).
Using the Pin tool is simple. After downloading and untarring Pin, there will be a directory named "Bin" where the pin executable will reside. Although you can place the memtracer pintool in any directory, it is probably easiest if you copy it to that "Bin" directory. From within the "Bin" directory, you can then run the following command:
./pin -t memtracer [-skip s] [-length l] -- /path/to/program
The skip and length fields are optional and are used to skip the first s instructions and to run for only l instructions. For example, if you wanted to instrument the "ls" program (which resides in /bin/ls on most Linux systems) but wanted to skip the first 100 instructions and only instrument 500 instructions, you would run the command: ./pin -t memtracer -skip 100 -length 500 -- /bin/ls
Please note that you will likely want to use the skip option when generating traces. This will allow you to skip over the loading of shared libraries and other things that are not part of the functionality of the program you are running. Including this startup process in your trace file will skew the behavior of your cache.
After running the memtracer tool, you will be left with a file named "mem.trace" that you can then give to your cache simulator. While Pin can instrument any binary executable file (including Firefox... with some finessing), it adds a lot of overhead so be patient when attempting to instrument large programs. The memtracer tool limits the number of memory accesses that it logs to 2M so you don't need to worry about accidentally filling up your hard disk with the trace file on a large program.
While you are free to use any program that you wish, we suggest the following (which should be available on almost any Linux system you use): grep, djpeg, cjpeg, ps2pdf, gcc, gunzip, gzip, bzip2, bunzip2, tar, md5sum, perl, m4, cpp, sort, diff, ppmdither, java, javac, latex, python, uuencode, enscript.
You can find out more about using these programs by looking at their man pages (e.g. man djpeg).
You should feel free to discuss the project with others in the class, including sharing detailed performance results of your prefetcher. Sharing code is expressly forbidden.
Here are some papers to get you thinking about different approaches to prefetching. You are not required to choose an algorithm from these papers, but they can provide some useful starting thoughts. Their bibliographies will provide pointers to other papers on the topic.
Your grade for the project will be based on your write-up of the prefetcher you implemented. The performance of your prefetcher is less important than your discussion of how the prefetcher works and why it performs the way it does.
Your simulator should be written in either C or C++. It should compile and run on a Red Hat Enterprise Linux machine using gcc (if written in C) or g++ (for C++) version 3.4.6. CSE grad students have access to this type of environment by using one of the computers in the APE lab. More info on accessing the APE lab computers can be found here. For those who cannot access these computers (likely CSE undergrads and ECE students), you can request access by filling out the form located here.
For those of you unfamiliar with developing in the Linux environment, you can check out this basic tutorial on compiling using gcc (or g++). For those of you who are rusty with your C++, I would recommend this C++ reference site. Of course, the webboard is also a good place for more specific questions.
The authors of the top three prefetchers (as measured by total run time) will receive prizes. The prizes will be awarded in class on March 11th.
Your report for the project will consist of the following sections:
You only need to submit two source code files: prefetcher.h and prefetcher.C. Your prefetcher should compile with the unmodified memory system that has been provided. The given system is written in C++, so your code should compile with the g++ command on the APE lab computers. Each file should contain your name as well as your PID at the top.
Please submit the report and source code files as a tarball via e-mail to Hung-Wei. The e-mail title should be "[CSE240A] Project, your_name , code_name". Your tarball should have the following name: lastname_firstname_cse240a_wi10_project.tar.gz. Any submission that does not follow the above constraints will not be graded.