Homework Policy
- Homeworks are due at 12pm on the due date unless otherwise noted.
- Turn in your printed solutions to Sat's mailbox in the CSE grad student mailroom, unless otherwise noted.
- Late assignments will not be accepted.
- There is no regrading of written homeworks, except for addition errors. No single problem will have a significant impact on your grade.
- You may work in groups of one or two for written homeworks. Each group of two submits one writeup, and both people receive the same grade.
- Programming projects are to be done individually. You may discuss the projects with other people, but you may not look at other people's code.
- Homework assignments may be graded based on a statistically chosen subset of the problems in each assignment.
Integrity Policy
- Cheating will not be tolerated and will be dealt with accordingly.
- Please review the UCSD student handbook for more details on Academic Integrity.
- We photocopy a random sampling of the exams in order to ensure that students do not modify their tests after they have been returned.
Assignments
Assignment 1: Log into and use the class discussion board
This assignment should be completed on your own.
Part 1: Log into the discussion board (Due: October 2)
There is a link to the course discussion board on the course homepage.
Your login is your official UCSD user name (i.e., your @ucsd.edu email
address without the "@ucsd.edu" part). Password is your PID.
Take some time to explore the discussion board's features. If you
like, you can configure it to send everything to you by email, so you
don't have to visit the website regularly.
Need an account? If you are enrolled through concurrent enrollment
or, for some other reason, do not have one, take proof of enrollment
to the ACS help desk (AP&M 1313) to get an account. If you have
trouble, send me mail.
Deliverable: Post a reply under the "Welcome" message in the administrative forum.
You don't need to hand anything in.
Due: October 2
Part 2: Participate in discussion on the web-board during the course (Due: Throughout the quarter)
Participation in the discussion board is mandatory. It forms a
significant fraction of your class participation grade.
If you have questions about the material in the class, post them to
the general discussion forum. Sat and I will hold off on answering
the questions for a while to give other students an opportunity to
reply.
Posting Guidelines
- Be nice! There are no dumb questions, but there are plenty of rude,
disrespectful, and unconstructive answers. They will not be
tolerated.
- Use descriptive subject lines so everyone can tell what your message is about.
- Anything related to architecture is fair game in the discussion
forum, even if we have not discussed it in class.
- Direct discussion of homeworks or projects is not allowed!
Obviously, you may have questions related to topics covered in the
homework. You can ask these questions, but pose them more generally.
Likewise, do not post partial or full solutions to homework
problems.
- Questions and clarifications about the homework problems should
go to the Admin forum. Sat and I will answer these questions
post-haste.
Deliverable: You should be posting to the discussion board about architecture-related
topics at least once or twice a week throughout the quarter. The more
you post, the more useful the forum will be. You don't need to hand
anything else in.
Due: Throughout the quarter
Assignment 2: Caching
You are encouraged to work on this assignment in groups of two. If
you work in a group of two, submit only one report (but make sure both
names are on it). While you could have your partner do all the work, this
will only hurt you when the midterm and final come around, so don't do
it.
Book Problems (Due: October 16)
The following problems are required. They are taken from the 3rd edition of the book (since
all the 4th edition solutions are available on the textbook website).
- You are building a system around a processor with in-order execution that runs at 1.1GHz
and has a CPI of 0.7 excluding memory accesses. The only instructions that read or write
data from memory are loads (20% of all instructions) and stores (5% of all instructions).
The memory system for this computer is composed of a split L1 cache that imposes no penalty
on hits. Both the I-cache and D-cache are direct mapped and hold 32KB each. The I-cache has
a 2% miss rate and 32-byte blocks, and the D-cache is write-through with a 5% miss rate and
16-byte blocks. There is a write buffer on the D-cache that eliminates stalls for 95% of
all writes.
The 512KB write-back, unified L2 cache has 64-byte blocks and an access time of 15ns. It
is connected to the L1 cache by a 128-bit data bus that runs at 266 MHz and can transfer one
128-bit word per bus cycle. Of all memory references sent to the L2 cache in this system,
80% are satisfied without going to main memory. Also, 50% of all blocks replaced are dirty.
The 128-bit-wide main memory has an access latency of 60ns, after which any number of
bus words may be transferred at the rate of one per cycle on the 128-bit-wide 133 MHz main
memory bus.
- What is the overall CPI, including memory accesses?
- You are considering replacing the 1.1GHz CPU with one that runs at 2.1GHz, but is
otherwise identical. How much faster does the system run with a faster processor?
Assume the L1 cache still has no hit penalty, and that the speed of the L2 cache, main
memory, and buses remains the same in absolute terms (e.g. the L2 cache still has a 15
ns access time and a 266MHz bus connecting it to the CPU and L1 cache).
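As a reminder of the general method (using made-up numbers, not this problem's parameters), the overall CPI is the base CPI plus the average memory stall cycles per instruction. A minimal sketch:

```python
# Sketch: CPI including memory stalls, with hypothetical example
# numbers -- plug in the values from the problem statement instead.

def overall_cpi(base_cpi, accesses_per_instr, miss_rate, miss_penalty_cycles):
    """Base CPI plus average memory stall cycles per instruction."""
    stalls = accesses_per_instr * miss_rate * miss_penalty_cycles
    return base_cpi + stalls

# Hypothetical machine: base CPI 1.0, 1.3 memory accesses per
# instruction (1 fetch + 0.3 data), 4% miss rate, 20-cycle penalty.
cpi = overall_cpi(1.0, 1.3, 0.04, 20)
print(cpi)  # = 1.0 + 1.3 * 0.04 * 20, i.e. about 2.04
```

In the actual problem you would build separate stall terms for the I-cache, D-cache reads, D-cache writes (accounting for the write buffer), and the L2/memory hierarchy.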
- A cache may use a write buffer to reduce write latency and a victim cache to hold
recently evicted (nondirty) blocks. Would there be any advantage to combining the two into
a single piece of hardware? Would there be any disadvantages?
- Smith and Goodman found that for a given small size, a direct-mapped instruction cache
consistently outperformed a fully associative instruction cache using LRU replacement.
- Explain how this would be possible. (Hint: You can't explain this using
the three C's (compulsory, capacity, conflict) model of cache misses because it
"ignores" the cache policy.)
- Explain where replacement policy fits into the three C's model, and explain why
this means that misses caused by replacement policy are "ignored"--or, more precisely,
cannot in general be definitively classified--by the three C's model.
- Are there any replacement policies for the fully-associative cache that would
outperform the direct-mapped cache? Ignore the policy of "do what a direct mapped
cache would do."
- Way prediction allows an associative cache to provide the hit time of a direct-mapped
cache. The MIPS R10K processor uses way prediction to achieve a different goal: reduce the
cost of a chip package.
The R10K hardware includes an on-chip L1 cache, on-chip L2 tag comparison circuitry,
and an on-chip L2 way prediction table. L2 tag information is brought on chip to detect
an L2 hit or miss. The way prediction table contains 8K 1-bit entries, each corresponding
to two L2 cache blocks. L2 cache storage is built externally to the processor package, must
be two-way associative, and may have one of several block sizes.
- How can way prediction reduce the number of pins needed on the R10K package to read
L2 tags and data, and what is the impact on performance compared to a package with a
full complement of pins to interface to the L2 cache?
- What is the performance drawback of just using the same smaller number of pins but
not including way prediction?
- If a 512 KB L2 cache has 64-byte blocks, how many way prediction entries are needed?
How would the R10K support this need?
- For a 4 MB L2 cache with 128-byte blocks, how is the usefulness of the R10K way
prediction table analogous to that of a branch history table?
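To illustrate the arithmetic behind way-prediction table sizing (with made-up cache parameters, not the ones in the problem), note that each 1-bit entry covers a pair of L2 blocks, so the entry count follows directly from the block count:

```python
# Sketch: sizing a way-prediction table for a hypothetical two-way
# L2 cache (256 KB, 32-byte blocks -- NOT the problem's cache).

cache_size = 256 * 1024              # bytes
block_size = 32                      # bytes
blocks = cache_size // block_size    # total L2 blocks
entries = blocks // 2                # one 1-bit entry per pair of blocks
print(blocks, entries)               # 8192 blocks, 4096 entries
```

Apply the same two steps to the cache sizes given in the problem and compare the result against the R10K's fixed 8K-entry table.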
Deliverable: The solutions to the above problems, either typed up (highly recommended) or nicely
hand-written. Include your name (or names, if working in a group) on all pages.
Due: October 16
Assignment 3: Virtual Memory and Basic Pipelining
While this homework is not due until after the midterm, you should attempt to complete it before
then as some of these topics (namely, virtual memory) will be covered on that exam.
Problems 2 - 4 come from the 3rd edition of our class textbook.
Problems (Due: November 13)
- You are interested in calculating the access time of a 128KB, 4-way set associative
write-back cache with 32 byte blocks. As you learned in class, there are various ways to
index and tag your cache in relation to your virtual memory system. Consider that you
have the following information about your system:
- Accessing the TLB: 5ns
- Indexing the cache to access its data portion: 10ns
- Indexing the tag array of the data cache: 8ns
- Tag comparisons: 4ns
- Multiplexing the output data: 2ns
Assume that all other parts are insignificant to the access time of the cache.
For the following configurations, calculate the amount of time it takes to get data
from the cache on a load hit. Assume that your page size is 4KB. Be sure to explain your
answers; just writing down numbers won't get you credit even if they end up being
correct.
- A physically-indexed, physically-tagged cache
- A virtually-indexed, virtually-tagged cache
- A virtually-indexed, physically-tagged cache
- A physically-indexed, virtually-tagged cache
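As a starting point for reasoning about these configurations (using hypothetical latencies, not the ones given above), a fully serialized physically-indexed, physically-tagged lookup simply chains its steps; other configurations can overlap the TLB access with cache indexing, which is exactly what you should reason about. A sketch of the serial case:

```python
# Sketch: hit time for a fully serialized physically-indexed,
# physically-tagged lookup, with made-up component latencies.
# Other configurations may overlap steps (e.g. TLB access with
# indexing), changing which terms add and which take a max.

tlb_ns = 3        # hypothetical TLB access
index_ns = 6      # hypothetical data-array index
tag_ns = 5        # hypothetical tag-array index
compare_ns = 2    # hypothetical tag comparison
mux_ns = 1        # hypothetical output mux

# Translate first, then index data and tag arrays in parallel,
# then compare tags, then select the output word.
hit_time = tlb_ns + max(index_ns, tag_ns) + compare_ns + mux_ns
print(hit_time)   # 3 + 6 + 2 + 1 = 12 ns
```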
- Use the following code fragment:

      Loop:  LD    R1,0(R2)
             DADDI R1,R1,#1
             SD    0(R2),R1
             DADDI R2,R2,#4
             DSUB  R4,R3,R2
             BNEZ  R4,Loop
Assume that the initial value of R3 is R2 + 196.
Throughout this exercise use the classic RISC five-stage integer pipeline and assume all
memory accesses take 1 cycle.
-
Show the timing of this instruction sequence for the RISC pipeline without any
forwarding or bypassing hardware but assuming a register read and a write in the same
cycle "forward" through the register file, as in Figure A.6. Use a diagram like the one
shown in Figure A.33. Assume that the branch is handled by flushing the pipeline. If
all memory references take 1 cycle, how many cycles does this loop take to execute?
-
Show the timing of this instruction sequence for the RISC pipeline with normal forwarding
and bypassing hardware. Again, use a diagram like in Figure A.33. Assume that the branch
is handled by predicting it not taken. If all memory references take 1 cycle, how many
cycles does this loop take to terminate?
-
Assume the RISC pipeline with a single-cycle delayed branch and normal forwarding and
bypassing hardware. Schedule the instructions in the loop including the branch delay
slot. You may reorder instructions and modify the individual operands, but do not
undertake other loop transformations that change the number or opcode of the instructions
in the loop. Show a pipeline timing diagram (like Figure A.33) and compute the number
of cycles needed to execute the entire loop.
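One preliminary step common to all three parts is counting how many times the loop body executes. A quick simulation of the fragment (the initial R2 value is arbitrary, since only the offset matters):

```python
# Sketch: count loop iterations for the fragment above.
# R3 starts at R2 + 196 and R2 advances by 4 each iteration;
# BNEZ falls through once R4 = R3 - R2 reaches zero.

def count_iterations(offset=196, stride=4, r2=0):
    r3 = r2 + offset
    iters = 0
    while True:
        iters += 1         # one pass through the loop body
        r2 += stride       # DADDI R2,R2,#4
        if r3 - r2 == 0:   # DSUB / BNEZ: exit when R4 == 0
            break
    return iters

print(count_iterations())  # 49 iterations
```

With the iteration count in hand, the per-iteration cycle counts from your pipeline diagrams give the totals the parts ask for.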
-
For these problems, we will explore a pipeline for a register-memory architecture. This
architecture has two instruction formats: a register-register format and a register-memory
format. There is a single memory addressing mode (offset + base register).
There is a set of ALU operations with the format:
ALUop Rdest, Rsrc1, Rsrc2
or
ALUop Rdest, Rsrc1, MEM
where the ALUop is one of the following: Add, Subtract, And, Or, Load (Rsrc1
ignored), Store. Rsrc and Rdest are registers. MEM is a base register and offset pair.
Branches use a full compare of two registers and are PC-relative. Assume that this
machine is pipelined so that a new instruction is started every clock cycle. The following
pipeline (used in the VAX 8700) is:
IF-RF-ALU1-MEM-ALU2-WB
The first ALU stage is used for effective address calculation for memory references and
branches. The second ALU cycle is used for operations and branch comparison. RF is both a
decode and register-fetch cycle. Assume that when a register read and a register write
occur in the same clock, the write data is forwarded.
-
Find the number of adders needed, counting any adder or incrementer; show a combination
of instructions and pipe stages that justify this answer. You need only give one
combination that maximizes the adder count.
-
Find the number of register read and write ports and memory read and write ports required.
Show that your answer is correct by showing a combination of instructions and pipeline
stages indicating the instruction and the number of read ports and write ports required
for that instruction.
-
Determine any data forwarding for any ALUs that will be needed. Assume that
there are separate ALUs for ALU1 and ALU2 pipe stages. Put in all forwarding among ALUs
needed to avoid or reduce stalls. Show the relationship between the two instructions
involved in forwarding using the format of the table in Figure A.22 but ignoring the last
two columns. Be careful to consider forwarding across an intervening instruction,
for example:
ADD R1, ...
any instruction
ADD ..., R1, ...
-
We will now add support for register-memory ALU operations to the classic five-stage RISC
pipeline. To offset this increase in complexity, all memory addressing will be
restricted to register indirect (i.e. all addresses are simply a value held in a register;
no offset or displacement may be added to the register value). For example, the register-
memory instruction ADD R4,R5,(R1) means add the contents of register R5 to the contents of
the memory location with address equal to the value in register R1 and put the sum in
register R4. Register-register ALU operations are unchanged. Answer the following for the
integer RISC pipeline:
-
List a rearranged order of the five traditional stages of the RISC pipeline that will
support register-memory operations implemented exclusively by register indirect addressing.
-
Describe what new forwarding paths are needed for the rearranged pipeline by stating
the source, destination, and information transferred on each needed new path.
-
For the reordered stages of the RISC pipeline, what new data hazards are created by this
addressing mode? Give an instruction sequence illustrating each new hazard.
Deliverable: The solutions to the above problems, either typed up (highly recommended) or nicely
hand-written. Include your name (or names, if working in a group) on all pages.
Due: November 13
Assignment 4: End of the Term Pseudo-Assignment
This assignment is a pseudo-assignment: you should do it but you do not have
to turn it in. The answers will be available for you to look at but you should
attempt the problems first as this will be a better aid in studying for the exam.
Part 1 is up first and contains problems from the 4th edition of the book.
Part 2 will be available in a couple of days and will include questions from other
sources.
Part 1: Problems from the 4th edition (Due: December 14)
The following problems come from the 4th edition of the textbook (the official
edition for the class). Solutions are available on the textbook companion website.
- Chapter 2: Problems 2.1 - 2.8, 2.11, 2.12
- Chapter 3: Problems 3.1, 3.2
Deliverable: As this is only a pseudo-assignment, you do not have to turn anything in.
Due: December 14
Part 2: Misc. Problems (Due: December 14)
coming soon...
Deliverable: As this is only a pseudo-assignment, you do not have to turn anything in.
Due: December 14