part a: a program executes 20,000,000 instructions on a 100 MHz single cycle processor. what's the execution time?
for single cycle processors, CPI=1. so:
2e7 instructions * (1 cycle / 1 instruction) * (1 second / 1e8 cycles) = 0.2 seconds
part b: we run the same program on a 400 MHz multicycle processor. what average CPI do we need to get the same performance as in part a?
we want the execution time of the program on our new processor to be the same as it was on the old processor. another way to look at it is that we're looking for a speedup of 1. so the setup looks like this:
2e7 instructions * (X cycles / 1 instruction) * (1 second / 4e8 cycles) = 0.2 seconds
solving for X, we need a CPI of 4.
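here's a small python sketch of the same arithmetic for parts a and b. the function names are made up for illustration; the only formula used is time = instructions * CPI / clock rate.

```python
def execution_time(instructions, cpi, clock_hz):
    """Execution time in seconds: instructions * CPI / clock rate."""
    return instructions * cpi / clock_hz


def required_cpi(instructions, clock_hz, target_seconds):
    """Average CPI that makes the program finish in target_seconds."""
    return target_seconds * clock_hz / instructions


# part a: 20,000,000 instructions, CPI = 1, 100 MHz clock
time_a = execution_time(20_000_000, 1, 100e6)

# part b: same program, 400 MHz clock, same execution time as part a
cpi_b = required_cpi(20_000_000, 400e6, time_a)

print(time_a, cpi_b)  # roughly 0.2 seconds and a CPI of 4
```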
we have a multicycle cpu. our workload is as follows: 30% alu instrs, 30% mem instrs, 10% float instrs, 30% branch instrs. it takes us 1 cycle per alu instr, 10 cycles per mem instr, 20 cycles per float instr, and 5 cycles per branch instr. what's the average cpi?
this is just a weighted average of the individual CPI's:
.30 * 1 + .30 * 10 + .10 * 20 + .30 * 5 = 6.8
another way to think about this problem is like this: suppose we have a completely average program containing 100 instructions. how many cycles do we need to execute the program? well, we know there will be 30 alu instrs, 30 mem instrs, 10 float instrs, and 30 branch instrs, so:
30 alu instrs * 1 cycle / alu instr + 30 mem instrs * 10 cycles / mem instr + 10 float instrs * 20 cycles / float instr + 30 branch instrs * 5 cycles / branch instr = 680 cycles
now, what was our CPI for our completely average program?
680 cycles / 100 instructions = 6.8
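both ways of looking at it can be checked with a few lines of python. this is just a sketch; the dictionary layout and function names are my own choice, not anything from the problem.

```python
# instruction class -> (fraction of the workload, cycles per instruction)
MIX = {
    "alu": (0.30, 1),
    "mem": (0.30, 10),
    "float": (0.10, 20),
    "branch": (0.30, 5),
}


def average_cpi(mix):
    """Weighted average of the per-class CPIs."""
    return sum(fraction * cycles for fraction, cycles in mix.values())


def cycles_for_program(mix, instruction_count):
    """Total cycles for a 'completely average' program of the given size."""
    return sum(fraction * instruction_count * cycles
               for fraction, cycles in mix.values())


cpi = average_cpi(MIX)
total_cycles = cycles_for_program(MIX, 100)
print(cpi, total_cycles / 100)  # both should come out at about 6.8
```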
we divide all instructions into integer instrs and fp instrs. our program takes 100 seconds to execute on our old processor. we have a new processor that runs fp instrs 10x faster. our program executes in 70 seconds on the new processor. how much of our execution time on the old processor was spent running fp instrs?
for problems involving amdahl's law, i find it helpful to draw diagrams like these:
from the diagram, it's pretty clear that we can solve for x. we know that the amount of time we spend executing integer instructions will be the same on our new processor, because our improvement only affects floating point instructions. after we solve for x, the percentage of time we spent on the old processor doing fp instrs is just x/100.
to solve for x:
(100-x) + (x/10) = 70 ... 0.9x = 30 ... x = 33.3 [approximately]
to find the amount of time we spent on the old processor doing fp instrs:
33.3/100 = 33.3%, or about a third of the execution time
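the same equation can be solved in general. rearranging (old - x) + x/speedup = new gives x = (old - new) / (1 - 1/speedup). the python below is a sketch of that rearrangement; the function name is made up.

```python
def time_in_enhanced_part(old_total, new_total, speedup):
    """Solve (old_total - x) + x / speedup = new_total for x.

    x is the time the OLD machine spent in the part that got faster.
    """
    return (old_total - new_total) / (1 - 1 / speedup)


x = time_in_enhanced_part(old_total=100, new_total=70, speedup=10)
fraction_fp = x / 100

# sanity check: plugging x back in should reproduce the new execution time
new_time = (100 - x) + x / 10
print(x, fraction_fp, new_time)
```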
part a: given that 01000101000110001110000000000000 is a 32-bit ieee fp number, convert to decimal.
part b: give the 32-bit ieee fp representation of -43.265625
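one way to check a hand conversion for either part is python's struct module, which packs and unpacks ieee 754 single precision values. this is only a checking aid, not the by-hand method the problem is asking for, and the helper names are made up. by hand, part a splits into sign 0, exponent 10001010 (138, so 2^11 after removing the bias of 127), and fraction 0011000111 followed by zeros.

```python
import struct


def bits_to_float(bits):
    """Interpret a 32-character string of 0s and 1s as an IEEE 754 single."""
    raw = int(bits, 2).to_bytes(4, "big")
    return struct.unpack(">f", raw)[0]


def float_to_bits(value):
    """Return the 32-bit IEEE 754 single-precision pattern as a bit string."""
    (raw,) = struct.unpack(">I", struct.pack(">f", value))
    return format(raw, "032b")


def split_fields(bits):
    """Split a 32-bit pattern into (sign, exponent, fraction) bit strings."""
    return bits[0], bits[1:9], bits[9:]


part_a = bits_to_float("01000101000110001110000000000000")
part_b = float_to_bits(-43.265625)
print(part_a)                # 2446.0
print(split_fields(part_b))  # ('1', '10000100', '01011010001000000000000')
```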
i'm using the following single cycle datapath diagram from the class webpage:
we want to add an instruction loop r1, r2, offset. this new instruction has the same effect as the following two instructions:
addi r1, r1, 1
bne r1, r2, offset
to answer this question, (1) give the above sequence of instructions using the rs, rt, rd, and immediate fields from the immediate format, (2) draw the parts of the datapath that have been changed, and (3) give the state for the control for this instruction in the modified datapath.
for part (1), recall that i-type instructions use the rs, rt, and immediate fields of the instruction. so:
addi rs, rs, 1
bne rs, rt, immediate
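to make the intended behavior concrete, here's a sketch of what the new instruction does to the register file and the pc, written in python. it assumes the usual mips-style branch target of pc + 4 + 4*immediate, which the problem doesn't spell out, and it ignores 32-bit wraparound. the function name and the dictionary register file are just for illustration.

```python
def execute_loop(regs, pc, rs, rt, immediate):
    """Model 'loop rs, rt, immediate': increment rs, then branch if rs != rt.

    regs is a dict mapping register number -> value. Returns the next pc.
    """
    regs[rs] = regs[rs] + 1            # addi rs, rs, 1
    next_pc = pc + 4
    if regs[rs] != regs[rt]:           # bne rs, rt, immediate
        # assumption: word offset relative to pc + 4, as in standard mips
        next_pc += 4 * immediate
    return next_pc


# a loop instruction that branches back to itself until r1 reaches r2
regs = {1: 0, 2: 3}
pc = 100
iterations = 0
while pc == 100:
    pc = execute_loop(regs, pc, rs=1, rt=2, immediate=-1)
    iterations += 1
print(regs[1], pc, iterations)  # 3 104 3
```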
for part (2), we need to look at our single cycle datapath, and figure out what we need to add to support this new instruction.
the new instruction that we're adding is a modified branch instruction. so to begin, let's take a look at how the datapath is used for branch instructions.
when we execute a normal bne instruction (like bne $1, $2, 7), we use the main alu to figure out if $1 and $2 are equal or not. we do this by subtracting registers $1 and $2. in other words, we set alusrc=1, aluop=subtract, and we check the "zero" output on the alu. the "zero" output tells us if the alu result is zero or not: zero=1 means the output was zero, zero=0 means the output is not zero. if zero=0, then we know that registers $1 and $2 are not equal, and therefore we know that the branch is taken.
remember that there is some additional logic required for branch instructions that is not shown on the datapath [it's discussed in the book]. the problem is that the control logic doesn't know the value of pcsrc, because it depends on whether the branch was taken or not. if the branch is taken, we want pcsrc=0, otherwise we want pcsrc=1.
the figure below shows the two additional gates needed to support bne instructions. try pushing a few values of "bne", "zero", and "pcsrc" through these two gates. if bne=0, the value of pcsrc determines which way the mux will go. if bne=1 and pcsrc=1, then the value of zero determines which way the mux will go.
so to summarize, when we execute a normal bne instruction, we set bne=1, pcsrc=1, alusrc=1, aluop=subtract, regwrite=0, memread=0, memwrite=0.
now we need to figure out how our new instruction is different from normal bne instructions. we need to compute rs+1 before comparing with rt, and we need to store rs+1 back into the register file.
to compute rs+1, we're going to need another adder. we can't use the main alu because we need that to do the comparison. it needs to take the value of rs, add 1, and its output needs to go into the main alu to be compared with rt. we need to mux the output of this adder and the original value of rs, because we only want rs+1 going into the alu for our new instruction. we'll call the control signal on this new mux "addone".
we also need to write the value of rs+1 back into register rs. to do this, we need to be able to write to register rs [our datapath can currently only write to registers rt or rd]. so we need to make the "regdst" mux bigger... rs must be one of our options. we also need to get the value of rs+1 into the "write data" port on the register file. to do this, we can make the "memtoreg" mux bigger, and make rs+1 one of our options.
these changes are shown below:
how do we set the control signals for this instruction? we need to choose pc+4 or the branch target [the old pcsrc signal] based on the value of the "zero" output of the alu, so we set bne=1 and pcsrc=1. we need rs+1 [not rs] going into the alu, so we set addone=1. we need to compare it with rt, so we set alusrc=1. the comparison is a subtraction, so we set aluop=subtract. we need to write rs+1 into rs, so we set memtoreg=2, regdst=2, and regwrite=1. we don't touch memory, so memread=0 and memwrite=0.
i'm using the following multi-cycle datapath from the class webpage:
we want to add a MemIndAdd r1,offset(r2) instruction which does the following:
tmp=memory[offset+r2]
tmp=memory[tmp]
r1=r1+tmp
we need to (1) show the code sequence using immediate field, rs, rt, rd, and the multi-cycle hardware registers, (2) modify the datapath to execute the new instruction, and (3) show the fsm for the control for this instruction.
the first step is to figure out how many cycles we need to execute this instruction, and what needs to be done on each cycle.
it will take us one cycle to compute the effective address [offset+r2], one cycle to do the first memory read [memory[offset+r2]], one cycle to do the second memory read [memory[memory[offset+r2]]], one cycle to add r1 to that value [r1+memory[memory[offset+r2]]], and one cycle to store this mess into r1.
so we're looking at 5 cycles of execution. including fetch and decode, it will take us a total of 7 cycles.
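here's a python sketch that walks through those steps one cycle at a time and counts them. it only models what the instruction does, not the actual datapath or its control signals; the function name, the dictionary memory, and the step labels are all made up for illustration.

```python
def mem_ind_add(regs, memory, dest, base, offset):
    """Model MemIndAdd dest, offset(base), one step per cycle.

    regs maps register number -> value, memory maps address -> value.
    Returns the list of cycle labels so the cycle count can be checked.
    """
    cycles = ["fetch", "decode"]

    address = regs[base] + offset       # effective address
    cycles.append("compute address")

    tmp = memory[address]               # first memory read
    cycles.append("first memory read")

    tmp = memory[tmp]                   # second memory read
    cycles.append("second memory read")

    result = regs[dest] + tmp           # add to the destination register value
    cycles.append("add")

    regs[dest] = result                 # write back
    cycles.append("write back")

    return cycles


regs = {1: 5, 2: 100}
memory = {108: 200, 200: 37}
cycles = mem_ind_add(regs, memory, dest=1, base=2, offset=8)
print(regs[1], len(cycles))  # 42 7
```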
let's look at each cycle in a little more detail. let's describe how data needs to move across our datapath in each cycle, using rs, rt, rd, immediate, etc.
now let's look at each cycle in even more detail, using the registers on our datapath to store temporary values. ir = "instruction register", mdr = "memory data register".
the above is what i'd write down for part (1) of this question.
for part (2), we need to figure out what changes we need to make to the datapath. let's go through each cycle, and figure out if the datapath can handle the operations we want to perform.
okay, we need to extend some muxes. i'm going to add a "pcwrite" signal to the pc also [because we don't want to be writing the alu's output into the pc on every cycle]. when we're done, the datapath will look like this:
part (3) wants us to show the fsm for the control of this instruction. if you've come this far, this is the easy part :)
we need to figure out how the control signals need to be set on each cycle of execution of our new instruction, to achieve the effects we described in part (1) of this problem. here we go: