Multiple Cycle CPU

January 28, 2004

before we begin...

if you do not understand the single cycle cpu, it will be very difficult for you to understand the multiple cycle cpu. if you have questions about the single cycle cpu, now is the time to ask them. :)

let's go over a few of the examples that we didn't have time to do last week. we can go over the quiz question too, if you want.

multiple cycle cpu

as its name implies, the multiple cycle cpu requires multiple cycles to execute a single instruction. this means that our cpi will be greater than 1.

the big advantage of the multi-cycle design is that we can use more or fewer cycles to execute each instruction, depending on the complexity of the instruction. for example, we can take five cycles to execute a load instruction, but we can take just three cycles to execute a branch instruction. the big disadvantage of the multi-cycle design is increased complexity. control is now a finite state machine, whereas before it was just combinational logic.
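to make the cpi claim concrete, here is a minimal sketch that computes an average cpi. the cycle counts come from the outline later in these notes. the instruction mix is an assumption, made up purely for illustration, and the names CYCLES, MIX, and average_cpi are hypothetical.

```python
# cycles per instruction class, as described in the outline in these notes
CYCLES = {"load": 5, "store": 4, "arith": 4, "beq": 3}

# ASSUMPTION: a made-up instruction mix (fraction of executed instructions)
MIX = {"load": 0.25, "store": 0.10, "arith": 0.50, "beq": 0.15}


def average_cpi(cycles, mix):
    """weighted average of cycles per instruction over the given mix"""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("mix fractions must sum to 1")
    return sum(cycles[kind] * fraction for kind, fraction in mix.items())


if __name__ == "__main__":
    # roughly 4.1 for the mix above; a different mix gives a different cpi
    print(round(average_cpi(CYCLES, MIX), 2))
```

whatever the mix, the result lands between 3 and 5, because every instruction takes at least three cycles and at most five.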

another important difference between the single-cycle design and the multi-cycle design is the cycle time. in the single cycle processor, the cycle time was determined by the slowest instruction. in the multi-cycle design, the cycle time is determined by the slowest functional unit (memory, registers, alu). this greatly reduces our cycle time.
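a rough sketch of the cycle time comparison follows. the latencies are made-up numbers, and the list of units each instruction passes through in the single cycle design is an assumption based on a typical textbook datapath, not something taken from these notes.

```python
# ASSUMPTION: made-up functional unit latencies, in picoseconds
LATENCY_PS = {"memory": 200, "registers": 100, "alu": 150}

# ASSUMPTION: units each instruction class passes through, in order,
# within the one long cycle of the single cycle design
SINGLE_CYCLE_PATH = {
    "load": ["memory", "registers", "alu", "memory", "registers"],
    "store": ["memory", "registers", "alu", "memory"],
    "arith": ["memory", "registers", "alu", "registers"],
    "beq": ["memory", "registers", "alu"],
}


def single_cycle_time(latency, paths):
    """the cycle must be long enough for the slowest instruction"""
    return max(sum(latency[unit] for unit in path) for path in paths.values())


def multi_cycle_time(latency):
    """the cycle must be long enough for the slowest functional unit"""
    return max(latency.values())


if __name__ == "__main__":
    print(single_cycle_time(LATENCY_PS, SINGLE_CYCLE_PATH))  # 750
    print(multi_cycle_time(LATENCY_PS))                      # 200
```

a shorter cycle does not by itself mean a faster machine: time per instruction is cpi times cycle time, and the multi-cycle cpi is greater than 1.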

outline

this outline describes all the things that happen on various cycles in our multi-cycle cpu. all the events described in each numbered item take place in one clock cycle.

  1. instruction fetch: load ir with instruction at pc, load pc with pc + 4
  2. instruction decode, read registers: parse the instruction, load registers A and B with values from the register file, load aluout with the target address of the branch
  3. execute: if executing a load or store, perform the effective address computation and put the result in aluout. for arithmetic instructions, load aluout with the result of the appropriate computation. for a beq instruction, subtract the values in registers A and B, and if the result is zero, load pc with the value in aluout. if executing a beq, we are done - return to step 1
  4. memory: if executing a load, load mdr with the data at address aluout. if executing a store, write the data in register B into memory at address aluout. if executing an arithmetic instruction, write the value in aluout into the register file. if executing a store or an arithmetic instruction, we are done - return to step 1
  5. writeback: if we are here, we are executing a load instruction. write the value in mdr into the register file, and return to step 1
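the five steps above can be sketched as a tiny simulator. this is only an illustration under several assumptions: instructions are stored as already-decoded tuples (op, rs, rt, rd, imm) rather than as bits, the branch target is pc + 4 plus the immediate times 4 (mips style), only lw, sw, add, sub, and beq are handled, and register 0 is not hard-wired to zero. the function name run_one and the layout of state are hypothetical.

```python
def run_one(state):
    """execute one instruction on state; return the number of cycles used"""
    mem, regs = state["mem"], state["regs"]

    # cycle 1: instruction fetch
    ir = mem[state["pc"]]
    state["pc"] += 4

    # cycle 2: decode, read registers, precompute the branch target
    op, rs, rt, rd, imm = ir
    a, b = regs[rs], regs[rt]
    aluout = state["pc"] + 4 * imm  # ASSUMPTION: mips style word offset

    # cycle 3: execute
    if op == "beq":
        if a - b == 0:
            state["pc"] = aluout
        return 3
    if op in ("lw", "sw"):
        aluout = a + imm  # effective address
    elif op == "add":
        aluout = a + b
    elif op == "sub":
        aluout = a - b
    else:
        raise ValueError(f"unsupported op: {op}")

    # cycle 4: memory access, or register write for arithmetic
    if op == "sw":
        mem[aluout] = b
        return 4
    if op in ("add", "sub"):
        regs[rd] = aluout
        return 4
    mdr = mem[aluout]  # only a load gets this far

    # cycle 5: writeback for a load
    regs[rt] = mdr
    return 5


if __name__ == "__main__":
    state = {
        "pc": 0,
        "regs": [0] * 32,
        "mem": {  # one memory holds both instructions and data
            0: ("lw", 0, 1, 0, 100),   # $1 = mem[$0 + 100]
            4: ("add", 1, 1, 2, 0),    # $2 = $1 + $1
            8: ("sw", 0, 2, 0, 104),   # mem[$0 + 104] = $2
            12: ("beq", 1, 1, 0, -4),  # taken: target = 16 + 4 * (-4) = 0
            100: 21,
        },
    }
    print([run_one(state) for _ in range(4)])  # [5, 4, 4, 3]
```

the local variables a, b, aluout, and mdr play the role of the extra registers: each one carries a value from one cycle to a later cycle.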

datapath

here's our multi-cycle datapath:

[figure: multi-cycle cpu datapath]

the first thing you should notice when looking at the datapath is that it has fewer functional units than the single cycle cpu. we only have one memory unit, and only one alu. on the other hand, we have lots of registers that we didn't have before: ir ("instruction register"), mdr ("memory data register"), a, b, and aluout.

so, the obvious first question is: why can we get away with fewer functional units, and why do we need all these registers? we don't need as many functional units because we can re-use the same functional unit for a different purpose on a different clock cycle. for example, during the first cycle of execution, we use the alu to compute pc+4. on the second cycle, we use the alu to precompute the target address of a branch.

we need the extra registers because we will need data from earlier cycles in later cycles. for example, we read the register file in the second cycle of execution, but we will need the values that we read in the third cycle. the extra registers allow us to remember values across clock cycles.

if i point to any component on the multi-cycle datapath, you should be able to tell me what it is and why we need it.

control

the control for our multi-cycle datapath is now a finite state machine. the obvious first question is, again, why? we were doing just fine with combinational logic in the single cycle cpu, so why do we need this complicated fsm now?

the fsm is necessary because we need to set the control signals differently on different cycles of execution for the same instruction.
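here is a minimal sketch of that idea: the control outputs depend on the current state as well as on the instruction, so one instruction sees different signals on different cycles. the state names and signal names (pc_write, ir_write, and so on) are hypothetical, loosely modeled on common textbook names, and are not the actual signals of our datapath. only lw, sw, beq, and arithmetic ops are covered.

```python
def next_state(state, op):
    """pick the next fsm state from the current state and the instruction"""
    if state == "fetch":
        return "decode"
    if state == "decode":
        return "execute"
    if state == "execute":
        return "fetch" if op == "beq" else "memory"
    if state == "memory":
        return "writeback" if op == "lw" else "fetch"
    if state == "writeback":
        return "fetch"
    raise ValueError(f"unknown state: {state}")


def control(state, op):
    """write/read enables asserted in this state (ASSUMPTION: made-up names)"""
    signals = {
        "pc_write": False, "pc_write_cond": False, "ir_write": False,
        "mem_read": False, "mem_write": False, "reg_write": False,
    }
    if state == "fetch":
        signals.update(mem_read=True, ir_write=True, pc_write=True)
    elif state == "execute" and op == "beq":
        signals["pc_write_cond"] = True  # pc is written only if a - b is zero
    elif state == "memory":
        if op == "lw":
            signals["mem_read"] = True
        elif op == "sw":
            signals["mem_write"] = True
        else:
            signals["reg_write"] = True  # arithmetic result goes to the register file
    elif state == "writeback":
        signals["reg_write"] = True
    return signals


def trace(op):
    """list (state, asserted signals) for each cycle of one instruction"""
    state, steps = "fetch", []
    while True:
        asserted = sorted(name for name, on in control(state, op).items() if on)
        steps.append((state, asserted))
        state = next_state(state, op)
        if state == "fetch":
            return steps


if __name__ == "__main__":
    for step in trace("lw"):
        print(step)
```

a purely combinational controller would be a function of op alone, so it could not assert mem_read in the fetch cycle and reg_write in the writeback cycle of the same load.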

for any instruction, you should be able to tell me how many cycles it will take to execute that instruction, and what the values of the control signals are in each cycle of that instruction's execution.

exercises

i want to support the addi instruction. what are the values of the control signals on each cycle? how many cycles does it take to execute this instruction? what are the values in each register on each cycle?

i want a "conditional move" instruction:
cmov $1, $2, $3 means "copy the value in register 2 into register 1 if register 3 is nonzero". what new datapath elements, if any, are required? how do we set the control signals for a conditional move instruction on each cycle of execution? how many cycles does it take to execute this instruction?