CSE 30 -- Lecture 13 -- Nov 12

In this lecture we gave back the midterms, and we talked about pipelining.

Pipelining is discussed in chapter 6 of your book.


The basic idea behind pipelining is to overlap the execution of instructions. On the MIPS R2000, an instruction goes through five stages: IF, RD, EX, MEM, and WB. These correspond to:
  • Instruction Fetch: the instruction is fetched from memory (as one word).
  • Read registers: if you look back at the instruction formats, the register fields are always in the same place, so it is possible to start reading the register file without even knowing what the opcode is. At the same time we decode the instruction to decide what to do in the next stage.
  • Execute: the ALU performs the operation using the values read in RD; in case of a load/store, we actually compute the effective address (this requires an addition). The result is passed to the next stage.
  • Memory: we load data from or store data to memory. For a non-memory instruction, this stage does nothing.
  • Write Back: we write the result into the register file, if appropriate (for a store, this stage does nothing).

    Usually, once an instruction has finished using the IF stage, we can push the next one into that stage. And so forth: as soon as an instruction finishes a stage, the next one is pushed into it.

    This is obviously better than waiting for an instruction to complete the whole sequence before pushing the next one. In the case where everything goes well, and assuming all stages are of equal duration, the speedup is the depth of the pipeline: here five. More generally, the maximum throughput is dictated by the longest stage.
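This arithmetic can be sketched with a small calculation (the stage latencies below are made-up illustrative numbers, not real R2000 timings):

```python
# Illustrative sketch: pipelined vs. unpipelined execution time.
# The stage latencies are invented numbers, not real R2000 timings.

STAGE_NS = {"IF": 2, "RD": 1, "EX": 2, "MEM": 2, "WB": 1}

def unpipelined_ns(n_instr, stages=STAGE_NS):
    # Each instruction pays the full sum of all stage latencies.
    return n_instr * sum(stages.values())

def pipelined_ns(n_instr, stages=STAGE_NS):
    # The clock period is set by the slowest stage; after the first
    # instruction fills the pipeline, one instruction completes per cycle.
    period = max(stages.values())
    return (len(stages) + n_instr - 1) * period

print(unpipelined_ns(1000))  # 8000
print(pipelined_ns(1000))    # 2008 -- speedup ~4x, not 5x: stages are unequal
```

Note that the speedup here is about 4, not 5: the stages are not of equal duration, so the longest one (2 ns) sets the clock.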

    However, everything does not go well all the time.


    One major problem with pipelining is the creation of hazards. In the following code sequence:
    	lw $t0, array($t1)
            addi $t2, $t0, 1
    There might be a problem with the availability of the loaded value: the lw is in EX (computing the address) when the addi tries to read the value of $t0 from the register file in the RD stage. The addi might read the previous value of $t0, which is not what we intended! This is a data hazard: an instruction's execution depends on the result of a previous one, which has not yet completed.


    One way of solving the problem is to introduce a stall, or bubble, in the pipeline. Special hardware, called an interlock, detects the condition and prevents the second instruction from moving into the next stage; instead a dummy no-op instruction is inserted: the bubble. This hurts performance, since the bubble does no useful work.
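The interlock's bookkeeping can be sketched with a toy model. It assumes, as in the classic five-stage MIPS pipeline, that the register file is written in the first half of a cycle and read in the second half, so a consumer's RD may coincide with the producer's WB; this is an illustration, not the actual hardware logic:

```python
# Toy interlock model for the 5-stage pipeline: IF=0, RD=1, EX=2, MEM=3, WB=4.
# Without forwarding, a value reaches the register file only in WB. We assume
# the register file is written in the first half of a cycle and read in the
# second half, so a consumer's RD may coincide with the producer's WB.

def bubbles_no_forwarding(distance):
    """Bubbles the interlock inserts before a consumer that comes
    `distance` instructions after its producer (1 = immediately after)."""
    producer_wb = 4              # cycle the producer (fetched at cycle 0) is in WB
    consumer_rd = distance + 1   # cycle the consumer would reach RD unstalled
    return max(0, producer_wb - consumer_rd)

print(bubbles_no_forwarding(1))  # 2 -- e.g. the lw/add pair above
print(bubbles_no_forwarding(3))  # 0 -- far enough apart, no stall needed
```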


    A better way, when applicable, can be shown on the following code sequence:
    	sub $t1, $t0, $a0
            addi $t2, $t1, 1
    The result of the sub is not needed by the addi before it is actually available: the sub finishes the EX stage, and the result can be handed directly to the addi as it enters the EX stage. This is called forwarding.

    To implement this, we need some special hardware: it has to detect that $t1 is produced by the sub and is going to be used right away by the following instruction.

    Note that the "before" is very important: in the case of the lw/add pair, there is one stall we cannot avoid even with forwarding, because the loaded value simply does not exist until the end of the MEM stage. There are other kinds of hazards.
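Under the same toy model as before, forwarding changes where a value becomes available: an ALU result exists at the end of EX, a loaded value only at the end of MEM. This is a sketch of the timing, not of the real bypass network:

```python
# Toy forwarding model: stages IF=0, RD=1, EX=2, MEM=3, WB=4.
# With forwarding, a result is fed straight into a later instruction's
# EX stage as soon as it exists: end of EX for ALU ops, end of MEM for loads.

def bubbles_with_forwarding(distance, producer_is_load):
    """Bubbles before a consumer `distance` instructions after its producer."""
    avail = 3 if producer_is_load else 2   # stage whose end produces the value
    consumer_ex = distance + 2             # cycle the consumer reaches EX unstalled
    return max(0, (avail + 1) - consumer_ex)

print(bubbles_with_forwarding(1, False))  # 0 -- the sub/add pair: no stall
print(bubbles_with_forwarding(1, True))   # 1 -- the load-use hazard: one stall
print(bubbles_with_forwarding(2, True))   # 0 -- one instruction in between suffices
```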

    Control hazards

    In the following sequence:
    	addi $t1, $t0, -1
            beq $t1, $zero, loop
    	addi $t1, $t1, 1
    In the simple pipeline, we need to wait until the second stage (RD) to determine whether or not the branch is taken. Meanwhile, the addi after the branch has already been fetched. We might not want to execute it, since it looks like something we want to do only when exiting the loop. If we enforce this, and squash that instruction when the branch is taken, then we have at least one stall.

    On more complex pipelines, we might be unable to decide whether the branch is taken until an even later stage; the penalty would then be larger. Think of branches with more complex tests.
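These penalties can be counted with the same stage numbering as in the earlier sketches (again an illustration, not real hardware):

```python
# Toy branch-penalty model: if the branch outcome is known at the end of
# stage `resolve_stage` (IF=0, RD=1, EX=2, ...), every instruction fetched
# in the meantime must be squashed -- one bubble per cycle of delay.

def branch_bubbles(resolve_stage):
    """Bubbles per taken branch when it resolves at the end of `resolve_stage`."""
    return resolve_stage

print(branch_bubbles(1))  # 1 -- branch resolved in RD, as in the simple pipeline
print(branch_bubbles(2))  # 2 -- resolved only in EX: larger penalty
```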

    Delay slots

    This led the MIPS designers to a compromise: the instruction just after the branch, the one in the branch delay slot, is always executed, even when the branch is taken. This is usually hidden from the programmer: the compiler/assembler tries to reorder instructions to fill the slot with something useful.

    As processors become more and more complex, the delay slot technique becomes less important. Having the instruction set salvage the one instruction that follows the branch doesn't help much when you have six instructions in flight.


    bsy@cse.ucsd.edu, last updated Thu Nov 13 14:06:35 PST 1997.
