CSE141L Lab 5: 5-Stage MIPS Processor


Due July 28 (11:00pm)

Overview

In Lab 5, you will speed up your single-cycle MIPS processor by pipelining it. Specifically, you will add 5 pipeline stages to your design. At the end of this lab, you will be able to see your processor run faster and measure the speedup using benchmark applications, which you compile with the MIPS cross-compiler toolchain and run on your design in ModelSim.

This lab (and the next one) will require much more work than the previous labs and will probably take longer than you expect, so start early. We advise you to add pipeline stages one at a time and get each one working before adding the next. Your grade will depend on whether the processor functions correctly. For example, a working 4-stage MIPS processor will receive a better grade than a 5-stage processor that fails to run the benchmarks. The key is

Adding stages one at a time is optional. You can also code all 5 stages at the same time.

General Notes

Test Scripts (Optional use)

The use of these scripts is optional. For this lab and the next, we will run each of your designs using batch script files and check whether they work.

5-Stage MIPS Pipeline

To add pipeline stages, you need to modify the datapath and add support for stall logic. You don't need to make any changes to the control logic. We will not provide a schematic for this lab. You do not have to implement load delay slots in your processor, and you don't have to implement forwarding logic.

Before you start coding your pipeline stages

  1. Identify the datapath inputs and outputs of each stage of your pipeline. For example, the ID/EX stage needs to register Read_Data1, Read_Data2, the Sign Extended Immediate, PC_next and the Instruction. You will need to know the instruction to implement the stall logic.

  2. Classify the control signals according to the stages they connect. Remember that the book does not include all the control signals.

  3. Write down the list of possible hazards with the instructions implemented so far. For example, there are no structural hazards in your design.
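As a rough illustration of why step 1 carries the Instruction through each stage, the sketch below pulls out the register fields that the stall logic compares. It is written in Python only for readability and is not the HDL you will submit; the function name `fields` is hypothetical. The bit positions are the standard MIPS32 instruction layout.

```python
def fields(word: int) -> dict:
    """Split a 32-bit MIPS instruction word into the fields the stall logic needs."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31..26
        "rs": (word >> 21) & 0x1F,      # bits 25..21, first source register
        "rt": (word >> 16) & 0x1F,      # bits 20..16, second source (or I-type destination)
        "rd": (word >> 11) & 0x1F,      # bits 15..11, R-type destination register
    }

# 0x016C5020 is the R-type encoding of: add $10, $11, $12
f = fields(0x016C5020)
print(f["opcode"], f["rs"], f["rt"], f["rd"])  # 0 11 12 10
```

Note that for I-type instructions the destination register is in the rt field rather than rd, which matters when you classify hazards by instruction type.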

Stall Logic

The stall logic, or hazard detection logic, is a combinational module. It takes as input the opcode and the read/write register addresses of the instruction being executed in each pipeline stage. When a hazard is detected, the stall signal must be asserted, which pushes bubbles through the pipeline. A bubble is essentially the execution of a NOP instruction. We will run through an example of a RAW data hazard.

Consider the following two instructions

INST-1: add $(10), $(11), $(12)

INST-2: add $(13), $(10), $(10)

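INST-2 needs to read from register $(10) the value written by INST-1. To detect this hazard, the stall logic compares the source registers of INST-2 (rs and rt) with the destination register of INST-1 (rd): the hazard exists when I2.rs == I1.rd or I2.rt == I1.rd.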
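The check for the two instructions above can be sketched as follows. This is a Python model for illustration only, not the combinational HDL module itself; the `Instr` record and the `raw_hazard` function are hypothetical names. The comparison is the R-type condition from the Task 2 table (I2.rs == I1.rd or I2.rt == I1.rd), with the added assumption that writes to register 0 never cause a hazard because that register is hard-wired to zero in MIPS.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Instr:
    """Hypothetical decoded-instruction record; field names are illustrative."""
    name: str
    rd: Optional[int]  # destination register number, None if nothing is written
    rs: Optional[int]  # first source register, None if unused
    rt: Optional[int]  # second source register, None if unused

def raw_hazard(older: Instr, younger: Instr) -> bool:
    """True if `younger` reads a register that `older` has yet to write."""
    if older.rd is None or older.rd == 0:  # register 0 always reads as zero
        return False
    return younger.rs == older.rd or younger.rt == older.rd

inst1 = Instr("add", rd=10, rs=11, rt=12)  # INST-1
inst2 = Instr("add", rd=13, rs=10, rt=10)  # INST-2
print(raw_hazard(inst1, inst2))  # True
print(raw_hazard(inst2, inst1))  # False
```

In a real pipeline the same comparison has to be made against every older instruction that has not yet written its result back, not only the one immediately ahead.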

Let's look at another example, this time a control hazard.

INST-1: BNE $(10), $(11), offset

INST-2: NOP

INST-3: ADD $(10), $(11), $(12)

The outcome of the branch instruction is only known at the output of the EX stage. If the instructions continue to execute through the pipeline, INST-2 and INST-3 will be executed even if the branch is taken. The stall logic needs to identify the control dependency in the instructions and assert the stall signal until the outcome of the branch is known.
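One possible way to express this condition is sketched below in Python, for illustration only. It assumes that a branch is first recognised when it reaches ID and that its outcome is available at the output of EX, so the stall is held while a branch occupies either stage. The set `BRANCH_OPS` and the function `control_stall` are hypothetical names, and your design may detect the branch in a different stage.

```python
# Illustrative set; replace with the branch opcodes your design implements.
BRANCH_OPS = {"beq", "bne"}

def control_stall(op_in_id: str, op_in_ex: str) -> bool:
    """True while a branch is in flight and its outcome is still unknown.

    Assumes the branch is decoded in ID and resolved at the output of EX.
    """
    return op_in_id.lower() in BRANCH_OPS or op_in_ex.lower() in BRANCH_OPS

print(control_stall("BNE", "NOP"))  # True: branch just decoded
print(control_stall("NOP", "BNE"))  # True: branch is being resolved in EX
print(control_stall("ADD", "NOP"))  # False: no branch in flight
```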

Task 1: Update your schematic from Lab 4 or draw a new schematic that shows the additional logic you will need to add to your processor to implement the pipeline registers. Label the input and output signals of each pipeline stage. You must draw this schematic rather than use the tools to infer it. You will greatly appreciate having this schematic as you complete your design. Be sure to include the names of your wires and their widths.

Task 2: Create a table that outlines when the stall signal needs to be asserted. In the first column, include a row for each type of instruction. Label the top of each remaining column with a type of instruction. Fill in each cell with the condition under which the stall signal is asserted for that pair of instructions.

Instruction Type I1/I2             | R-Type (excluding branch and jump) | Branch Instructions | Jump Instructions | I-Type Instructions
R-Type (excluding branch and jump) | I2.rs == I1.rd or I2.rt == I1.rd   |                     |                   |
Branch Instruction                 |                                    |                     |                   |
Jump Instruction                   |                                    |                     |                   |
I-Type Instruction                 |                                    |                     |                   |

Testing The Whole Processor

(Optional) In order to simulate the programs you will create for this lab, you should follow the instructions from above. In particular, remember to set the INIT_PROGRAM parameter for the instruction ROM and data memory.

Unit Testing

In this lab, we provided four benchmark programs in lab5-files.zip: No branch hello world, Hello world, and fibonacci number.

The instructions for using these programs are the same as in Lab 4.

Task 3: Simulate your processor for instructions that produce data and control hazards.

When you are satisfied that your hazard unit is behaving correctly, it's time to move on to testing larger programs.

Task 4: Run "No branch hello world", "Hello world", and "fibonacci number" contained in the zip file we provided to make sure that your pipelined processor still works with them. Using the number of clock cycles from simulation, measure the CPI of your pipelined design. Is it different from the CPI of the single-cycle processor?
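The CPI measurement is a single division of cycles by instructions, sketched here in Python. The function name `cpi` is hypothetical and the numbers are made up purely to show the arithmetic; substitute the cycle count from your simulation and the instruction count (excluding NOPs) from the *.dis file.

```python
def cpi(cycles: int, instructions: int) -> float:
    """Cycles per instruction: total clock cycles divided by instructions executed."""
    if instructions <= 0:
        raise ValueError("instruction count must be positive")
    return cycles / instructions

# Made-up example: 1500 cycles to execute 1000 instructions.
print(cpi(1500, 1000))  # 1.5
```

A single-cycle processor has a CPI of exactly 1 by construction, so any stall cycles in the pipelined design show up as a CPI above 1.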

Benchmarking

Wrap-up

Task 5: Once your processor is working correctly, run the whole synthesis and place and route flow and include the Fmax (and minimum period in nanoseconds), Total logic elements, and the Total registers statistics from your design.
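Fmax and the minimum period are two views of the same number, and together with the cycle count they give the execution time needed to compare the two processors. The Python sketch below shows the conversions; the function names and all numeric values are hypothetical and only illustrate the arithmetic.

```python
def min_period_ns(fmax_mhz: float) -> float:
    """Minimum clock period in nanoseconds for a given Fmax in MHz."""
    return 1000.0 / fmax_mhz

def exec_time_ns(cycles: int, fmax_mhz: float) -> float:
    """Execution time when running `cycles` clock cycles at Fmax."""
    return cycles * min_period_ns(fmax_mhz)

def speedup(old_time_ns: float, new_time_ns: float) -> float:
    """How many times faster the new design is than the old one."""
    return old_time_ns / new_time_ns

# Made-up numbers: single-cycle at 25 MHz vs. pipelined at 50 MHz.
single = exec_time_ns(1000, 25.0)     # 1000 cycles * 40 ns
pipelined = exec_time_ns(1500, 50.0)  # 1500 cycles * 20 ns
print(min_period_ns(50.0))            # 20.0
print(round(speedup(single, pipelined), 2))
```

The pipelined design can take more cycles than the single-cycle one and still finish sooner, because each cycle is shorter.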

Task 6: First, try to guess what the critical path in your design is. Say where your critical path starts, where it goes to, and where it ends. For example, you may say that the critical path is from the output of the PC register, through the adder, and back to the register. Now, try to use the tools to identify the critical path in your design. Instructions are on the Wiki. If your initial guess was wrong, try to explain why the path the tools report is correct. Think in terms of the amount of logic that is happening in a path, how many gates a wire might be connecting to, etc.

Interview Questions

  • Show us the schematic of your design

  • Show us that your processor can execute the four benchmarks provided in this lab in ModelSim

  • Complete the following table.

    The Cycles Per Instruction (CPI) of a program is computed by dividing the number of cycles the processor took to execute the program by the total number of instructions it executed.

                                                          | No Branch Hello, world | Hello, world | Fib
    Number of Instructions (excluding NOPs) in *.dis file |                        |              |
    Number of Cycles                                      |                        |              |
    Cycles Per Instruction (CPI)                          |                        |              |

  • Tell us the critical path and the Fmax of your design

Due: July 28 (11:00pm)