cse141L Lab 4: Single-Cycle MIPS Branches


Changelog

April 24

We found that comparisons with a constant will end up using the slti and/or sltiu instructions. Therefore, we are adding them to the list of instructions below. Since this mistake was detected early, we hope it's not too much trouble to require you to implement it, but if you feel otherwise, let us know and we'll try to work something out.

May 2

There was a bug with the ALU that's fixed now. Please download it again.


Due: May 06

Small Change to ALU

At the end of the lab, you will be asked to compile two different versions of a program: one without compiler optimizations and one with compiler optimizations. Unfortunately, we found out that the compiler uses shifts, which were not part of the original ALU design provided. We have thus augmented the ALU to include shift functionality, the source code for which can be found here. The new documentation for the ALU is shown below.

The alu module:
module alu(
	input [5:0] Func_in,
	input [31:0] A_in,
	input [31:0] B_in,
	output [31:0] O_out,
	output Branch_out,
	output Jump_out
);
Because our ALU has different inputs and outputs, here's a list of operations it can perform and what value of Func_i each corresponds to.
Func_in Operation O_out value Branch_out Jump_out
000X00 Shift Left Logical (B << A) 0 0
000X10 Shift Right Logical (B >> A) 0 0
000X11 Shift Right Arithmetic (B >>> A) 0 0
100000 ADD A+B 0 0
100001 ADD A+B 0 0
100010 SUB A-B 0 0
100011 SUB A-B 0 0
100100 AND A AND B 0 0
100101 OR A OR B 0 0
100110 XOR A XOR B 0 0
100111 NOR A NOR B 0 0
101000 Set-Less-Than Signed (signed(A) < signed(B)) 0 0
101001 Set-Less-Than Unsigned (A < B) 0 0
111000 Branch Less Than Zero A (A < 0) 0
111001 Branch Greater Than or Equal to Zero A (A >= 0) 0
111010 Jump A 0 1
111011 Jump A 0 1
111100 Branch Equal A (A == B) 0
111101 Branch Not Equal A (A != B) 0
111110 Branch Less Than or Equal to Zero A (A <= 0) 0
111111 Branch Greater Than Zero A (A > 0) 0

Small Change to Instruction ROM

The programs that you will generate for Lab 4 will be too large to fit in a 1K instruction ROM. Therefore, please modify your instruction ROM to have an address width of 10 bits. For example, in my processor.v module, I instantiated the instruction ROM like this:

// Instruction ROM Instantiation
inst_rom #(
        .ADDR_WIDTH(10),
        .INIT_PROGRAM("../memh/lab3.inst_rom.memh")
) inst_rom_inst (
        .clock(clock),
        .reset(reset),
        .addr_in(pc_next),
        .data_out(instruction)
);

If you get errors about a memh file being too large, this is probably your problem. Let us know if you have any problems with this.

Lab 3 Example

For those of you who have not quite gotten Lab 3 to work, we will post an example version by Wednesday after 3pm (after the late penalty deadline).

Overview

In Lab 4, you will finalize the design of your single-cycle MIPS processor. Specifically, you will be adding support for branch and jump instructions. In addition, you will add support for a sufficient subset of the MIPS ISA to enable you to run many common applications. At the end of this lab, you will be able to write your own applications, compile them using the MIPS cross-compiler toolchain, and simulate your design in Modelsim.

This lab (and the next two) will require much more work than the previous labs and will probably take longer than you expect, so start early.

General Notes

MIPS ISA Subset

In Lab 3, you added a control unit that supported a few basic instructions: LW, SW, ADD, ADDI, SUB, AND, OR, NOR, XOR. However, we will need to add a few more instructions to run a typical application.

This is the subset of the MIPS ISA that you will need to implement by the end of this lab. You will need to modify many parts of your datapath, as well as expanding your control unit. Depending on your implementation, you may need to modify your ALU. You will need to create new modules. We will not be providing a schematic for you for this lab. Before testing your project with the example applications below, you will be required to implement the remaining instructions. Use page B-49 of your textbook (either edition) to see how each instruction is encoded.

Subset of MIPS ISA
Branches & Jumps Loads & Stores Arithmetic & Logic Shifting
BEQ
LB
ADDU
SLL
BNE
LH
ADDIU
SRL
BLTZ
SB
SUBU
SRA
BGEZ
SH
ANDI
SLLV
BLEZ
LBU
ORI
SRLV
BGTZ
LHU
XORI
SRAV
JUMP
LUI
JR
SLT
JAL
SLTU
JALR
SLTI
SLTIU

Branches & Jumps

There are many types of the branch instruction, but their basic function is to change the program counter (PC) based on the evaluation of some condition. This instruction enables the if-else, while loop, and for loop contructs in typical programming languages. The jump instruction is similar in that it is capable of changing the PC, but does not require evaluating any condition. The jump-and-link instructions are used for function calls.

Your design will not include a branch delay slot.

As a reference, you should use Figure 4.24 on page 329 in the textbook (either edition). The top third of the figure includes most of the logic that you will need to implement. Be advised that the figure does not include the logic necessary to implement JR, JAL, and JALR.

Store Byte and Store Half

The SB instruction stores the low-byte of a register to memory. The SH instruction is similar, except it stores the low two bytes of a register to memory. We have already included support in the async_memory module for the SB and SH instruction. The data_memory module has a port called size_in. You will have to modify your control unit to set this signal appropriately. It is a 2-bit input, but only accepts three possible values:

Load Byte and Load Half

The LB instruction returns a sign-extended byte. The instructions lets you specify any byte within the four-byte word (using the last 2 bits of the address). Therefore, you must include logic for shifting the output of the data memory to be in the least-significant position and sign-extending it. In other words, if I request the byte at address 0x03 and the word at that location is 0x80000000, I should return 0xFFFFFF80. The LH word is similar except it returns a shifted, sign-extended two-byte half instead of a single byte.

The LBU instruction is similar to the LB instruction, except the byte is zero-extended. Therefore, in the previous example, if I request the byte at address 0x03 and the word at that location is 0x80000000, I should return 0x00000080. The LHU instruction acts similarly except it returns a half-word (two bytes) instead of a byte.

Your design will not include a load delay slot.

Shifts

By and large, the shifting functions are similar to the standard R-type. The only difference is what you multiplex into the "A" ALU operand. Depending on whether you're doing a variable or constant shift, you will need to multiplex in either: the value read from the register specified by bits 21-25 of your instruction, or the value pertaining to bits 6-10 of your instruction.

Question 1: Update your schematic from Lab 3 or draw a new schematic that shows the additional logic you will need to add to your processor to implement the remaining instructions in the MIPS ISA Subset. You must draw this schematic rather than use the tools to infer it. You will greatly appreciate having this schematic as you complete you design. Be sure to include the names of your wires and their widths.
Question 2: Create a table that outlines the values of each of your control signals with respect to the current instruction. In the first column, include a row for each output signal your control unit will have. Label the top of each other column by the instruction(s). Fill in the table with the values of the control output signals for each intruction or group of instructions. If you recognize that a group of instructions share the same sontrol signal output values, you may consolidate them into a single column to save space.

Testing The Whole Processor

In order to simulate the programs you will create for this lab, you should follow the same instructions you used for Lab 3. Namely, you should remember to set the INIT_PROGRAM parameter for the instruction ROM and data memory.

Cross-Compiler

A cross-compiler has been created for you and is located on ieng6.ucsd.edu. Many of you already have accounts to this machine, so you can use your regular username and password. If you do not have an account, please contact us on the staff mailing list.

After you have logged into the ieng6.ucsd.edu, run the following commands:

$ prep cs141s
$ mkdir cse141L
$ cd cse141L
$ wget http://cseweb.ucsd.edu/classes/sp13/cse141L-a/Media/lab4/lab4-files.zip
$ unzip lab4-files.zip
$ ls

This will download and extract the files you will need for this lab. Here is a listing of the important files. Take a look around at what is going on in each file.

You can compile the programs by running 'make' in that directory. If you look at the contents of the directory, you will see that each of the programs has produced the appropriate memh files needed for simulation. It will also produce a *.dis file, which is the assembly output of the compiler.

If you wish to create your own application, feel free to copy fib.c or helloworld.c as a starting point. Use the nbhelloworld.s as a starting point for creating a new program in assembly. You will have to edit the top few lines of the makefile to make sure your program gets compiled. The instructions are located at the top of Makefile, but they are repeated here for completeness.

#For each application you want to build, add a new line such as:
#myApp: source1.o source2.o source3.o
#ALSO, add myApp to the APPS= line
# Be sure to include lib141.o if you wish to use the SendByte, itoa, or other
# functions in lib141.c

hello_world: hello_world.o lib141.o
nbhelloworld: nbhelloworld.o
lab3-test: lab3-test.o
fib: fib.o lib141.o

APPS=lab3-test hello_world nbhelloworld fib

If I wanted to add a new program "sort" that is located in sort.c and uses the lib141 library, I would make the following changes to the top of the Makefile:

#For each application you want to build, add a new line such as:
#myApp: source1.o source2.o source3.o
#ALSO, add myApp to the APPS= line
# Be sure to include lib141.o if you wish to use the SendByte, itoa, or other
# functions in lib141.c

hello_world: hello_world.o lib141.o
nbhelloworld: nbhelloworld.o
lab3-test: lab3-test.o
fib: fib.o lib141.o
sort: sort.o lib141.o

APPS=lab3-test hello_world nbhelloworld fib sort

Unit Testing

Now that we have described the cross-compiler, you can use it to create your own test programs. In this part of the lab, you will be required to test each of your new instructions. You will want to write your test programs in assembly in this stage to guarantee that the compiler produces the correct commands.

You may write programs that test a single instruction or that test multiple instructions, but be sure to test every instruction (including the instructions from the previous lab). Don't forget to add your programs to the Makefile as described in the previous section. If you do not see the memh files that are prefixed with your program name after running 'make', it is probably because you forgot to add the correct targets to the Makefile.

Question 3: Simulate your control unit for each instruction. In the report, include the waveform that shows all of the control output signals. Below the waveform, please specify which instructions were tested and in what order. (e.g., "ADDI, BEQ, LUI, ...").

When you are satisfied that your control unit is behaving correctly, its time to move on to testing larger programs.

"Hello World" with Branches

In Lab 3, you ran a simplified version of "Hello World" that avoided the use of any branch or jump instructions. We were able to do this by not testing the serial interface to see if it was ready to transmit data. Now that we have branches, however, we can test the serial interface by reading the serial output status register.

Question 4: Run the "helloworld" program contained in the zip file we provided. If you do not see the memh files, you will need to run 'make' to generate them. Simulate your processor until it prints "Hello World" and put the output from the messages panel (usually at the bottom of the modelsim window) in your report. Also include a waveform that shows all of the internal wires in your processor module.

(The lack of) branch and load delays

The files in lab4-file.zip configure the compiler to not place useful instructions in the branch delay slot or load delay slots. Instead, it always put a nop in the slot. We have configured the compiler this way because the processor you will build during this lab does not have a branch delay or load delay slot. Instead, branches occur immediately, and the results of loads are available immediately.

Question 5: Write two assembly files branch-delay.s and load-delay.s to demonstrate that your design does not have a branch or load delay slot, respectively. Through comments in the code, describe the behavior you would expect to see with and without the slot. Include a waveform demonstrating that you see the correct (no slot) behavior for this lab in your design.

Benchmarking

In this section, you will experiment with benchmarking two different version of a program that prints out the first 10 Fibonacci numbers (0,1,1,2,3,5,8,13,21,34): one that computes the result iteratively and one that uses recursion. We have already provided an initial implementation of an iterative version of this program called fib.c in the zip file you extracted. Feel free to modify this to make it faster. Write a new version of the program that uses recursion to calculate the sequence. Compile and simulate both versions and record your results in the table for Question 6.

Now we will investigate compiler optimizations. Open the Makefile and find/edit the following line:

CCFLAGS=-fno-builtin -nodefaultlibs -nostdlib -c -mips1 -mno-abicalls -static -O4

This will turn on some compiler optimizations. Run a 'make clean' to remove previous versions of your programs and re-compile and simulate. Record your new results in the table for Question 6.

Question 6: Complete the following table.

To calculate the number of cycles that the program actually runs, use the times that are reported in the Modelsim console when data is output to the serial port. Take the time that Modelsim reports printing an "E" (for "End") and subtract it by the time Modelsim reports printing an "S" (for "Start"). Then, divide the difference by the testbench clock period, which is set in testbench.v.

The Cycles Per Instruction (CPI) of a program is computed by calculating the total number of instructions that processor executed divided by the number of cycles it took to execute those instructions.

Iterative Fib No Opts Recursive Fib No Opts Iterative Fib w/Opts Recursive Fib w/Opts
Number of Instructions
(excluding NOPs) in *.dis file
Number of Cycles
Cycles Per Instruction (CPI)

Wrap-up

Question 7: Once your processor is working correctly, run the whole synthesis and place and route flow and include the Fmax (and minimum period in nanoseconds), Total logic elements, and the Total registers statistics from your design.
Question 8: First, try to guess what the critical path in your design is. Say where your critical path starts, where it goes to, and where it ends. For example, you may say that the critical path is from the output of the PC register, through the adder, and back to the register. Now, try to use the tools to identify the critical path in your design. Instructions are on the Wiki. If your initial guess was wrong, try to explain why the path the tools report is correct. Think in terms of the amount of logic that is happening in a path, how many gates a wire might be connecting to, etc.

Deliverable

  • Submit your lab as two files (the project PDF and a zip file) at ted.ucsd.edu. The link to upload the file is available under "Assignments"->"Lab 4: Single-Cycle MIPS Branches". The deadline for submitting your report is Monday, May 6th by 3PM . Do not send softcopies to TAs email ids.
  • The zip file needs to:
    • Be named cse141L-lab4-LastName-FirstName.zip with your last name and first name substituted for LastName and FirstName, respectively.
    • Include a single-pdf report that answers all of the questions found in the lab description - including questions, verilog source code, graphs, screen-shots, etc. There are many tools out there capable of integrating text and graphics and producing PDF files (OpenOffice does a pretty good job).
    • Your project source code and quartus project files - we should be able to build your project without errors from these files. (Except for path problems to *.memh files).
    • Your source for the test programs you made, including the two versions of Fibonacci, as well as the modified Makefile.

Due: May 06