cse141L Lab 6: Materialize Your Processor - Datapath


Changelog

February 29 Lab due date extended to Wednesday, March 5th. Turn in electronic materials by midnight. Paper material (highlighted datapaths, etc) in 141 on Thursday or to Raid's mailbox.
February 27 Turn in this lab electronically.

Due: 11:59pm, March 5

Overview

At this point in the project, you will take full control of your processor design. Your goal is to end up with a working processor that implements your ISA, but the details of that design are completely up to you. This and the following labs will provide you with guidance on how to proceed, but many design decisions will not be spelled out.

Now that you have a validated ISA, it is time to implement a full processor based on it. In lab 6, you will design and implement the datapath of the whole processor. Specifically, you need to 1) revise your fetch unit, 2) identify and implement the major components of the backend datapath, and 3) design and implement a complete datapath for your processor. You will draw a datapath schematic of the whole processor and implement it in structural Verilog. Once the datapath is completed, you will finish the processor by adding control logic in the next lab, and then evaluate its performance against various benchmarks.

Datapath - The Big Picture

You were given a datapath design for the frontend (fetch unit) in lab 2. In lab 6, you will design the datapath of your processor by yourself. Your processor should support at least four pipeline stages - the frontend is three stages, and the backend is another. Your version of pipelining is a little more sophisticated than the 5-stage MIPS pipeline, but also less complex, because it has a queue between the frontend and the backend to decouple them, rather than just a register. This decoupled design is more common in modern processor designs.
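To make the difference between a single pipeline register and a decoupling queue concrete, here is a minimal sketch of a small synchronous FIFO. It is not the queue from the provided fetch unit: the module name fifo_sketch, the ports enque, valid, and full, and the depth are all assumptions made for illustration. Only the name deque is borrowed from the backend interface.

```verilog
// Hypothetical illustration only - not the lab's fetch-unit queue.
// A small synchronous FIFO that decouples a producer (frontend)
// from a consumer (backend).
module fifo_sketch #(parameter WIDTH = 17, DEPTH_LOG2 = 2)
(
    input  clk,
    input  reset,                    // synchronous, active high
    input  enque,                    // producer pushes data_in
    input  [WIDTH-1 : 0] data_in,
    input  deque,                    // consumer pops the head entry
    output [WIDTH-1 : 0] data_out,   // head entry, meaningful when valid
    output valid,                    // at least one entry is stored
    output full                      // no room for another entry
);
    reg [WIDTH-1 : 0] mem [0 : (1 << DEPTH_LOG2) - 1];

    // One extra pointer bit distinguishes "full" from "empty".
    reg [DEPTH_LOG2 : 0] rd_ptr;
    reg [DEPTH_LOG2 : 0] wr_ptr;

    assign valid    = (rd_ptr != wr_ptr);
    assign full     = (rd_ptr[DEPTH_LOG2-1 : 0] == wr_ptr[DEPTH_LOG2-1 : 0]) &&
                      (rd_ptr[DEPTH_LOG2] != wr_ptr[DEPTH_LOG2]);
    assign data_out = mem[rd_ptr[DEPTH_LOG2-1 : 0]];

    always @(posedge clk) begin
        if (reset) begin
            rd_ptr <= 0;
            wr_ptr <= 0;
        end else begin
            // Push and pop are independent, so both may happen in one cycle.
            if (enque && !full) begin
                mem[wr_ptr[DEPTH_LOG2-1 : 0]] <= data_in;
                wr_ptr <= wr_ptr + 1'b1;
            end
            if (deque && valid)
                rd_ptr <= rd_ptr + 1'b1;
        end
    end
endmodule
```

With a queue like this, the frontend can keep fetching while the backend is stalled, until the queue fills up. A single register would force both sides to stall together.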

The Processor Core Interface

The processor core is the top module of your processor design. It has the following interface:

// D_WIDTH  : data width
// PA_WIDTH : port address width

module core#(parameter D_WIDTH = 34, PA_WIDTH = 4)
(
    input  clk,
    input  reset,
    
    // I/O interface
    output in_req,
    output out_req,
    output [PA_WIDTH-1 : 0] in_addr,
    output [PA_WIDTH-1 : 0] out_addr,
    input  [D_WIDTH-1 : 0]  in_data,
    output [D_WIDTH-1 : 0]  out_data,
    input  in_ack,
    input  out_ack
); 

The core interface is simple - it consists of clk, a synchronous reset, and the I/O interface. When the 'reset' signal transitions from high to low, the processor core starts executing from address 0x0 of the instruction memory. The I/O interface serves the in and out instructions, which communicate with the outside world. Since available pins are usually very limited, the I/O interface multiplexes 16 channels onto one physical channel to reduce the pin count. All I/O interface signals are edge-triggered.

The I/O interface is similar to a conventional bus interface. An 'in' instruction asserts 'in_req' and 'in_addr', then waits for 'in_ack' to become 1. When 'in_ack' arrives, the backend stores 'in_data' into the register that the 'in' instruction specifies. In the best case, 'in_ack' arrives in the next cycle, but it can be arbitrarily delayed if no data is available in the specified channel. In the waveform example, the core asserts 'in_req' with 'in_addr' set to 0xF. The core spends two idle cycles, then stores 0x123 to the register file when the 'in_ack' signal arrives.

An 'out' instruction works in a similar way, but the backend also asserts 'out_data' along with 'out_req' and 'out_addr'. After asserting a request, the backend waits for 'out_ack'. Like 'in_ack', 'out_ack' can arrive after an arbitrarily long delay, and it signals the completion of the request. In the waveform example, the core asserts 'out_req' with 'out_addr' set to 0xF and 'out_data' set to 0x123, then waits for 'out_ack'. When 'out_ack' is asserted after one cycle, the core completes the out instruction.
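The request/acknowledge handshake described above can be sketched as a small piece of RTL for the 'in' side. This is only one possible reading of the protocol, not the required design: the module name io_in_sketch and the signals start, channel, captured, and done are hypothetical, and the choice to drop 'in_req' in the cycle after 'in_ack' is seen is an assumption. An 'out' version would additionally drive 'out_data' while the request is pending.

```verilog
// Hypothetical sketch of the 'in' handshake - names other than
// in_req, in_addr, in_data, and in_ack are assumptions.
module io_in_sketch #(parameter D_WIDTH = 34, PA_WIDTH = 4)
(
    input  clk,
    input  reset,                          // synchronous, active high
    input  start,                          // assumed: high when an 'in' instruction issues
    input  [PA_WIDTH-1 : 0] channel,       // assumed: channel number from the instruction
    output reg in_req,
    output reg [PA_WIDTH-1 : 0] in_addr,
    input  [D_WIDTH-1 : 0] in_data,
    input  in_ack,
    output reg [D_WIDTH-1 : 0] captured,   // value destined for the register file
    output reg done                        // one-cycle pulse when the transfer completes
);
    always @(posedge clk) begin
        done <= 1'b0;                      // default; overridden below on completion
        if (reset) begin
            in_req  <= 1'b0;
            in_addr <= {PA_WIDTH{1'b0}};
        end else if (in_req && in_ack) begin
            // The ack has arrived: latch the data and release the request.
            captured <= in_data;
            in_req   <= 1'b0;
            done     <= 1'b1;
        end else if (start && !in_req) begin
            // Begin a new request and hold it until the ack arrives.
            in_req  <= 1'b1;
            in_addr <= channel;
        end
    end
endmodule
```

While in_req is high and in_ack is low, nothing changes, which models the arbitrarily long wait. In a real backend, the pipeline would also have to stall for those cycles.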

The Frontend (Fetch Unit) Revisited

You already have a baseline frontend implementation, which allows the fetch unit to work independently of the backend. However, you may find that you need to modify the existing fetch unit so that it is well matched to your ISA.

Q1. Review your fetch unit and update it if you need to. For each change you made, explain the reason in your hardcopy report. If you think your processor is fine with the existing fetch unit, you do not need to make any change.

The Backend

Major Modules

You will design and implement the datapath of the backend in this lab. Toward that goal, you first need to identify the major components of the backend. The backend is likely to contain the following components:

Q2. List all the other leaf modules needed for the backend of your processor. For each module, list all the instructions that use the module.

The Backend Interface

The backend of your processor should use something along the lines of the following interface. The specifics of your interface will depend on whether you have made any changes to the fetch unit.

// I_WIDTH  : instruction width
// IA_WIDTH : instruction address width
// D_WIDTH  : data width
// PA_WIDTH : port address width

module backend#(parameter I_WIDTH = 17, IA_WIDTH = 12, D_WIDTH = 34, PA_WIDTH = 4)
(
    input  clk,
    input  reset,
    
    // inputs from the fetch unit
    input  [I_WIDTH-1 : 0]  instruction_data,
    input  [IA_WIDTH-1 : 0] instruction_addr,
    input  instruction_valid,    
    input  [I_WIDTH-1 : 0]  load_data,
    input  load_data_valid,
    
    // outputs to the fetch unit
    output deque,
    output restart,
    output [IA_WIDTH-1 : 0] restart_addr,
    output load_store_valid,
    output store_en,
    output [IA_WIDTH-1 : 0] load_store_addr,
    output [I_WIDTH-1 : 0]  store_data,
    
    // I/O interface
    output in_req,
    output out_req,
    output [PA_WIDTH-1 : 0] in_addr,
    output [PA_WIDTH-1 : 0] out_addr,
    input  [D_WIDTH-1 : 0]  in_data,
    output [D_WIDTH-1 : 0]  out_data,
    input  in_ack,
    input  out_ack
); 

Roughly, the interface of the backend consists of two parts - the fetch unit interface and the I/O interface. Both have already been explained.

Datapath Questions

Now that you have identified all the major components needed for the backend datapath, the next step is to design a datapath consisting of those modules you have identified. Make sure that your datapath can handle all the instructions of your ISA. Here are a few questions regarding your datapath.

For Q4 and Q5, answer questions with the style used in the following example.

Example Q and A

Question: What happens when 'restart' and 'restart_addr' are asserted? Assume that 'load_store_valid' is not asserted.

Answer: Assume that 'restart' and 'restart_addr' are asserted in cycle 0.

example answer schematic

Design review

Before you continue beyond this point, you should understand, in detail, how every instruction in your ISA is going to work in the datapath. You should have a drawing of your datapath with all the datapath elements and signals labeled and all control lines present. You should also reason carefully about how anomalous conditions will be handled (e.g., How will your processor block on 'in' and 'out' instructions? What will happen when memory requests are refused?). Ideally, from this point forward, there should only be implementing, not designing.

To help you get to this point you must do two things:

  1. Find another team and swap design reviews of your datapaths. Ask hard questions about how they handle different cases. A similar process will help you spot any remaining weaknesses in your ISA.
  2. Spend the time to simulate some mid-sized (maybe 10 instructions) pieces of code by hand. Printing out several copies of your datapath will speed up this process.

Q6. What questions did your reviewers ask? What bugs did they uncover? Include the names of the members of the other team.

Q7. Provide one example of simulating a short instruction sequence.

Coding and Implementation

Now that you have the datapath design, let's start implementing the datapath. You need to implement 1) major components of the backend such as a register file and a data memory, 2) a backend consisting of various leaf modules, and 3) a processor core consisting of a frontend and a backend. As you did in lab 1, you can use RTL Verilog for the implementation of leaf modules. However, you must use structural Verilog for the implementation of the backend and the processor core.
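The distinction between RTL and structural Verilog can be sketched as follows. The module names mux2 and wb_select_sketch, and all of the signal names, are hypothetical and chosen only for illustration. Your datapath may not need a write-back selector of this shape at all.

```verilog
// RTL style (allowed for leaf modules): behavior is described
// with an expression.
module mux2 #(parameter WIDTH = 34)
(
    input  [WIDTH-1 : 0] a,
    input  [WIDTH-1 : 0] b,
    input  sel,
    output [WIDTH-1 : 0] y
);
    assign y = sel ? b : a;
endmodule

// Structural style (required for the backend and the core):
// only module instances and the wires that connect them.
// Hypothetical example: choose the value written back to the
// register file from three assumed sources.
module wb_select_sketch #(parameter D_WIDTH = 34)
(
    input  [D_WIDTH-1 : 0] alu_result,
    input  [D_WIDTH-1 : 0] mem_data,
    input  [D_WIDTH-1 : 0] in_data,
    input  sel_mem,                  // control line, left undriven until control logic exists
    input  sel_in,                   // control line, left undriven until control logic exists
    output [D_WIDTH-1 : 0] wb_data
);
    wire [D_WIDTH-1 : 0] alu_or_mem;

    mux2 #(.WIDTH(D_WIDTH)) mux_mem
        (.a(alu_result), .b(mem_data), .sel(sel_mem), .y(alu_or_mem));

    mux2 #(.WIDTH(D_WIDTH)) mux_in
        (.a(alu_or_mem), .b(in_data), .sel(sel_in), .y(wb_data));
endmodule
```

Named port connections, as used here, make a structural file easier to check against a datapath drawing than positional connections.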

Q8. Implement a register file module for the backend of your processor in RTL Verilog, and name the file regfile.v. Include the full source file in your hardcopy report.

Q9. Unlike for the register file, it is better to use a highly optimized memory generator for the data memory module than to write your own. Fortunately, Xilinx provides a utility named CORE Generator for this purpose. Generate an 8K-word data memory module named 'dmem_34_8k' with CORE Generator. You can follow the step-by-step tutorial at the end of this page to do it. To initialize the memory module, use any coe file you generated in lab 2b. Write a testbench that checks whether your memory module works correctly, and then actually test your memory module. Capture the behavioral simulation result screen and include it in your hardcopy report with an explanation. The captured screen should clearly show that 1) read and write operations work correctly, and 2) the memory is properly initialized with the provided *.coe file. The generated module should be used with the provided wrapper module dmem.v.

Q10. Implement all the other modules required in your datapath. You have to implement each module in a separate .v file with RTL Verilog as you did for modules like adders, muxes, and sign extenders in lab 2. You might find that you can reuse many leaf modules used in the fetch unit. Also include full Verilog source code in your hardcopy report.

Q11. Implement 'backend.v' in structural Verilog for the backend of the processor. You can skip control logic in lab 6. Since it is hard to test an implementation without control logic, it is enough to show that your Verilog file is synthesizable and implementable. You will complete the implementation with control logic in lab 7.

Q12. Implement 'core.v' for the top module of your design. You can simply wire up the fetch unit and the backend. No control logic is required for 'core.v'.

Tutorial - Generate a Data Memory Module with Xilinx Core Generator

With the following steps, you can generate a customized data memory module. In lab 7, you will be able to make a customized instruction memory module by slightly changing the configuration.

Deliverable

  • Submit all materials as a zip archive.
  • Like lab 1, your report consists of answers to the questions.
  • Submit relevant source files (zipped) via e-mail to the TA. (Your email title should be "[CSE141L] lab 6, name0, name1".)
  • All Verilog designs must be synthesizable and implementable.
  • For Verilog implementations, use a consistent and readable naming style. The naming convention, proper indentation, and style will be graded.
  • No lab interview for lab 6.
