A list of Xilinx tips and tricks is available on Xilinx_tipsandtricks. Please let the TA know if you have suggestions, or problems whose solutions should be added.
Due: February 14
Now that you have practiced designing and implementing a simple processor core, it's time you gain a deeper insight into processor architecture by designing and implementing a more complex processor. In this lab you will again be working in groups to design the datapath for your own processor. This processor will be able to run any program written to the ISA specification you created in cse141. By the end of the next lab (in which you will design and implement the control for your processor), you will have to demonstrate the correctness of your processor by running (simulating) the Fibonacci and SuperGarbage benchmarks from cse141.
This lab and the next will be different from the previous labs in that you won't be given a design to follow, only some general guidelines; you will be making many of the design decisions yourself.
This lab (and the next two) will require much more work than the previous labs and will probably take longer than you expect, so start early.
You are to design the datapath for your processor. It must be pipelined with a minimum of two pipeline stages (you may have as many pipeline stages as you would like, as long as you have at least two). One of your first design decisions will be the number of pipeline stages and what each stage will need to do.
Remember, your ISA is not due for another week, so you can still change it to make the datapath design simpler. Take advantage of this opportunity, and get started on the datapath design now.
Ultimately, you will need to build a processor with at least two pipeline stages, and that is all that is strictly required for this lab. However, I strongly recommend designing a single-cycle version of the datapath first. This corresponds directly to the process we went through in class for a subset of the MIPS ISA.
As a refresher, here are the basic steps for implementing a single-cycle datapath:
Once you have a single-cycle implementation, you'll need to decide where to draw the line between the two pipeline stages. In making this decision, you'll have to consider the dependencies among your hardware resources as each instruction moves through your pipeline, and estimate the length of the critical path in each pipeline stage so that work is distributed evenly among the stages (your whole processor can only be clocked as fast as its slowest pipeline stage).
You are free to reuse any of the components from previous labs. You will probably find that most of the components are reusable by simply changing the parameters. One important new component that you have not designed or implemented before is the pipeline register. Your processor core should have at least one of these (more if you have more than two pipeline stages). Because you will have more than one pipeline stage, you will typically have more than one instruction executing in any given cycle, and you will have to consider data forwarding paths to handle inter-instruction data dependencies. The easiest way to reason about which data forwarding paths you will need is to consider all pairs of instructions (or instruction types) and identify the data forwarding necessary when the instructions are consecutive, separated by one instruction, separated by two instructions, and so on (depending on how deeply your processor is pipelined).
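On the datapath side, a forwarding path usually ends in a multiplexer in front of the consuming unit. The following is a minimal sketch of such a mux for one ALU operand. The module name, port names, and select encoding are all hypothetical, and the number of forwarding sources depends on how deeply your own pipeline is staged. The select input would be driven by the control path you build in the next lab.

```verilog
// Hypothetical forwarding mux for a single ALU operand (RTL leaf module).
// Assumed select encoding (not part of the lab specification):
//   0 = value read from the register file
//   1 = result produced by the instruction one stage ahead
//   2 = value about to be written back to the register file
module fwd_mux #(parameter D_WIDTH = 34)
(
    input      [1:0]         sel,
    input      [D_WIDTH-1:0] rf_data,
    input      [D_WIDTH-1:0] ex_result,
    input      [D_WIDTH-1:0] wb_data,
    output reg [D_WIDTH-1:0] operand
);
    always @(*) begin
        case (sel)
            2'd1:    operand = ex_result;
            2'd2:    operand = wb_data;
            default: operand = rf_data;   // no forwarding needed
        endcase
    end
endmodule
```

A two-stage pipeline may need only one forwarding source, in which case the mux shrinks to a two-input select.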
Another thing that you have to consider when multiple instructions are in-flight at the same time in your processor is the potential for structural hazards. Identifying and planning around structural hazards now is an important part of the design process that will pay off big time when you are debugging your processor on lots of real code in the next lab.
You will also need to consider how stalls and flushes will work in your processor. You will build the mechanisms to detect and control stalling and flushing in the next lab. But in this lab you only need to engineer your datapath to support such operations. For example, one aspect of stalling involves holding some of the pipeline registers at their current state, so having a "write enable" control signal as input to your pipeline registers is a good idea. That way, when your control path detects the need to stall, it can de-assert some of the pipeline registers' write_enable control signals.
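As a rough illustration of the write-enable idea, here is a minimal sketch of a pipeline register that supports both stalling and flushing. The names pipe_reg, write_enable, and flush are hypothetical. The sketch also assumes a synchronous, active-high reset and that an all-zero value is a harmless bubble in your ISA encoding; adjust both to match your own design.

```verilog
// Hypothetical pipeline register with stall and flush support.
// Assumptions: synchronous active-high reset; all zeros acts as a bubble.
module pipe_reg #(parameter WIDTH = 34)
(
    input                  clk,
    input                  reset,
    input                  write_enable,  // de-assert to stall (hold state)
    input                  flush,         // assert to squash the contents
    input      [WIDTH-1:0] d,
    output reg [WIDTH-1:0] q
);
    always @(posedge clk) begin
        if (reset || flush)
            q <= {WIDTH{1'b0}};           // insert a bubble
        else if (write_enable)
            q <= d;                       // normal advance
        // otherwise: hold the current value (stall)
    end
endmodule
```

In this sketch flush takes priority over a stall; whether that ordering is right depends on how your control path combines the two conditions.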
The processor core is the top module of your processor design. It has the following interface:
// D_WIDTH  : data width
// PA_WIDTH : port address width
module core#(parameter D_WIDTH = 34, PA_WIDTH = 4)
(
    input  clk,
    input  reset,

    // I/O interface
    output in_req,
    output out_req,
    output [PA_WIDTH-1 : 0] in_addr,
    output [PA_WIDTH-1 : 0] out_addr,
    input  [D_WIDTH-1 : 0]  in_data,
    output [D_WIDTH-1 : 0]  out_data,
    input  in_ack,
    input  out_ack
);
Note: you will only be designing and implementing the datapath in this lab, so only a subset of the processor core input and output wires will be wired up to your datapath; the rest will be wired up to your control path in the next lab.
The core interface is simple: it consists of clk, reset, and the I/O interface. When the 'reset' signal transitions from high to low, the processor core starts executing from address 0x0 of the instruction memory. The I/O interface serves the in and out instructions, which communicate with the outside world. It is the same as in previous labs.
As in previous labs, you are required to use the provided wrapper module for the data memory: dmem.v. However, this wrapper is a bit different. Occasionally it will assert the 'refused' signal and not give the requested data until a later cycle. When a read or write request is refused, your processor will need to stall and wait for the refused signal to be de-asserted.
Why do we have a 'refused' signal? The 'refused' signal indicates that the memory module cannot service the requested operation at that time. There are a few possible reasons for this: the memory may be in the middle of a refresh, or, in a multi-core system, another processor core may already be using the memory.
module dmem#(parameter A_WIDTH = 13, D_WIDTH = 34)
(
    input  reset,
    input  clk,
    input  read_write_req,
    input  write_en,
    input  [A_WIDTH-1 : 0] addr,
    input  [D_WIDTH-1 : 0] din,
    output [D_WIDTH-1 : 0] dout,
    output refused
);
In the waveform above, the core requests one write operation and one read operation. The data memory cannot service the read operation on the first attempt and asserts the 'refused' signal. In response, the core requests the same operation again in the next clock cycle and successfully gets the memory value on the second attempt.
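One way to picture the retry behavior is that the memory stage keeps presenting the same request for as long as it is refused, while a stall signal freezes the pipeline registers behind it. The sketch below shows that idea only; the module name and the mem_op input are hypothetical, and it assumes 'refused' is visible early enough in the cycle to gate the pipeline registers' write enables. Check the actual timing against the waveform and dmem.v before relying on it.

```verilog
// Hypothetical stall logic for a refused data memory request.
// mem_op is an assumed signal meaning "the instruction currently in the
// memory stage is a load or store".
module refused_stall
(
    input  mem_op,
    input  refused,          // from the dmem wrapper
    output read_write_req,   // to the dmem wrapper
    output stall             // de-asserts pipeline register write enables
);
    // While stalled the instruction stays in the memory stage, so mem_op
    // stays high and the same request is presented again next cycle.
    assign read_write_req = mem_op;
    assign stall          = mem_op & refused;
endmodule
```

The address, write_en, and din inputs of dmem repeat automatically as long as the pipeline register feeding the memory stage is held.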
Question 1: List all the other leaf modules needed in the datapath of your processor. For each module, list all the instructions that use the module.
Question 2: Draw the datapath schematic of your processor. Annotate all the relevant control signals. (Note: PowerPoint and/or hand drawings are fine. Schematic entry tools are often not worth the trouble. Do not show a detailed gate-level diagram. You can refer to the datapath schematic in this example.)
For questions 3 and 4, answer in the style of the following example. It is based on your trivialscalar I/O operation (don't worry about the details) and shows the style of answer we are looking for in questions 3 and 4. For the example, refer to this.
Example Question and Answer
Question: What happens when 'in_req' is asserted? Assume that the I/O device responds after a delay of one clock cycle and that the req and ack signals need to be asserted for one complete clock cycle.
Answer: Assume that cycles are numbered starting from cycle 0. Consult this figure, which shows all of the steps mentioned below.
Question 3: How does your processor handle the following situations? Draw paths on your datapath and give a written explanation on a cycle-by-cycle basis. Follow the explanation style of the example above.
Question 4: For each instruction in your ISA, draw a path on the datapath schematic which shows how the instruction is handled, and give a written explanation on a cycle-by-cycle basis. You can group instructions (e.g. all r-type instructions) if they share the same datapath. Use the answer style used in the example above.
Before you continue beyond this point, you should understand in detail how every instruction in your ISA is going to work in the datapath. You should have a drawing of your datapath with all the datapath elements and signals labeled and all control lines present. You should also reason carefully about how anomalous conditions will be handled (e.g., How will your processor block on 'in' and 'out' instructions? What will happen when memory requests are refused?). Ideally, from this point forward, there should only be implementation, not design.
To help you get to this point you must do two things:
Question 5: What questions did your reviewers ask? What bugs did they uncover? Include the names of the members of the other team.
Question 6: Provide one example of simulating a short instruction sequence.
Now that you have the datapath design, let's start implementing the datapath. You need to implement the major components such as a register file and a data memory, and datapath.v (the module to wire all your datapath elements together). You should use RTL Verilog for the implementation of leaf modules and structural Verilog for the implementation of all non-leaf modules, like datapath.v.
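To make the distinction between the two styles concrete, here is a small sketch of a program counter unit. The two leaf modules are written in RTL Verilog, and the non-leaf module contains only wires and module instances. All module and signal names are hypothetical, the address width is an arbitrary placeholder, and the sketch assumes a word-addressed instruction memory (so the PC advances by 1) and a synchronous active-high reset. The course coding standards take precedence over anything shown here.

```verilog
// Leaf module (RTL): adds one to its input.
module incrementer #(parameter WIDTH = 8)
(
    input  [WIDTH-1:0] in,
    output [WIDTH-1:0] out
);
    assign out = in + 1'b1;
endmodule

// Leaf module (RTL): register with write enable, cleared on reset.
module reg_en #(parameter WIDTH = 8)
(
    input                  clk,
    input                  reset,
    input                  write_enable,
    input      [WIDTH-1:0] d,
    output reg [WIDTH-1:0] q
);
    always @(posedge clk) begin
        if (reset)
            q <= {WIDTH{1'b0}};   // execution starts at address 0x0
        else if (write_enable)
            q <= d;
    end
endmodule

// Non-leaf module (structural): only wires and instances, no logic.
module pc_unit #(parameter WIDTH = 8)
(
    input              clk,
    input              reset,
    input              write_enable,   // de-assert to stall fetch
    output [WIDTH-1:0] pc
);
    wire [WIDTH-1:0] pc_next;

    incrementer #(.WIDTH(WIDTH)) pc_inc
        (.in(pc), .out(pc_next));

    reg_en #(.WIDTH(WIDTH)) pc_reg
        (.clk(clk), .reset(reset), .write_enable(write_enable),
         .d(pc_next), .q(pc));
endmodule
```

A real PC unit would also need a mux to select a branch or jump target; it is left out here to keep the structural example short.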
Your code will be graded on correctness, completeness, and how closely it follows the coding standards.
Question 7: Generate a data memory module using the Xilinx CORE Generator that is 34 bits wide and has 8k (2^13) entries. Call the module 'dmem_34_8k'. For the initialization of the memory module, use any coe file your assembler generated in the lab in cse141. Write a test-bench which tests whether your memory module is working correctly, and then actually test your memory module. Capture the behavioral simulation result screen and include it in your report with explanation. The captured screen should clearly show 1) read and write operations work correctly, and 2) memory is properly initialized with the provided *.coe file. The generated module should be used with the provided wrapper module dmem.v.
Question 8: Implement all the other leaf modules required in your datapath. You should implement each module in a separate .v file with RTL Verilog.
Question 9: Implement all non-leaf modules, including datapath.v, in structural Verilog. Your entire datapath should be synthesizable and implementable.