Please use a 13-bit-wide PC in your code.
Sample solution for TrivialScalar is here. Please take a look. Note that this is just a sample TrivialScalar implementation; please don't think that this is the only correct way to do it. If you see a bug in any of the files, please mail the TAs and let them know.
Due: January 31
In Lab 3 you will extend the design demonstrated in Lab 2 to implement the control unit for TrivialScalar. As you probably realized throughout Lab 2, it is not very easy to test each individual module completely; however, with the addition of Lab 3 (the control unit), testing will be much more straightforward.
As in Lab 2, our aim for this lab is to help each of you to learn/master the skills necessary to succeed in the coming labs. Please use one of the many resources you have available to you (the TA, WebBoard, and your classmates) if you are stuck.
NOTE: You must complete Lab 3 in the same groups you completed Lab 2 with.
Please remember that you must conform to the class coding standards. They are available here. If you find a bug in them (e.g., they are causing you to do something horribly ugly), please let the professor or the TA know.
The greatest common divisor example is available here. Create a new project and import all the files. If you simulate using gdb_tb.v, it will run a short test. This example demonstrates the coding standards for the datapath and the modules therein.
Here are some notes about how IO should work in TrivialScalar. You should read through this before starting the implementation later on. Please note that all of the signals shown below (in the diagrams) should be held for an entire cycle.
On a READ instruction, the in_req signal should be asserted for an entire cycle (as shown in the following figure), and the processor should block until the IO device responds. The IO device may take any number of cycles to respond to the request; however, the earliest it may respond is the following cycle. The figure below demonstrates the in_req signal being asserted and the IO device responding in the third cycle following the initial request by placing data on the in_data line and asserting the in_ack line for an entire cycle. The data on the in_data line is only valid while the in_ack line is asserted.
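For simulation, the READ handshake described above can be modeled from the IO device's side. The following is a minimal sketch, not the course's test bench: the module name `io_read_model`, the `DELAY` parameter, and the incrementing `next_value` are all assumptions made for illustration. With `DELAY = 1` the device answers in the cycle right after the request (the earliest allowed); with `DELAY = 3` it answers in the third cycle after the request.

```verilog
// Hypothetical simulation-only model of the IO device's READ side.
// Names and the DELAY parameter are assumptions, not part of the lab files.
module io_read_model #(parameter DELAY = 3) (
    input            clk,
    input            reset,
    input            in_req,   // asserted by the processor for one cycle
    output reg       in_ack,   // asserted by the device for one cycle
    output reg [7:0] in_data   // only meaningful while in_ack is high
);
    reg       busy;        // a request is outstanding
    reg [7:0] count;       // cycles left before the response
    reg [7:0] next_value;  // made-up data returned by the next READ

    always @(posedge clk or posedge reset) begin
        if (reset) begin
            busy       <= 1'b0;
            count      <= 8'd0;
            in_ack     <= 1'b0;
            in_data    <= 8'd0;
            next_value <= 8'd1;
        end else begin
            in_ack <= 1'b0;  // default: the ack lasts exactly one cycle
            if (in_req && !busy) begin
                if (DELAY <= 1) begin
                    // Earliest legal response: the cycle after the request.
                    in_ack     <= 1'b1;
                    in_data    <= next_value;
                    next_value <= next_value + 8'd1;
                end else begin
                    busy  <= 1'b1;
                    count <= DELAY - 2;
                end
            end else if (busy) begin
                if (count == 8'd0) begin
                    in_ack     <= 1'b1;
                    in_data    <= next_value;
                    next_value <= next_value + 8'd1;
                    busy       <= 1'b0;
                end else begin
                    count <= count - 8'd1;
                end
            end
        end
    end
endmodule
```

Instantiating this with different `DELAY` values is one way to exercise the processor's stall behavior under varying response times.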
WRITE instructions work similarly: the out_req signal is asserted while the data to be passed to the IO device is placed on the out_data bus (as shown in the following figure). The IO device may respond any number of cycles later by asserting the out_ack wire. As with a READ instruction, the earliest time for a response is the following cycle. The figure below demonstrates this protocol. The processor must stall until the IO device has completed the request, which is signaled by a cycle-long assertion of out_ack.
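The WRITE handshake can be modeled for simulation in the same spirit. This is a sketch under assumptions: the module name `io_write_model`, the `DELAY` parameter, and the `last_value` output (which only exists so the captured byte can be inspected in a waveform) are hypothetical and not part of the lab files.

```verilog
// Hypothetical simulation-only model of the IO device's WRITE side.
module io_write_model #(parameter DELAY = 2) (
    input            clk,
    input            reset,
    input            out_req,    // asserted by the processor for one cycle
    input      [7:0] out_data,   // valid while out_req is asserted
    output reg       out_ack,    // asserted by the device for one cycle
    output reg [7:0] last_value  // assumption: last byte the device accepted
);
    reg       busy;   // a request is outstanding
    reg [7:0] count;  // cycles left before the acknowledgement

    always @(posedge clk or posedge reset) begin
        if (reset) begin
            busy       <= 1'b0;
            count      <= 8'd0;
            out_ack    <= 1'b0;
            last_value <= 8'd0;
        end else begin
            out_ack <= 1'b0;  // default: the ack lasts exactly one cycle
            if (out_req && !busy) begin
                last_value <= out_data;  // capture data during the request
                if (DELAY <= 1) begin
                    out_ack <= 1'b1;     // earliest legal response
                end else begin
                    busy  <= 1'b1;
                    count <= DELAY - 2;
                end
            end else if (busy) begin
                if (count == 8'd0) begin
                    out_ack <= 1'b1;
                    busy    <= 1'b0;
                end else begin
                    count <= count - 8'd1;
                end
            end
        end
    end
endmodule
```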
Here is some insight as to how the memory modules work in TrivialScalar. Some more contextual information is available in the table below.
First, it is important to note that the dmem module in TrivialScalar is positive-edge triggered and that the imem module is asynchronous. What this means to you, as the architect, is that a read or a write in the dmem can only occur at a positive edge of the clock signal. All reads through the imem will still occur in a single cycle. The implication is that a read of a memory location in the dmem requires an entire cycle to show up on its output bus. The direct impact of this observation is pointed out below and in the following section.
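The difference between the two read styles can be seen in a small behavioral sketch. This is not the generated imem or dmem core; the module name, port names, and default widths are assumptions chosen only to contrast an asynchronous read with a clocked one on the same storage array.

```verilog
// Hypothetical behavioral model contrasting the two read styles.
module mem_read_styles #(parameter AW = 4, parameter DW = 8) (
    input               clk,
    input               write_en,
    input  [AW-1:0]     addr,
    input  [DW-1:0]     w_data,
    output [DW-1:0]     async_r_data,  // imem style: valid in the same cycle
    output reg [DW-1:0] sync_r_data    // dmem style: valid one cycle later
);
    reg [DW-1:0] mem [0:(1 << AW) - 1];

    // Asynchronous read: the output follows addr combinationally.
    assign async_r_data = mem[addr];

    // Synchronous read and write: both happen only at a positive clock edge,
    // so data read from addr appears on sync_r_data in the following cycle.
    always @(posedge clk) begin
        if (write_en)
            mem[addr] <= w_data;
        sync_r_data <= mem[addr];
    end
endmodule
```

In a waveform, `async_r_data` changes as soon as `addr` changes, while `sync_r_data` lags by one clock edge, which is the reason a load needs an extra cycle.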
On a LD instruction, the following should take place: the dmem_read_write_req_out signal should be asserted, the proper input r_data should be selected via the reg_sel_out signal, and the regfile_write_en_out signal should be asserted at the proper time. An access to the dmem takes about a cycle, so the LD operation takes a total of two cycles. This is part of the state machine design detailed in the next section, and is reiterated and emphasized throughout that section.
On a ST instruction, the protocol is more straightforward. The following signals have specific importance during this instruction: dmem_write_en_out should be asserted in order for the dmem to latch onto its inputs at addr. Although this operation takes time, just as the LD operation does, we have eliminated the extra propagation delay between the dmem and the register file, so we don't have to wait for an extra cycle. The value should have been stored into the dmem by the following cycle.
The main focus of this lab is creating a single module: control. We'll go ahead and call it control.v.
// Parameters:
//   OP_CODE_WIDTH -> Width of an instruction's opcode
module control#(parameter OP_CODE_WIDTH = 4)
(
    // Global clock
    input clk,
    // Global reset
    input reset,

    // Control-Datapath interface
    // Specifies which PC should be used next cycle
    output [1:0] run_stall_reset_sel_out,
    // The entirety of the current instruction from the Decode unit
    input [OP_CODE_WIDTH - 1 : 0] inst_in,
    // Write enable for the dmem
    output dmem_write_en_out,
    // Request for a read or write from the dmem
    output dmem_read_write_req_out,
    // Selects the data to be written to the register file
    output [1:0] reg_sel_out,
    // Write enable for the register file
    output regfile_write_en_out,
    // Specifies the op code for the ALU unit
    output [1:0] alu_op_code_out,

    // The control interface for IO
    // More details to follow
    input in_ack,
    input out_ack,
    output out_req,
    output in_req
);
We'll be designing the control unit as a state machine. If this is unfamiliar to you, consult State Machine Tutorial. If the concept of state machines is still foreign, please talk to the TA.
The above image shows the different states that the control module (and therefore the processor) can have. The below table should explain the different states:
||This state is representative of the normal processor execution, where each PC should advance by one each cycle. In this state, the processor is not looking for any particular signals from the outside world. Be aware that this state will have to perform some preparation when moving from one state to the next (like stalling the processor for a cycle).|
Although we might like for a load from the
This state is in effect when the control module encounters a
This state is similar to the
This state just signifies when the processor has seen a
It is important to note the many different ways that this state machine can be designed. For more information on our requirements
for the design of this state machine, consult the document listed above:
Verilog State machines.
What I'm referencing here is the state machine design detailed throughout the above PDF. Your control module should be similar
in concept to the GCD state machine (except somewhat more complicated). Don't forget to use only three
initialize your signals on every cycle to avoid repeated statements, and to comply with the class Verilog coding standards throughout
the design process.
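As a generic illustration of the state machine style discussed here, the sketch below separates the state register, the next-state logic, and the output logic, and gives every combinational signal a default value at the top of its block so no assignment has to be repeated in each branch. It is deliberately not the TrivialScalar control unit: the module name, the two states, and the `start`/`done`/`busy` signals are hypothetical, and the class coding standards take precedence wherever they differ.

```verilog
// Hypothetical two-state machine; names and states are for illustration only.
module fsm_sketch (
    input      clk,
    input      reset,
    input      start,  // assumed input: begin waiting
    input      done,   // assumed input: the awaited event has happened
    output reg busy    // assumed output: high while waiting
);
    localparam S_IDLE = 1'b0;
    localparam S_WAIT = 1'b1;

    reg state;
    reg next_state;

    // State register with asynchronous reset.
    always @(posedge clk or posedge reset) begin
        if (reset)
            state <= S_IDLE;
        else
            state <= next_state;
    end

    // Next-state logic: default to staying in the current state.
    always @(*) begin
        next_state = state;
        case (state)
            S_IDLE: if (start) next_state = S_WAIT;
            S_WAIT: if (done)  next_state = S_IDLE;
        endcase
    end

    // Output logic: default first, then override only where needed.
    always @(*) begin
        busy = 1'b0;
        if (state == S_WAIT)
            busy = 1'b1;
    end
endmodule
```

Because each combinational block assigns a default before the case statement, no latches are inferred and each branch only lists what differs from the default.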
Below are some tips for the generation of each signal. I won't give away much information, but just enough to get you started.
The run_stall_reset_sel_out signal chooses which of the three signals will propagate to become the next PC. This signal might be particularly tricky because the diagram doesn't fully specify which inputs to the mux generate which outputs. To be more specific, the ordering of the signals entering the PC MUX shouldn't matter in terms of design rules. If you or your group feels that you can generate a more efficient control implementation by switching around which signal enters which terminal of the MUX, please go ahead and do so. Your implementation must continue to adhere to the other aspects of the specified design. Be aware that this signal (among others) depends greatly on the instruction being executed and the next_state the control unit will enter.
On another note, be aware that your reset signals should be asynchronous throughout the "stateful" parts of your processor. This
allows for your
always blocks to trigger on the edge of a reset signal instead of relying on your reset signal to stay
high until a positive clock edge is generated.
Question 1: For each processor state, as shown in the state machine above, provide the values of each output signal of the control module. To do that, make a table whose rows are the 8 output signals of the control module and whose columns are the 5 states explained earlier. For some states the output signals depend on the instruction being executed or the IO signals. Use the ? : conditional operator to make the table more concise. Also use X (don't care) for signals whose value is not important.
This signal should just be asserted when it is time to write to the dmem.
This signal should just specify when there is either a read or a write occurring at the dmem.
This signal is similar to the run_stall_reset_sel_out signal in that it chooses from multiple inputs through the use of a MUX. In this case, it is also recommended that you choose an ordering for the MUX inputs that simplifies your control code.
This is just the write enable for the register file. Beware that for one particular instruction, when you assert this signal will be crucial for proper operation of the instruction/processor.
Question 2: Which instruction is referenced at the end of the last paragraph and why would we not want to assert it immediately? Assume that the state of your processor should remain valid at all times.
This signal should just specify which operation we would like to execute in the ALU.
Please refer to the IO explanation above for the operation of these signals.
Now that we have all the individual pieces designed and functioning, we can go ahead and put together our processor. We'd like to make a high-level file called processor.v and instantiate and connect our two large modules together. Here's the interface for that:
module processor
(
    // Global clock
    input clk,
    // Global reset
    input reset,

    // The data interface for IO
    input [7:0] in_data,
    output [7:0] out_data,

    // The control interface for IO
    input in_ack,
    input out_ack,
    output out_req,
    output in_req
);
For starters, go ahead and test your TrivialScalar implementation with the provided test program and testbench.
Here's a link to a test program in (*.coe) format that you can use to verify
your processor implementation. This should just be attached to your
imem module, just as the
memory generation tutorial specifies. You can change the properties of a generated memory by double clicking
on the module (
imem in this case) and bringing up the Distributed Memory Generator. On
page 3 of 3, you'll have the option of changing the Coefficients file (*.coe file). This is where you'll want
to specify this new file.
Question 3: Provide a translation of the provided *.coe file into assembly.
In addition to this file, you'll need to stimulate the inputs specific to this *.coe file (speaking in terms of the IO module). Here's a testbench. Use this file (modifying it as needed) to answer the question below.
After using the provided *.coe file and test bench above, we'd like you to generate your own set of testing tools. We would like each group to write a test program that takes two inputs from the IO module, operates on them, and then sends the result back down the IO path. Assuming the two inputs are X and Y (received in that order from the IO unit), the function we'd like you to perform is ((X + Y) * Y - 3). We'd like you to do this two times in your code (remember that this ISA doesn't have a branch, so no loops!), writing your result back to the IO device after each computation. Please generate your own test bench that provides your processor with the proper IO responses. I would recommend varying the delay of the IO responses so that each iteration has some differences in timing, as this is most likely how we'll be testing this.
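When writing the test bench, it helps to know what value to expect on the output. The sketch below computes ((X + Y) * Y - 3) with a simulation-only Verilog function. It assumes every intermediate result is truncated to 8 bits, matching the 8-bit in_data and out_data buses; whether your datapath behaves the same way is something to confirm against your own design. The module and function names are hypothetical.

```verilog
// Hypothetical helper for a test bench: expected result of ((X + Y) * Y - 3),
// assuming an 8-bit datapath where results wrap around modulo 256.
module expected_value_sketch;
    function [7:0] expected;
        input [7:0] x;
        input [7:0] y;
        reg   [7:0] sum;
        reg   [7:0] prod;
        begin
            sum      = x + y;       // truncated to 8 bits
            prod     = sum * y;     // truncated to 8 bits
            expected = prod - 8'd3; // wraps below zero
        end
    endfunction

    initial begin
        // (2 + 5) * 5 - 3 = 32
        $display("X=2 Y=5 -> %0d", expected(8'd2, 8'd5));
        // (1 + 1) * 1 - 3 = -1, which wraps to 255 in 8 bits
        $display("X=1 Y=1 -> %0d", expected(8'd1, 8'd1));
    end
endmodule
```

The second case shows why small inputs are worth including in a test: the subtraction underflows, and the test bench should expect the wrapped value rather than a negative number.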
Note that this test bench will test your IO instructions, but you'll have to verify visually that your protocol matches the one listed in this write-up (since this test bench cannot examine and report on the protocol itself).
Question 5: As described in the paragraph above, create your own *.coe file with the above parameters and test bench file to accompany it. Feel free to use a more loosely timed clock for this portion. Provide the contents of the *.coe file and the assembly you derived it from, as well as a screen-shot of the simulation with your testing.
Now that your processor should be completely working (and these results will really only be valid in such a case), you should go ahead and perform some performance-based analysis on your design. You should use the static timing report generation tool to accomplish this. You will need to change a few settings in order to get the results that are required of you:
Question 6: Go ahead and implement your design and generate the above mentioned timing report. What is the maximum achievable frequency of your design? (Hint: The critical path is the path with the longest delay. Shortening this path increases your maximum frequency.) Describe your critical path through the circuit (e.g. PC_Mux->PC_DFF->etc...).
Use localparams to represent different states and typical inputs/outputs, so that your code isn't inundated with 4'b0000, 4'b0001, etc. with no specific meaning tied to those values (this is similar to using enums or #defines in C/C++).
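A short sketch of the idea follows. The opcode and select encodings here are made up for illustration and are not the TrivialScalar encodings; only the pattern of naming constants once and using the names everywhere else is the point.

```verilog
// Hypothetical encodings, chosen only to illustrate named constants.
module localparam_sketch (
    input      [3:0] op_code,
    output reg [1:0] sel
);
    localparam OP_EXAMPLE_A = 4'b0000;  // assumed opcode value
    localparam OP_EXAMPLE_B = 4'b0001;  // assumed opcode value

    localparam SEL_FIRST  = 2'd0;       // assumed mux input ordering
    localparam SEL_SECOND = 2'd1;

    always @(*) begin
        sel = SEL_FIRST;                // default assignment
        case (op_code)
            OP_EXAMPLE_A: sel = SEL_FIRST;
            OP_EXAMPLE_B: sel = SEL_SECOND;
            default:      sel = SEL_FIRST;
        endcase
    end
endmodule
```

If an encoding changes later, only the localparam line needs to be edited, and the case statement still reads in terms of what each value means.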