cse141L Lab 3: TrivialScalar - Control


Changelog

October 20

Modified the Control Module interface so that the inst_in input is OP_CODE_WIDTH = 4 bits wide instead of INST_WIDTH = 16 bits. This change makes it consistent with the interface (output [3:0] inst_out) of the datapath module you designed in Lab 2.

October 21

An extension has been granted for Lab 3. It is now due on Friday (10/23) at 5pm. Drop the write-up in the TA's mailbox (CSE building, 2nd floor) and send your email by the deadline. Groups that have already sent an email may send an updated version if they wish. Groups that send their final version of Lab 3 today (Wed 10/21) by 7pm will receive extra credit for 141 equivalent to a micro-project.


Due: October 21

Overview

In Lab 3 you will extend the design demonstrated in Lab 2 to implement the control unit for TrivialScalar. As you probably realized throughout Lab 2, it is not very easy to test each individual module completely; however, with the addition of Lab 3 (the control unit), testing will be much more straightforward.

As in Lab 2, our aim for this lab is to help each of you to learn/master the skills necessary to succeed in the coming labs. Please use one of the many resources you have available to you (the TA, WebBoard, and your classmates) if you become stuck.

NOTE: You must complete Lab 3 in the same groups you completed Lab 2 with.

Please remember that you must conform to the class coding standards. They are available here. If you find a bug in them (e.g., they are causing you to do something horribly ugly), please let the professor or the TA know.

The greatest common divisor example is available here. Create a new project and import all the files. If you simulate using gdb_tbw.tbw, it will run a short test. This example demonstrates the coding standards for the datapath and the modules therein.

Getting Started

General Notes

IO in TrivialScalar

Here are some notes about how IO should work in TrivialScalar. You should read through this before starting the implementation later on. Please note that all of the signals shown below (in the diagrams) should be held for an entire cycle.

On a READ instruction, the in_req signal should be asserted for an entire cycle (as shown in the following figure) and the processor should block until it receives a response from the IO device. The IO device may take any number of cycles to respond to the request; however, the earliest that it may respond is the following cycle. The figure below demonstrates the in_req signal being asserted and the IO device responding in the third cycle following the initial request by placing data on the in_data line and asserting the in_ack line for an entire cycle. The data on the in_data line is only valid while the in_ack line is asserted.

WRITE instructions work similarly: the out_req signal is asserted while the data to be passed to the IO device is placed on the out_data bus (as shown in the following figure). The IO device may respond any number of cycles later by asserting out_ack for an entire cycle. As with a READ instruction, the earliest possible response is the following cycle. The processor must stall until the IO device has completed the request, which is signaled by the cycle-long assertion of out_ack.
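As a rough sketch of the device side of this handshake, the behavioral model below answers each request a configurable number of cycles after the request cycle. The module name io_device_sketch, the DELAY and READ_VALUE parameters, and the last_written output are assumptions made for illustration; they are not part of the provided test bench, and the model assumes each request is asserted for exactly one cycle.

```verilog
// Hypothetical simulation-only model of the IO device (not the course test bench).
// DELAY = number of cycles between the request cycle and the ack cycle.
// DELAY = 1 is the earliest legal response (the cycle right after the request).
module io_device_sketch #(parameter DELAY = 3, parameter READ_VALUE = 8'h2A)
(
	input            clk,
	input            reset,
	input            in_req,
	input            out_req,
	input      [7:0] out_data,
	output           in_ack,
	output           out_ack,
	output     [7:0] in_data,
	// Last value the processor wrote, kept for inspection in the waveform
	output reg [7:0] last_written
);

	// Cycles remaining until each ack; 0 means no request is pending
	reg [7:0] in_count;
	reg [7:0] out_count;

	always @(posedge clk or posedge reset) begin
		if (reset) begin
			in_count     <= 8'd0;
			out_count    <= 8'd0;
			last_written <= 8'd0;
		end else begin
			// READ: start counting when the request is seen
			if (in_req)
				in_count <= DELAY;
			else if (in_count != 8'd0)
				in_count <= in_count - 8'd1;

			// WRITE: capture out_data while out_req is asserted
			if (out_req) begin
				out_count    <= DELAY;
				last_written <= out_data;
			end else if (out_count != 8'd0)
				out_count <= out_count - 8'd1;
		end
	end

	// Each ack is high for exactly one cycle
	assign in_ack  = (in_count == 8'd1);
	assign out_ack = (out_count == 8'd1);

	// in_data is only meaningful while in_ack is asserted
	assign in_data = in_ack ? READ_VALUE : 8'hxx;

endmodule
```

With the default DELAY = 3, a request seen in cycle 0 is acknowledged in cycle 3, which matches the timing described for the READ figure. Instantiating the model with different DELAY values is one way to exercise different response latencies.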

TrivialScalar's Dmem/Imem

Here is some insight as to how the memory modules work in TrivialScalar. Some more contextual information is available in the table below.

Firstly, it is important to note that the dmem module in TrivialScalar is positive-edge triggered and that the imem module is asynchronous. For you, as the architect, this means that a read or a write in the dmem can only occur at a positive edge of the clock signal, while reads through the imem still complete within a single cycle. The implication is that a read of a memory location in the dmem requires an entire cycle to show up on its output bus. The direct impact of this observation is pointed out below and in the following section.
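The difference between the two memories can be illustrated with the following sketch. These are hypothetical stand-ins, not the generated imem and dmem modules: the module names, address widths, and most port names are assumed (only st_data and addr are taken from this write-up), and the 16-bit instruction and 8-bit data widths follow INST_WIDTH and the IO data bus.

```verilog
// Hypothetical asynchronous instruction memory: the output follows the
// address combinationally, so a fetch completes within the same cycle.
module imem_sketch
(
	input  [7:0]  addr,
	output [15:0] inst
);
	reg [15:0] mem [0:255];

	assign inst = mem[addr];
endmodule

// Hypothetical positive-edge-triggered data memory: reads and writes only
// happen at a rising clock edge, so loaded data appears on ld_data one
// cycle after the request is presented.
module dmem_sketch
(
	input            clk,
	input            read_write_req,
	input            write_en,
	input      [7:0] addr,
	input      [7:0] st_data,
	output reg [7:0] ld_data
);
	reg [7:0] mem [0:255];

	always @(posedge clk) begin
		if (read_write_req) begin
			if (write_en)
				mem[addr] <= st_data;   // store: latched at the clock edge
			else
				ld_data <= mem[addr];   // load: visible after the clock edge
		end
	end
endmodule
```

Because ld_data only changes at a clock edge, whatever consumes it (here, the register file) cannot capture the loaded value in the same cycle the request is made.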

On an LD instruction, the following should take place: the dmem_read_write_req_out signal should be asserted, the proper input to r_data should be selected via the reg_sel_out signal, and the regfile_write_en_out signal should be asserted at the proper time. Know that an access to the dmem takes about a cycle, so the LD operation takes a total of two cycles. This is part of the state machine design detailed in the next section, and is reiterated and emphasized throughout that section.

On a ST instruction, the protocol is more straightforward. Both dmem_read_write_req_out and dmem_write_en_out should be asserted in order for the dmem to latch its inputs at st_data and addr. Although this operation takes time, just as the LD operation does, there is no extra propagation delay between the dmem and the register file to account for, so we don't have to continue for an extra cycle. The value should have been stored into the dmem by the following cycle.

Control Implementation

The main focus of this lab is creating a single module: control. We'll go ahead and call it control.v.

Control

The interface below lists each signal in the order it is encountered while looking through the PDF schematic:
// Parameters:
//  OP_CODE_WIDTH -> Width of an instruction's opcode
module control#(parameter OP_CODE_WIDTH = 4)
(
	// Global clock
	input clk,
	// Global reset
	input reset,

	// Control-Datapath interface
	//  Specifies which PC should be used next cycle
	output [1:0] run_stall_reset_sel_out,
	//  The opcode of the current instruction from the Decode unit
	input [OP_CODE_WIDTH - 1 : 0] inst_in,
	//  Write enable for the dmem
	output dmem_write_en_out,
	//  Request for a read or write from the dmem
	output dmem_read_write_req_out,
	//  Selects the data to be written to the register file
	output [1:0] reg_sel_out,
	//  Write enable for the register file
	output regfile_write_en_out,
	//  Specifies the op code for the ALU unit
	output [1:0] alu_op_code_out,

	// The control interface for IO
	// More details to follow
	input in_ack,
	input out_ack,
	output out_req,
	output in_req
);
			

We'll be designing the control unit as a state machine. If this is unfamiliar to you, consult Verilog Design Examples by Krste Asanovic. If the concept of state machines is still foreign, please talk to the TA.

The above image shows the different states that the control module (and therefore the processor) can have. The table below explains the different states:

State | Meaning
RUN | This state represents normal processor execution, where the PC should advance by one each cycle. In this state, the processor is not looking for any particular signals from the outside world. Be aware that this state will have to perform some preparation when moving from one state to the next (like stalling the processor for a cycle).
DMEM Read | Although we might like a load from the dmem to take just a single cycle, it unfortunately must take longer. The dmem takes about a cycle to fetch the data for a given input, which doesn't leave enough time for the fetched data to propagate to the register file. Knowing this, we must wait an extra cycle on each LD instruction in order to account for dmem latency and general propagation delay. The processor will effectively be stalled here, as in the other states; however, we already know that it must resume the following cycle (so we aren't waiting on a response). The challenge in this case will be asserting the proper signals at the correct time in order to avoid latching on to the wrong data.
IO Write | This state is in effect when the control module encounters a WRITE instruction. The processor should be completely stalled while in this state, waiting for a cycle-long out_ack (but not in_ack!). During the first cycle of the WRITE instruction, the control module should (as demonstrated in the IO section above) assert the out_req output to "pass" the value on out_data to the IO device (this should happen in preparation for this state--see the note in the RUN state). Note that the datapath handles the placement of the data in R1 on the out_data bus.
IO Read | This state is similar to the IO Write state except that it stalls waiting for a cycle-long in_ack, and it asserts the in_req output on a READ instruction in order to request that data be placed on the in_data bus. The control module must also route the proper data (in_data) to the Register File. (As in the IO Write case, be aware of the note about performing preparation in the RUN entry above.)
HALT | This state signifies that the processor has seen a HALT instruction. The processor should stall indefinitely from this point on, until it is reset.

It is important to note that this state machine can be designed in many different ways. For our requirements on the design of this state machine, consult the document from last lab: Verilog Design Examples by Krste Asanovic, specifically the state machine design detailed throughout that PDF. Your control module should be similar in concept to the GCD state machine (except somewhat more complicated). Don't forget to use only three always blocks, to initialize your signals on every cycle to avoid repeated statements, and to comply with the class Verilog coding standards throughout the design process.
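One possible shape for such a state machine is sketched below: one always block for the state register, one for the next-state logic, and one for the outputs, with defaults assigned at the top of each combinational block. This is only a skeleton under stated assumptions. The module name control_fsm_sketch, the state encodings, and especially the opcode values are placeholders (they are not the TrivialScalar encoding), and only the two IO request outputs are driven; the PC select, memory, register file, and ALU outputs are deliberately left out.

```verilog
// Skeleton only: opcode and state encodings are hypothetical placeholders.
module control_fsm_sketch #(parameter OP_CODE_WIDTH = 4)
(
	input clk,
	input reset,
	input [OP_CODE_WIDTH - 1 : 0] inst_in,
	input in_ack,
	input out_ack,
	output reg out_req,
	output reg in_req
);

	// Placeholder opcodes (NOT the real ISA encoding)
	localparam OP_LD = 4'h1, OP_READ = 4'h2, OP_WRITE = 4'h3, OP_HALT = 4'h4;

	// Placeholder state encoding
	localparam RUN = 3'd0, DMEM_READ = 3'd1, IO_WRITE = 3'd2,
	           IO_READ = 3'd3, HALT = 3'd4;

	reg [2:0] state;
	reg [2:0] next_state;

	// Always block 1: state register with asynchronous reset
	always @(posedge clk or posedge reset) begin
		if (reset)
			state <= RUN;
		else
			state <= next_state;
	end

	// Always block 2: next-state logic
	always @(*) begin
		next_state = state;                    // default: stay put
		case (state)
			RUN: begin
				case (inst_in)
					OP_LD:    next_state = DMEM_READ;
					OP_READ:  next_state = IO_READ;
					OP_WRITE: next_state = IO_WRITE;
					OP_HALT:  next_state = HALT;
					default:  next_state = RUN;
				endcase
			end
			DMEM_READ: next_state = RUN;       // always resumes after one cycle
			IO_WRITE:  if (out_ack) next_state = RUN;
			IO_READ:   if (in_ack)  next_state = RUN;
			HALT:      next_state = HALT;      // only reset leaves HALT
			default:   next_state = RUN;
		endcase
	end

	// Always block 3: output logic, with defaults assigned first
	always @(*) begin
		in_req  = 1'b0;
		out_req = 1'b0;
		if (state == RUN) begin
			// Requests are raised while still in RUN, in preparation
			// for entering the corresponding IO state
			in_req  = (inst_in == OP_READ);
			out_req = (inst_in == OP_WRITE);
		end
	end

endmodule
```

Assigning a default to every signal at the top of each combinational block avoids both inferred latches and repeated assignments in every branch.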

Below are some tips for the generation of each signal. I won't give away much information, but just enough to get you started.

run_stall_reset_sel_out

The run_stall_reset_sel_out signal chooses which of the three candidate signals propagates to become the Next PC. This signal might be particularly tricky because the diagram doesn't specify which input of the mux corresponds to which select value. The ordering of the signals entering the PC MUX doesn't matter in terms of design rules. If you or your group feels that you can generate a more efficient control implementation by switching around which signal enters which terminal of the MUX, please go ahead and do so. Your implementation must continue to adhere to the other aspects of the specified design. Be aware that this signal (among others) depends greatly on the instruction being executed and the next_state the control unit will enter.

On another note, be aware that your reset signals should be asynchronous throughout the "stateful" parts of your processor. This allows for your always blocks to trigger on the edge of a reset signal instead of relying on your reset signal to stay high until a positive clock edge is generated.

Question 1: For each processor state, as shown in the state machine above, provide the value of each output signal of the control module. To do that, make a table with one row for each of the 8 output signals of the control module and one column for each of the 5 states explained earlier. For some states the output signals depend on the instruction being executed or the IO signals. Use the ? : conditional operator to make the table more concise. Also use X (don't care) for signals whose value is not important.

dmem_write_en_out

This signal should just be asserted when it is time to write to the dmem.

dmem_read_write_req_out

This signal should just specify when there is either a read or a write occurring at the dmem.

reg_sel_out

This signal is similar to the run_stall_reset_sel_out signal in that it chooses from multiple inputs through the use of a MUX. In this case, it is also recommended that you choose an ordering for the MUX inputs that simplifies your Control code.

regfile_write_en_out

This is just the write enable for the register file. Beware that for one particular instruction, the timing of when you assert this signal is crucial for proper operation of the instruction/processor.

Question 2: Which instruction is referenced at the end of the last paragraph and why would we not want to assert it immediately? Assume that the state of your processor should remain valid at all times.

alu_op_code_out

This signal should just specify which operation we would like to execute in the ALU.

in_ack, out_ack, in_req, and out_req

Please refer to the IO explanation above for the operation of these signals.

TrivialScalar High-level Implementation

Now that we have all the individual pieces designed and functioning, we can go ahead and put together our processor. We'd like to make a high-level file called processor.v and instantiate and connect our two large modules together. Here's the interface for that:

module processor
(
	// Global clock
	input clk,
	// Global reset
	input reset,

	// The data interface for IO
	input [7:0] in_data,
	output [7:0] out_data,

	// The control interface for IO
	input in_ack,
	input out_ack,
	output out_req,
	output in_req
);
			

Test Benches

Using Test Benches and *.coe files

For starters, go ahead and test your TrivialScalar implementation with the provided test program and testbench.

Here's a link to a test program in (*.coe) format that you can use to verify your processor implementation. This should just be attached to your imem module, just as the memory generation tutorial specifies. You can change the properties of a generated memory by double clicking on the module (imem in this case) and bringing up the Distributed Memory Generator. On page 3 of 3, you'll have the option of changing the Coefficients file (*.coe file). This is where you'll want to specify this new file.

Question 3: Provide a translation of the provided *.coe file into assembly.

In addition to this file, you'll need to stimulate the inputs specific to this *.coe file (speaking in terms of the IO module). Here's a *.zip which contains the test bench files that you can use in conjunction with the above test.coe file. Add and copy the .tbw file to your project after you extract the whole folder.

Question 4: How many cycles after the first IO-based instruction does the IO device respond? Please provide a screen-shot of a waveform showing this.

Generating/Writing Test Benches

After using the provided *.coe file and test bench above, we'd like you to generate your own set of testing tools. We would like each group to write a test program that takes two inputs from the IO module, operates on them, and then sends the result back down the IO path. Assuming the two inputs are X and Y (received in that order from the IO unit), the function we'd like you to perform is ((X + Y) * Y - 3). We'd like you to do this two times in your code (remember that this ISA doesn't have a branch--so no loops!), writing back your result to the IO device after the computation of each function. Please generate your own test bench that provides your processor with the proper IO responses. I would recommend varying the delay of the IO responses so that each iteration has some differences in timing, as this is most likely how we'll be testing this.

Note that this test bench will test your IO instructions, but you'll have to verify visually that your protocol matches the one listed in this write-up (since this test bench cannot examine and report on the protocol itself).
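To know what value to expect on out_data, it can help to work the function by hand. For example, X = 2 and Y = 5 give ((2 + 5) * 5 - 3) = 32. The small simulation-only sketch below computes the same expression; the module and function names are hypothetical, and it assumes results wrap around at the 8-bit width of the IO data bus, which you should confirm against your own ALU.

```verilog
// Hypothetical helper for checking results by hand or in a test bench.
// Assumes 8-bit wraparound arithmetic, matching the 8-bit IO data bus.
module expected_value_sketch;

	// Computes ((x + y) * y - 3), truncated to 8 bits
	function [7:0] expected;
		input [7:0] x;
		input [7:0] y;
		begin
			expected = ((x + y) * y) - 8'd3;
		end
	endfunction

	initial begin
		// (2 + 5) * 5 - 3 = 32
		$display("X=2,  Y=5  -> %0d", expected(8'd2, 8'd5));
		// (10 + 20) * 20 - 3 = 597, which wraps to 85 in 8 bits
		$display("X=10, Y=20 -> %0d", expected(8'd10, 8'd20));
	end

endmodule
```

Choosing at least one input pair whose result exceeds 255 is a cheap way to see how your datapath handles overflow.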

Question 5: As described in the paragraph above, create your own *.coe file with the above parameters, along with a test bench waveform file (*.tbw, and its counterpart *.tfw file) to accompany it. Feel free to use a more loosely timed clock for this portion. Provide the contents of the *.coe file and the assembly you derived it from, as well as a screen-shot of the simulation with your testing.

Performance Evaluation

Now that your processor should be completely working (and these results will really only be valid in such a case), you should go ahead and perform some performance-based analysis on your design. You should use the static timing report generation tool to accomplish this. You will need to change a few settings in order to get the results that are required of you:

General Tips

Deliverable

  • Submit your report for the questions above to the TA via e-mail by the due date before the beginning of the class.
    • Answer all of the questions (6) found in the lab description.
    • The report should be in a single PDF file (including answers to questions, verilog source code, graphs, screen-shots, etc). There are many tools out there capable of integrating text and graphics and producing PDF files (OpenOffice does a pretty good job).
    • Please include your zipped project directory in your email. Your project directory should include all of your project files, including all of the test bench and *.coe files mentioned in this lab. (Gmail and other mail providers might refuse to mail files containing executables. If this is the case, rename the file from *.zip to *.zipped and it will go through.) Please use the following naming convention: cse141L-lab3-LastName1-FirstName1-LastName2-FirstName2-LastName3-FirstName3.zipped with your group members' last names and first names substituted for LastName1-3 and FirstName1-3, respectively.
    • Name your PDF file cse141L-lab3-LastName1-FirstName1-LastName2-FirstName2-LastName3-FirstName3.pdf with your group members' last names and first names substituted for LastName1-3 and FirstName1-3, respectively.
    • The subject line of your email should read "[CSE141L] Lab 3 Submission - LastName1, FirstName1 - LastName2, FirstName2 - LastName3, FirstName3".
