Lab 2b: Assembler and Simulator
CSE 141L, Spring 2007, Donghwan Jeon
Due 5/14 (M) before the beginning of the class
You should work on this lab with your team from lab 2a.
Overview
Although you have already designed an ISA, you need to validate and revise it before expending the extensive effort required to design and implement the processor. In lab 2b, you will implement two essential components for architectural validation and verification: an assembler and a simulator. An assembler translates a program written in your ISA's assembly language into binary code; a simulator executes a given binary program instruction by instruction. Together, they form essential infrastructure for your processor design.
For the implementation of the assembler and the simulator, you are free to choose any high level language (including scripting languages).
Assembler
Writing machine code by hand is difficult and error-prone. To address this problem, people usually use an assembler, which automatically generates machine code from an assembly file. Moreover, many C compilers first generate assembly files and then simply feed them to an assembler to get the executable machine code. For these reasons, you should write an assembler for your ISA. Your assembler reads a program written in assembly language, translates it into binary code, and generates two output files containing executable machine code. You will use the generated output files for both the simulator and the actual hardware you will implement.
Required Features
First, examine the following sample assembly code.
.text
la $1, table0 // load the address of a label table0 (pseudo instruction)
lw $2, $1 // $2 <= 0x000C0FFEE
lw $3, table0 // load the value at label table0 (pseudo instruction), $3 <= 0x000C0FFEE
lw $4, 3(table0) // $4 <= table1
lw $5, 1($4) // $5 <= 0x1DEADBEEF
sw $5, 1($1) // 0x001C0FFEE is overwritten with 0x1DEADBEEF
li $6, 0xC0FFEE // load immediate (pseudo instruction), $6 <= 0xC0FFEE
.data
table0:
.word 0x000C0FFEE
.word 0x001C0FFEE
.word 0x002C0FFEE
.word table1
table1:
.word 0x0DEADBEEF
.word 0x1DEADBEEF
.word 0x2DEADBEEF
.word 0x3DEADBEEF
.word 0x4DEADBEEF, 0x5DEADBEEF, 0x6DEADBEEF
.fill 10 0x0
Your assembler must support the following keywords and reserved words used in the example code above. The following list is the minimal required set; you may extend it with other keywords.
- .text
Indicates the start of the text section, consisting of instructions. .text can appear many times in an assembly file, but they must be merged into a single section in the output of the assembler. The input to the assembler (-text_addr) will specify the address that the assembler will assume the resulting text section is loaded at.
- .data
Indicates the start of the data section. Similar to .text, there can be multiple .data in an assembly file. The input to the assembler (-data_addr) will specify the address that the assembler will assume the resulting data section is loaded at.
- .word
Specifies the word data at a memory location. It can be followed by one or more comma-separated values, each of which occupies one word at consecutive addresses.
- .fill
Repeats a data value a given number of times. For example, .fill 10 0x0 emits 0x0 ten times.
- label:
Represents an instruction or data memory address. Similar to labels in other languages, a label can be used in instructions to specify an address (e.g. la $0, table0). Labels minimize the manual modifications needed in your assembly code when instructions and/or data are inserted or deleted. For example, suppose that .word 0x003C0FFEE is inserted after .word 0x002C0FFEE in the previous sample code; the only change needed is to the fourth instruction, from lw $4, 3(table0) to lw $4, 4(table0). In other words, labels keep maintenance cost manageable by keeping the effect of a code modification local to a label.
In addition to the keywords listed above, the assembler must support at least four pseudo instructions equivalent to the following MIPS pseudo instructions, which appear in the previous example code.
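The label mechanism is commonly implemented with two passes: the first assigns an address to every label, the second emits words with labels replaced by their addresses. The following Python sketch illustrates this for the data section only. The function name assemble_data, the word-addressed memory model, and the treatment of // comments are assumptions made for illustration, not requirements of this lab.

```python
def assemble_data(lines, data_addr):
    """Return (symbols, words) for a list of data-section source lines.

    Illustrative sketch: handles only 'label:', '.word' and '.fill',
    and assumes each word occupies one address (word addressing).
    """
    symbols = {}   # label name -> address
    parsed = []    # (directive, operand tokens) in source order
    addr = data_addr

    # Pass 1: walk the source, record label addresses, count emitted words.
    for raw in lines:
        line = raw.split("//")[0].strip()   # drop trailing comments
        if not line:
            continue
        if line.endswith(":"):
            symbols[line[:-1]] = addr
            continue
        parts = line.split(None, 1)
        directive = parts[0]
        operands = parts[1].replace(",", " ").split() if len(parts) > 1 else []
        if directive == ".word":
            addr += len(operands)
        elif directive == ".fill":
            addr += int(operands[0], 0)
        else:
            raise ValueError(f"unsupported directive: {directive}")
        parsed.append((directive, operands))

    # Pass 2: every label is now known, so forward references resolve.
    def value(token):
        return symbols[token] if token in symbols else int(token, 0)

    words = []
    for directive, operands in parsed:
        if directive == ".word":
            words.extend(value(t) for t in operands)
        else:  # .fill count value
            words.extend([value(operands[1])] * int(operands[0], 0))
    return symbols, words
```

Because labels are collected before any word is emitted, a line such as .word table1 can appear before table1: is defined, as in the sample code.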
- la $rd, offset(label)
Loads the address of (label + offset) into $rd.
- lw $rd, offset(label)
Loads the data located at (label + offset) into $rd.
- li $rd, immediate
Loads the immediate value into $rd.
- sw $rs, offset(label)
Stores the value of $rs to the address (label + offset).
These pseudo instructions make programmers' lives easier by freeing them from tracking absolute addresses; instead, they can describe most addresses with a label and an offset. Note that a pseudo instruction might expand to multiple instructions in your ISA. Also note that an address used in a pseudo instruction can be either an instruction memory address or a data memory address, depending on the location of the label. You may change pseudo instruction names if needed, but the semantics of the instructions should be the same.
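Before a pseudo instruction can be expanded, the assembler has to turn its label operand into an absolute address. The Python sketch below shows one way to do that, assuming a symbol table (a dict from label name to address) has already been built. The helper names split_operand and resolve_address are hypothetical; the instructions that the resolved address expands into depend entirely on your ISA and are not shown.

```python
import re

# Matches operands of the form offset(base), e.g. 3(table0) or 1($4).
_OFFSET_FORM = re.compile(r"^(-?\w+)\((\$?\w+)\)$")


def split_operand(operand):
    """Split 'offset(base)' or a bare 'base' into (offset, base)."""
    text = operand.strip()
    match = _OFFSET_FORM.match(text)
    if match:
        return int(match.group(1), 0), match.group(2)
    return 0, text


def resolve_address(operand, symbols):
    """Return label + offset as an absolute address.

    Returns None when the base is a register (e.g. 1($4)), since that is
    an ordinary register-relative access rather than a pseudo instruction.
    """
    offset, base = split_operand(operand)
    if base.startswith("$"):
        return None
    return symbols[base] + offset
```

With table0 placed at 0x200, the operand 3(table0) resolves to 0x203 and a bare table0 resolves to 0x200.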
Code Stub
To execute a benchmark in your simulator, you will need to create a small code stub that sets up the environment, including:
- Stack Pointer
Make sure that the stack pointer is properly set so that the tested function (benchmark) can use the stack.
- Global Pointer
A program often accesses global data and/or constants by using the global pointer ($gp in MIPS), which is also set by the code stub. Like the stack pointer, a code stub should properly set the global pointer before the execution of a tested function.
- Input Data
Many of the benchmark programs require a few arguments. To test these benchmarks, a code stub must properly set registers and/or the stack memory depending on the way your ISA passes function arguments. You should modify your code stub when you want to test a benchmark with different arguments.
In addition to function arguments, the SuperGarbage benchmark needs test data. To save you the time of building a test bench, we provide a simple data input for SuperGarbage; you have to include the provided test data in your code stub for SuperGarbage using the previously mentioned .word keyword.
When all the environmental settings are properly done, the code stub should call a benchmark function. Since the code stub should be the first code executed upon power-on, the text section of the code stub should be placed at 0x0 of the instruction memory in the output file of the assembler. (Assume that your processor starts by fetching an instruction from 0x0 upon reset.) Until you implement a general program loader, you have to use a code stub; in lab 2b, always assemble with a code stub.
Input and Output Requirements
- Your assembler should accept the following input parameters:
- -stub [stub_name] : a single .s (assembly code stub) file, optionally used when the output files are intended for simulation purposes
- -src [$name] : a single .s (assembly test bench) file
- -text_addr [addr] : the start address of the text section in the instruction memory; need not be specified when -stub is used
- -data_addr [addr] : the start address of the data section in the data memory
- Your assembler has two outputs - $name_i.coe and $name_d.coe. $name_i.coe corresponds to a 17-bit instruction memory while $name_d.coe corresponds to a 34-bit data memory. They must have the following forms:
- $name_i.coe - 17 bit word size
MEMORY_INITIALIZATION_RADIX=16;
MEMORY_INITIALIZATION_VECTOR=
00000,
00001,
00002,
00003,
00004,
00005,
.....,
1DEAD
EOF
- $name_d.coe - 34 bit word size
MEMORY_INITIALIZATION_RADIX=16;
MEMORY_INITIALIZATION_VECTOR=
000000000,
000000001,
000000002,
000000003,
000000004,
000000005,
.........,
3DEADBEEF
EOF
The first line in each output file specifies the numerical format used in the file. We use hexadecimal notation. Note that each entry of the 'MEMORY_INITIALIZATION_VECTOR' is the data for one word, starting from address 0x0. Thus an entry of the instruction memory is 17-bit, while that of the data memory is 34-bit. These output files will be used as inputs to your simulator as well as to the coregen utility, which generates SRAM modules in Xilinx ISE.
Since a *.coe file always starts at 0x0, you must fill the unused memory area from 0x0 up to -text_addr or -data_addr with arbitrary values (typically 0x0). However, for other unused memory locations, you do not need to specify values; unspecified memory locations will have arbitrary values.
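The output format and the padding rule above can be sketched in a few lines of Python. The function name format_coe is hypothetical, and the sketch assumes that the EOF line in the samples marks the end of the file rather than literal text to be written. Each entry is printed with just enough hex digits to hold one word (5 digits for 17 bits, 9 digits for 34 bits).

```python
def format_coe(words, start_addr, bits):
    """Return the text of a .coe file for `words` loaded at `start_addr`.

    Addresses 0x0 .. start_addr-1 are padded with 0x0 because the file
    always begins at address 0x0. `bits` is the memory word size.
    """
    digits = (bits + 3) // 4          # hex digits needed per word
    mask = (1 << bits) - 1            # truncate values to the word size
    padded = [0] * start_addr + list(words)
    entries = [f"{w & mask:0{digits}X}" for w in padded]
    return (
        "MEMORY_INITIALIZATION_RADIX=16;\n"
        "MEMORY_INITIALIZATION_VECTOR=\n"
        + ",\n".join(entries)
        + "\n"
    )
```

For example, format_coe([0x1DEAD], 2, 17) yields two padding entries of 00000 followed by 1DEAD.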
- Assembler example usages:
- prompt> asm -stub start.s -src test.s -data_addr 0x200
The assembler will generate two files - test_i.coe and test_d.coe. Since the -stub option is used, the text section will be loaded at 0x0 of the instruction memory, while the data section will be located at 0x200 of the data memory. The text section starts with the instructions in the code stub start.s, which then calls the benchmark function in test.s, located after start.s in the instruction memory.
- prompt> asm -src garbage.s -text_addr 0x100 -data_addr 0x200
The assembler will generate two files - garbage_i.coe and garbage_d.coe, which will be loaded at 0x100 of the instruction memory and 0x200 of the data memory, respectively. Since no code stub is used, the text section of garbage.s will be located directly at 0x100 of the instruction memory, and its data section at 0x200 of the data memory.
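If you write the assembler in Python, the command line above can be parsed with the standard argparse module, which accepts single-dash option names such as -stub. The sketch below is one possible arrangement, not a required interface: the default address of 0 when an option is omitted and the attribute names i_coe and d_coe are assumptions made for illustration.

```python
import argparse
import os


def parse_hex_or_dec(text):
    """Accept addresses written as 0x200 or 512."""
    return int(text, 0)


def parse_args(argv):
    parser = argparse.ArgumentParser(prog="asm")
    parser.add_argument("-stub", metavar="stub_name",
                        help="assembly code stub (.s)")
    parser.add_argument("-src", metavar="name", required=True,
                        help="assembly test bench (.s)")
    parser.add_argument("-text_addr", type=parse_hex_or_dec, default=0,
                        help="start address of the text section")
    parser.add_argument("-data_addr", type=parse_hex_or_dec, default=0,
                        help="start address of the data section")
    args = parser.parse_args(argv)

    # With a code stub, the text section always starts at 0x0.
    if args.stub is not None:
        args.text_addr = 0

    # Output names follow the source name: test.s -> test_i.coe, test_d.coe.
    base = os.path.splitext(os.path.basename(args.src))[0]
    args.i_coe = base + "_i.coe"
    args.d_coe = base + "_d.coe"
    return args
```

Calling parse_args(sys.argv[1:]) from your main program would then give the assembler everything it needs to place the two sections and name the output files.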
Simulator
Now that you have an easy way to generate machine code, it is time to implement a simulator. With a simulator, you can 1) easily verify your ISA without actually implementing hardware, 2) debug your application without having actual hardware, and 3) improve your ISA by spotting performance bottlenecks in benchmark programs. Your simulator operates instruction by instruction; you must be able to execute instructions one at a time and inspect all the programmer-visible state (e.g. registers, memory, ...) each time the execution of an instruction completes. Assume that your simulator starts executing at address 0x0 upon power-on.
Simulator Commands
You may add whichever commands you find useful. However, your simulator must support the following commands:
- iload $i_file.coe $start_addr
       loads *.coe files at $start_addr of the instruction memory
- dload $d_file.coe $start_addr
       loads *.coe files at $start_addr of the data memory
- go $number
       simulates next $number instructions
- dump_reg
       prints values in all registers
- set_reg $reg_num $value
       sets the register $reg_num with the value $value
- dump_imem $addr $size
       disassembles instructions in the instruction memory from $addr to $addr + $size; the output might not exactly match the instructions in your assembly file (for example, where pseudo instructions were expanded)
- set_imem $addr $value
       sets the value at $addr of the instruction memory with the value $value
- dump_dmem $addr $size
       prints data memory values from $addr to $addr + $size
- set_dmem $addr $value
       sets the value at $addr of the data memory with the value $value
- dump_channel $channel
       prints values put on channel $channel by out instruction
- clear_channel $channel
       discards all values written to channel $channel so that out instruction can proceed
- put_channel $channel $value
       puts $value on channel $channel so that in instruction can get a value
- set_buf_size $value
       changes the buffer size for each channel to $value; the initial size is 16
- instr_count
       shows the number of executed instructions
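A simple way to organize these commands is a dispatcher that maps each command name to a method. The Python sketch below covers only a few of the numeric commands (go, set_reg, dump_reg, set_dmem, dump_dmem, instr_count). The class name SimulatorShell, the register count of 16, the output formatting, and the step hook are all assumptions; the step function is where your ISA-specific fetch, decode and execute logic would go. dump_dmem here prints addresses $addr through $addr + $size - 1, which is one reading of the range described above.

```python
class SimulatorShell:
    """Minimal command dispatcher; the ISA itself is supplied as `step`."""

    WORD_MASK = (1 << 34) - 1  # data words are 34 bits wide

    def __init__(self, num_regs=16, step=None):
        self.regs = [0] * num_regs
        self.dmem = {}                                  # address -> word
        self.count = 0                                  # executed instructions
        self.step = step or (lambda sim: None)          # placeholder hook

    def execute(self, line):
        """Run one command line and return its textual output."""
        parts = line.split()
        if not parts:
            return ""
        handler = getattr(self, "cmd_" + parts[0], None)
        if handler is None:
            return f"unknown command: {parts[0]}"
        # All commands in this sketch take numeric arguments only.
        return handler(*[int(arg, 0) for arg in parts[1:]])

    def cmd_go(self, number):
        for _ in range(number):
            self.step(self)
            self.count += 1
        return ""

    def cmd_set_reg(self, reg_num, value):
        self.regs[reg_num] = value & self.WORD_MASK
        return ""

    def cmd_dump_reg(self):
        return "\n".join(f"${i} = 0x{v:09X}" for i, v in enumerate(self.regs))

    def cmd_set_dmem(self, addr, value):
        self.dmem[addr] = value & self.WORD_MASK
        return ""

    def cmd_dump_dmem(self, addr, size):
        return "\n".join(
            f"0x{a:X}: 0x{self.dmem.get(a, 0):09X}"
            for a in range(addr, addr + size)
        )

    def cmd_instr_count(self):
        return str(self.count)
```

Commands that take file names, such as iload and dload, would need their own argument handling, since this sketch converts every argument to an integer.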
How to Test
in and out
Among the provided benchmarks, SuperGarbage uses in and out instructions. To make these instructions work, you should use the put_channel and dump_channel commands, which emulate the actual I/O operations of hardware. If an out instruction blocks due to a lack of buffer space, use the clear_channel command to make space so that out instructions can succeed. You can set the buffer space for each channel by using the set_buf_size command before starting a simulation.
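The channel behavior described above can be modeled with one bounded queue per channel. The Python sketch below is only an illustration: the class name Channels, the method names out and in_, and the choice to apply the buffer limit to the output side only are assumptions, and your simulator may reasonably make different choices.

```python
from collections import deque


class Channels:
    """Per-channel buffers for the in and out instructions."""

    def __init__(self, buf_size=16):
        self.buf_size = buf_size      # initial size is 16
        self.out_bufs = {}            # channel -> values written by out
        self.in_bufs = {}             # channel -> values waiting for in

    def set_buf_size(self, value):
        self.buf_size = value

    def out(self, channel, value):
        """Called by an out instruction; returns False if it must block."""
        buf = self.out_bufs.setdefault(channel, deque())
        if len(buf) >= self.buf_size:
            return False              # buffer full: instruction blocks
        buf.append(value)
        return True

    def dump_channel(self, channel):
        """Values put on the channel by out, oldest first."""
        return list(self.out_bufs.get(channel, ()))

    def clear_channel(self, channel):
        """Discard buffered output so a blocked out can proceed."""
        self.out_bufs.pop(channel, None)

    def put_channel(self, channel, value):
        """Make a value available to a later in instruction."""
        self.in_bufs.setdefault(channel, deque()).append(value)

    def in_(self, channel):
        """Called by an in instruction; returns None if nothing is waiting."""
        buf = self.in_bufs.get(channel)
        if not buf:
            return None
        return buf.popleft()
```

When out returns False, the simulator would typically leave the program counter unchanged so that the same out instruction is retried after clear_channel.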
Validation
After executing a benchmark program, you should check whether your program behaved as you expected. You can easily validate the simulation result by examining register and memory values. Specifically, the return register should hold the expected value when a Fibonacci simulation ends. Similarly, memory values should be properly updated when Program Loader or SuperGarbage is executed.
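For Fibonacci, a small reference implementation makes it easy to compute the value the return register should hold. The Python sketch below assumes the common indexing fib(0) = 0 and fib(1) = 1 and truncates the result to the 34-bit word size; check these assumptions against the benchmark definition you were given before relying on it.

```python
def fib_reference(n, bits=34):
    """Iterative Fibonacci, truncated to the machine word size."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a & ((1 << bits) - 1)
```

Comparing fib_reference(n) against the output of dump_reg for several values of n is a quick way to test the benchmark with different inputs.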
Deliverables
Your team needs to submit the following items:
- Hardcopy
- Updated ISA Manual
- Assembly files (*.s) for three benchmarks
- Code stub files (*.s) for three benchmarks
- E-mail to TA (zipped, titled "[CSE141L] Lab 2b, team name - member_name1, member_name2")
- Full source code of the assembler and the simulator
- *.s files (benchmarks, code stubs) for three benchmarks
- *.coe files for three benchmarks (total six coe files)
Lab Interviews
We have a 15-minute interview session for each team in this lab. Please go to the team website to sign up for an interview slot. Each team is responsible for the following items:
- Prepare your demonstration by the interview time. You might use either your laptop or a basement lab computer.
- In the lab interview, you have to show that all the benchmarks are working correctly with your assembler and simulator. Please thoroughly test benchmarks with different inputs before the interview.
- You should be able to answer the following questions:
- What are main design decisions you made for your ISA?
- How does your ISA pass arguments at a function call?
- How does your ISA load big constants?
- What is the number of dynamic instructions to complete a specific benchmark?
Hints
- A sample code stub written in MIPS assembly
.text
li $gp, DATA_ADDR // set gp register,
// determined by what you pass
// to assembler for -data_addr
li $sp, STACK_ADDR // set sp register
// determined by physical memory size
li $a0, 10 // argument 0 = 10
jal fib // call function 'fib'
end:
j end // while(1)
- If you use C to implement your assembler and simulator, you might find the strtok() function useful for simple parsing work. Students who are familiar with Lex may find it useful for the parsing part of the assembler. Java users may find StringTokenizer useful.
- The TA provides a simple 34-bit datatype Java class to save you time. Use it as is, or modify it according to your requirements. See its main() function for a few usage examples.