In this assignment you’ll implement a compiler for a small language called Anaconda, that has let bindings and binary operators.
Deadline for this assignment is Thursday, April 20 at 11:59PM.
In each of the next several assignments, we’ll introduce a language that we’ll implement. We’ll start small, and build up features incrementally. We’re starting with Anaconda, which has just a few features – defining variables, and primitive operations on numbers.
There are a few pieces that go into defining a language for us to compile:
A description of the concrete syntax – the text the programmer writes.
A description of the abstract syntax – how to express what the programmer wrote in a data structure our compiler uses.
The semantics—or description of the behavior—of the abstract syntax, so our compiler knows what the code it generates should do.
The concrete syntax of Anaconda is:
| let <bindings> in <expr>
| <expr> + <expr>
| <expr> - <expr>
| <expr> * <expr>
| <identifier> = <expr>
| <identifier> = <expr>, <bindings>
Here, a let expression can have one or more bindings.
The abstract syntax of Anaconda is an OCaml datatype, and corresponds nearly one-to-one with the concrete syntax.
type prim1 =
type prim2 =
type expr =
| Number of int
| Id of string
| Let of (string * expr) list * expr
| Prim1 of prim1 * expr
| Prim2 of prim2 * expr * expr
An Anaconda program always evaluates to a single integer. Numbers evaluate to themselves (so a program just consisting of Number(5) should evaluate to the integer 5). Primitive expressions perform addition or subtraction by one on their argument. Binary operator expressions evaluate their arguments and combine them based on the operator. Let bindings should evaluate all the binding expressions to values one by one, and after each, store a mapping from the given name to the corresponding value in both (a) the rest of the bindings, and (b) the body of the let expression. Identifiers evaluate to whatever their current stored value is. There are several examples further down to make this concrete.
The compiler should signal an error if:
There is a binding list containing two or more bindings with the same name
An identifier is unbound (there is no surrounding let binding for it)
Here are some examples of Anaconda programs:
Prim1(Sub1, Prim1(Add1, Prim1(Sub1, Number(5))))
let x = 5 in add1(x)
Let([("x", Number(5))], Prim1(Add1, Id("x")))
# as an expr
# evaluates to
let x = 10, y = 9 in
(x - y) * 2
# as an expr
Let([("x", Number(10)), ("y", Number(9))],
Prim2(Times, Prim2(Minus, Id("x"), Id("y")), Number(2)))
# evaluates to
You’ve been given a starter codebase that has several pieces of infrastructure:
A parser for Anaconda (parser.mly and lexer.mll), which takes concrete syntax (text files) and turns it into instances of the expr datatype. You don’t need to edit this.
A main program (main.ml) that uses the parser and compiler to produce assembly code from an input Anaconda text file. You don’t need to edit this.
A Makefile that builds main.ml, builds a tester for Anaconda (test.ml), and manipulates assembly programs created by the Anaconda compiler. You don’t need to edit the Makefile or test.ml, but you will edit myTests.ml. Specifically, you will add your own tests by filling in myTestList following the instructions in the beginning of the file.
You need to add at least 5 tests. These will be automatically evaluated for thoroughness and correctness by checking if they catch bad implementations.
An OCaml program (runner.ml) that works in concert with the Makefile to allow you to compile and run an Anaconda program from within OCaml, which is quite useful for testing. You don’t need to edit runner.ml.
All of your edits—which will be to write the compiler for Anaconda, and test it—will happen in compile.ml and myTests.ml.
The primary task of writing the Anaconda compiler is simple to state: take an instance of the expr datatype and turn it into a list of assembly instructions. The provided compiler skeleton is set up to do just this, broken up over a few functions.
The first is
compile : expr -> instruction list
which takes a expr value (abstract syntax) and turns it into a list of assembly instructions, represented by the instruction type. Use only the provided instruction types for this assignment; we will be gradually expanding this as the quarter progresses. This function has an associated helper that takes some extra arguments to track the variable environment and stack offset. These will be discussed in more detail in lecture.
Note: In lecture, we used a (string, int) avlnode as the environment type, here we use a (string * int) list. This is a simpler data structure that’s often called an association list. There is a provided find function that looks up a value (an int) by key (a string). Adding to an association list is trivial – simply add onto the front with ::. You are responsible for understanding how ordering in the case of duplicate keys may interact with scope.
The other component you need to implement is:
to_asm_string : instruction list -> string
which renders individual instances of the instruction datatype into a string representation of the instruction (this is done for you for mov and ret). This second step is straightforward, but forces you to understand the syntax of the assembly code you are generating. Most of the compiler concepts happen in the first step, that of generating assembly instructions from abstract syntax. Do use this assembly guide if you have questions about the concrete syntax of an instruction.
The assembly instructions that you will have to become familiar with for this assignment are:
IMov of arg * arg — Copies the right operand (source) into the left operand (destination). The source can be an immediate argument, a register or a memory location, whereas the destination can be a register or a memory location.Examples:
mov eax, ebx
mov [eax], 4
IAdd of arg * arg — Add the two operands, storing the result in its first operand.
Example: add eax, 10
ISub of arg * arg — Store in the value of its first operand the result of subtracting the value of its second operand from the value of its first operand.
Example: sub eax, 216
IMul of arg * arg — Multiply the left argument by the right argument, and store in the left argument (typically the left argument is eax for us)
Example: imul eax, 4
The main program built with make main takes a single file as its command-line argument, and outputs the compiled assembly string on standard out. Note the .ana extension.
$ make main
$ ./main input/forty_two.ana
mov eax, 42
$ make output/forty_two.s (create the assembly file)
$ make output/forty_two.run (create the executable)
The test file has the helper function t that will be useful to you:
t : string -> string -> string -> OUnit.test
If your tests do not have any errors, a .s file and .run executable is generated in the output/ directory, containing the compiled assembly code and executable for that case.
$ make test
Debug/Protip: Check your assembly files as a means of debugging your code. If you can work through the assembly and identify incorrect assembly instructions, you can trace the problem back to your compiler! Furthermore, you can manually edit your .s to see what some assembly code may evaluate to.