On this page:
1 The Anaconda Language
1.1 Concrete Syntax
1.2 Abstract Syntax
1.3 Semantics
2 Implementing a Compiler for Anaconda
2.1 Writing the Compiler
2.1.1 Assembly instructions
2.2 Running main
2.3 Testing the Compiler
Anaconda

In this assignment you’ll implement a compiler for a small language called Anaconda, that has let bindings and binary operators.

Deadline for this assignment is Thursday, April 20 at 11:59PM.

1 The Anaconda Language

In each of the next several assignments, we’ll introduce a language that we’ll implement. We’ll start small, and build up features incrementally. We’re starting with Anaconda, which has just a few features – defining variables, and primitive operations on numbers.

There are a few pieces that go into defining a language for us to compile:

1.1 Concrete Syntax

The concrete syntax of Anaconda is:

<expr> :=

  | <number>

  | <identifier>

  | let <bindings> in <expr>

  | add1(<expr>)

  | sub1(<expr>)

  | <expr> + <expr>

  | <expr> - <expr>

  | <expr> * <expr>

 

<bindings> :=

  | <identifier> = <expr>

  | <identifier> = <expr>, <bindings>

Here, a let expression can have one or more bindings.

1.2 Abstract Syntax

The abstract syntax of Anaconda is an OCaml datatype, and corresponds nearly one-to-one with the concrete syntax.

type prim1 =

  | Add1

  | Sub1

 

type prim2 =

  | Plus

  | Minus

  | Times

 

type expr =

  | Number of int

  | Id of string

  | Let of (string * expr) list * expr

  | Prim1 of prim1 * expr

  | Prim2 of prim2 * expr * expr

1.3 Semantics

An Anaconda program always evaluates to a single integer. Numbers evaluate to themselves (so a program just consisting of Number(5) should evaluate to the integer 5). Primitive expressions perform addition or subtraction by one on their argument. Binary operator expressions evaluate their arguments and combine them based on the operator. Let bindings should evaluate all the binding expressions to values one by one, and after each, store a mapping from the given name to the corresponding value in both (a) the rest of the bindings, and (b) the body of the let expression. Identifiers evaluate to whatever their current stored value is. There are several examples further down to make this concrete.

The compiler should signal an error if:

Here are some examples of Anaconda programs:

Concrete Syntax

 

Abstract Syntax

 

Result

5

 

Number(5)

 

5

sub1(add1(sub1(5)))

 

Prim1(Sub1, Prim1(Add1, Prim1(Sub1, Number(5))))

 

4

let x = 5 in add1(x)

 

Let([("x", Number(5))], Prim1(Add1, Id("x")))

 

6

sub1(5)

# as an expr

Prim1(Sub1, Number(5))

# evaluates to

4

let x = 10, y = 9 in

(x - y) * 2

# as an expr

Let([("x", Number(10)), ("y", Number(9))],

  Prim2(Times, Prim2(Minus, Id("x"), Id("y")), Number(2)))

# evaluates to

2

2 Implementing a Compiler for Anaconda

You’ve been given a starter codebase that has several pieces of infrastructure:

All of your edits—which will be to write the compiler for Anaconda, and test it—will happen in compile.ml and myTests.ml.

2.1 Writing the Compiler

The primary task of writing the Anaconda compiler is simple to state: take an instance of the expr datatype and turn it into a list of assembly instructions. The provided compiler skeleton is set up to do just this, broken up over a few functions.

The first is

compile : expr -> instruction list

which takes a expr value (abstract syntax) and turns it into a list of assembly instructions, represented by the instruction type. Use only the provided instruction types for this assignment; we will be gradually expanding this as the quarter progresses. This function has an associated helper that takes some extra arguments to track the variable environment and stack offset. These will be discussed in more detail in lecture.

Note: In lecture, we used a (string, int) avlnode as the environment type, here we use a (string * int) list. This is a simpler data structure that’s often called an association list. There is a provided find function that looks up a value (an int) by key (a string). Adding to an association list is trivial – simply add onto the front with ::. You are responsible for understanding how ordering in the case of duplicate keys may interact with scope.

The other component you need to implement is:

to_asm_string : instruction list -> string

which renders individual instances of the instruction datatype into a string representation of the instruction (this is done for you for mov and ret). This second step is straightforward, but forces you to understand the syntax of the assembly code you are generating. Most of the compiler concepts happen in the first step, that of generating assembly instructions from abstract syntax. Do use this assembly guide if you have questions about the concrete syntax of an instruction.

2.1.1 Assembly instructions

The assembly instructions that you will have to become familiar with for this assignment are:

2.2 Running main

The main program built with make main takes a single file as its command-line argument, and outputs the compiled assembly string on standard out. Note the .ana extension.

$ make main

$ ./main input/forty_two.ana

section .text

global our_code_starts_here

our_code_starts_here:

  mov eax, 42

  ret

To actually evaluate your assembly code, first we must create a .s assembly file, and then link it with main.c to create an executable.

$ make output/forty_two.s (create the assembly file)

$ make output/forty_two.run (create the executable)

Finally you can run the file by executing to see the evaluated output:

$ ./output/forty_two.run

2.3 Testing the Compiler

The test file has the helper function t that will be useful to you:

t : string -> string -> string -> OUnit.test

The first string given to t (test) is a test name, followed by an Anaconda program (in concrete syntax) to compile and evaluate, followed by a string for the expected output of the program (this will just be an integer in quotes). This helper compiles, links, and runs the given program, and if the compiler ends in error, it will report the error message as a string. This includes problems building at the assembler/linker level, as well as any explicit failwith statements in the compiler itself.

If your tests do not have any errors, a .s file and .run executable is generated in the output/ directory, containing the compiled assembly code and executable for that case.

You can test all the provided tests and the tests you have provided in myTests.ml by running

$ make test

$ ./test

This should report all tests that fail to compile or diverge from the specified result.

Debug/Protip: Check your assembly files as a means of debugging your code. If you can work through the assembly and identify incorrect assembly instructions, you can trace the problem back to your compiler! Furthermore, you can manually edit your .s to see what some assembly code may evaluate to.