On this page:
1 Setup
1.1 Other Platform Instructions
1.2 Github
2 Programming in OCaml — The Basics
2.1 The Simplest Program
2.2 Defining and Calling Functions
2.3 Recursive Functions
2.4 Testing with OUnit
2.5 Exercises
3 Programming in OCaml — Datatypes
3.1 Binary Trees with type
3.2 Manipulating Data with match
3.3 Exercises
4 Programming in OCaml — Lists and Parametric Polymorphism
4.1 Linked Lists, By Hand
4.2 Linked Lists, Built-in
4.3 Exercises
OCaml Basics

We will use three programming languages in this course: OCaml, C (not C++), and x86 assembly. You should all have some background in C and x86, and the C we use won’t be surprising. Most of you should have some background in programming in a functional style, which is predominantly how we’ll use OCaml.

This writeup serves as both a reference for some of the OCaml topics we’ll need in the course, and your first assignment. You should do all the numbered Exercises throughout the document for your first warmup assignment, due Monday, April 10 by 11:59PM.

1 Setup

OPAM is a package manager for OCaml. If you know of pip or easy_install for Python, it’s the same idea.

OCaml is installed on the ACMS machines (you can go to the ETS page to get your account. You need to do a small bit of setup to use some libraries we’ll need for the course. Open up the file .bashrc in your home directory (with e.g. vim ~/.bashrc), and add these lines at the bottom:

eval `opam config env --root=/home/linux/ieng6/cs131s/public/`

. /home/linux/ieng6/cs131s/public/opam-init/init.sh

Then run:

$ source ~/.bashrc

Also append the following to your .bash_profile file as well:

if [ -f ~/.bashrc ]; then

    source ~/.bashrc

fi

This makes it so each time you open a terminal, the build commands you run for OCaml will be able to use some libraries we need for the course.

1.1 Other Platform Instructions

If you want to work on your own machine, you can install OCaml and OPAM via their platform-specific instructions. Everything in the course should be able to run on Windows and OSX as long as the machine has the x86 architecture.

1.2 Github

Assignments this semester will be distributed and submitted through Github Classroom. This first assignment is to be done independently. This Piazza post has link to get a Github clone of the assignment to work with (you will sign up to Github Classroom if you haven’t already):

https://piazza.com/class/j0smbcm5t3l4og?cid=13

That will give you the starter code you need.

2 Programming in OCaml — The Basics

This section covers some basics first, and then gives a structured introduction to how we’ll program in OCaml in this course.

2.1 The Simplest Program

OCaml files are written with a .ml extension. An OCaml file is somewhat similar to a Python file: when run, it evaluates the file directly (unlike, for example, C++, which designates a special main function). An OCaml file consists of

For example:

open Printf

 

let message = "Hello world";;

(printf "%s\n" message)

The first line includes the built-in library for printing, which provides functions similar to fprintf and printf from stdlib in C. The next two lines define a constant named message, and then call the printf function with a format string (where %s means “format as string”), and the constant message we defined on the line before.

Put this in a file called hello.ml and run it with:

$ ocaml hello.ml

Hello world

Most of our programs will be more complex than this, and much of the infrastructure for the “main” function will be provided, but it’s useful to see the simplest case first.

2.2 Defining and Calling Functions

One thing that we’ll do over and over again is define functions. Here’s an example of a function definition in OCaml:

let max (n : int) (m : int) : int =

  if n > m then n else m;;

It uses the keyword let followed by the name of the function (here, max). Next comes the parameter list. Each parameter is enclosed in parentheses, and has the parameter’s name, then a colon, then its type. So n and m are the parameters of max, and they are both type int. The final colon followed by int describes the return type of the function. Then there is an = sign, followed by the function body.

This declaration is similar to the following in C++:

int max(int n, int m) {

  if(n > m) { return n; }

  else { return m; }

}

One notable difference is that the OCaml function does not have any return statements. We’ll talk about how to think about the “return” value of a function without return statements next. It’s also important to note that the declaration in OCaml ends in ;;. This is a required syntactic convention for all top-level declarations in OCaml.

We’ll get to a more robust notion of testing in a little bit. We can check that max works by defining a useful top-level call with print statements:

open Printf

 

let max (n : int) (m : int) : int =

  if n > m then n else m;;

 

(printf "Should be 5: %d\n" (max 5 4));

(printf "Should be 4: %d\n" (max 3 4));

(printf "Should be 4: %d\n" (max 4 4));

You can copy this program into a file called max.ml and run it with ocaml max.ml.

There are a few things to explain here. First, the syntax for function calls in OCaml is different than you may be used to. Instead of writing

some_function(arg1, arg2, ...)

 

// for example

 

max(4, 5)

The surrounding parentheses can be omitted in some cases, but it’s always safe to include them to be clear. as we would in C++ or Python, in OCaml we write

(some_function arg1 arg2 ...)

 

// for example

 

(max 4 5)

This is just a useful model; everything you know about stacks and memory diagrams is still true (and in fact, we’ll talk about stacks in quite a bit of detail this semester). But substitution is a very helpful model for reasoning in the style of programming we’ll do. That’s just a syntactic difference. There’s also a useful distinction in how I prefer to think about what happens when we call a function in OCaml. Rather than thinking about a call to max creating a new stack frame, let’s think about what happens if we substitute the provided argument values for the parameters in the body of max, and continue by evaluating the function body.

So, for example, the call to max below takes a step to the substituted form:

   (max 4 5)

 

=> if 4 > 5 then 4 else 5

Then we can think about how the if expression takes steps. First, it evaluates the conditional part, and based on that value being true or false, it evaluates one or the other branch:

   if 4 > 5 then 4 else 5

 

=> if false then 4 else 5

 

=> 5

From this sequence of steps, we say that (max 4 5) evaluates to 5. This gives us a way to think about evaluation that doesn’t require a notion of a return statement.

With this idea of substitution in mind, we can think about how the sequence of printf expressions we wrote will evaluate:

  (printf "Should be 5: %d\n" (max 5 4));

  (printf "Should be 4: %d\n" (max 3 4));

  (printf "Should be 4: %d\n" (max 4 4));

 

=> (printf "Should be 5: %d\n" (if 5 > 4 then 5 else 4));

   (printf "Should be 4: %d\n" (max 3 4));

   (printf "Should be 4: %d\n" (max 4 4));

 

=> (printf "Should be 5: %d\n" (if true then 5 else 4));

   (printf "Should be 4: %d\n" (max 3 4));

   (printf "Should be 4: %d\n" (max 4 4));

 

=> (printf "Should be 5: %d\n" 5);

   (printf "Should be 4: %d\n" (max 3 4));

   (printf "Should be 4: %d\n" (max 4 4));

The rule for semicolon-separated sequences is that they are evaluated in order, and the value resulting from each expression is ignored once it is done.

=> <internal-to-OCaml-printing of "Should be 5: 5\n">;

   (printf "Should be 4: %d\n" (max 3 4));

   (printf "Should be 4: %d\n" (max 4 4));

 

=> (printf "Should be 4: %d\n" (max 3 4));

   (printf "Should be 4: %d\n" (max 4 4));

 

=> (printf "Should be 4: %d\n" (if 3 > 4 then 3 else 4));

   (printf "Should be 4: %d\n" (max 4 4));

 

=> (printf "Should be 4: %d\n" (if false then 3 else 4));

   (printf "Should be 4: %d\n" (max 4 4));

 

=> (printf "Should be 4: %d\n" 4);

   (printf "Should be 4: %d\n" (max 4 4));

 

=> <internal-to-OCaml-printing of "Should be 4: 4\n">;

   (printf "Should be 4: %d\n" (max 4 4));

 

... and so on

2.3 Recursive Functions

This is because, as we’ll see, there are lots of trees to process in a compiler, and trees have a fundamentally recursive structure. A lot of the code we write this semester will be recursive functions. OCaml distinguishes between functions that can contain recursive calls and functions that cannot. We saw the latter kind above in max which simply used the let keyword. We can define a recursive function by using let rec:

let rec factorial (n : int) : int =

  if n <= 1 then 1

  else n * (factorial (n - 1));;

The substitution-based rules are a little more interesting when thinking about evaluating a call to factorial:

  (factorial 3)

 

=> (if 3 <= 1 then 1 else 3 * (factorial (3 - 1)))

 

=> (if false then 1 else 3 * (factorial (3 - 1)))

 

=> (3 * (factorial (3 - 1)))

 

=> (3 * (factorial 2))

 

=> (3 * (if 2 <= 1 then 1 else 2 * (factorial (2 - 1)))

 

...

 

=> (3 * (2 * (factorial (2 - 1))))

 

...

 

=> (3 * (2 * 1))

 

=> 6

Here, we can see the chained multiplications “stack up” during the recursive calls. Writing this in a substitution-based style makes it easy to track where the return values of function calls end up.

2.4 Testing with OUnit

Testing by printing values becomes pretty onerous when we want to write more than a few examples. In this course, we’ll use a library called OUnit to write tests.

With OUnit, we will write declarations in one file, and test them in another. The code provided in your checkout has two files: functions.ml, which you’ll fill in with some implementations in the rest of the exercises, and test.ml, which will contain tests. This will become a common layout for how we write our programs in this course.

The weird-looking >:: operator is described here in terms of more basic concepts, if you’re interested. It’s just a shorthand for constructing a test value with a name. A test in OUnit is a name paired with a function of one argument. The function uses one of several different test predicates to check a computation—the one we’ll use most commonly is assert_equal. The syntax >:: is used to combine the name and the function together into a test. Here’s an example:

open OUnit2

let check_fun _ = (* a function of one argument *)

  assert_equal (2 + 2) 4;;

 

let my_first_test = "my_first_test">::check_fun;;

Most of this is just boilerplate that you won’t have to think much, if at all, about. But it’s useful to explain it once. Now my_first_test is a named test value. Note that we used an underscore when defining the parameter of check_fun; we can use an underscore to indicate to OCaml that we don’t care about the argument (there needs to be a parameter because of how the testing library works, even though we won’t use the parameter). We can run our test by creating a suite out of a list of tests, and running the suite:

let suite = "suite">:::[my_first_test];;

run_test_tt_main suite

The command used is not ocaml in this case, but a wrapper around ocaml called ocamlfind that knows how to search your system for packages installed with e.g. OPAM. Candidly, OCaml’s build system can be a little onerous, so I’m not teaching it explicitly. If you want to use OCaml for a large project outside this course, I recommend learning about the corebuild tool that comes with Real World OCaml. To build and run the given skeleton, use the provided Makefile that does the work of building for you. In this case, you just need to run

$ make test

$ ./test

in order to run the tests.

We can also add tests that fail to see what happens:

let check_fun2 _ = (* a failing test *)

  assert_equal (2 + 2) 5;;

 

let my_second_test = "my_second_test">::check_fun2;;

If we add this test to the suite and run, we get a failure:

$ ./test

.F

==============================================================================

Error: suite:1:my_second_test.

 

File "/Users/joe/.../oUnit-suite-prob#02.log", line 2, characters 1-1:

Error: suite:1:my_second_test (in the log).

 

Raised at file "src/oUnitAssert.ml", line 45, characters 8-27

Called from file "src/oUnitRunner.ml", line 46, characters 13-26

 

not equal

------------------------------------------------------------------------------

Ran: 2 tests in: 0.14 seconds.

FAILED: Cases: 2 Tried: 2 Errors: 0 Failures: 1 Skip:  0 Todo: 0 Timeouts: 0.

This output identifies the failing test by name (my_second_test), though it doesn’t tell us much more than that. Another annoying thing about the way we wrote those tests is that defining a new function for every test causes significant extra typing. To get a little more information out, we can pass assert_equal an optional argument that specifies how to turn the values under test into a string for printing. We can bundle that up inside a function that creates the test with its name. So, for example, we can define a function that creates tests comparing integers to integers:

let t_int name value expected = name>::

  (fun _ -> assert_equal expected value ~printer:string_of_int);;

 

let my_third_test = t_int "my_third_test" (2 + 2) 7;;

let my_fourth_test = t_int "my_fourth_test" (2 + 2) 4;;

If we add these two tests to the suite, we see a much more useful failure report that says expected: 7 but got: 4. I’ll often provide useful helper functions for testing with examples, but you may also decide to write your own for different kinds of tests as the semester goes on.

2.5 Exercises
  1. Implement fibonacci as an OCaml function that takes an integer n and returns the nth fibonacci number. Write out the evaluation of (fibonacci 4) in substitution style.

  2. Write tests for min and fibonacci using t_int.

3 Programming in OCaml — Datatypes

Programming with only integers, we wouldn’t make much progress on building a compiler. The next thing we need to do is understand how to create new datatypes in OCaml, and program with them.

3.1 Binary Trees with type

Let’s start with a datatype we all ought to know well—binary trees. We know we’ll need to represent a binary tree node somehow, which has a value and two children. For now, let’s say the value has to be a string. In OCaml, we can define such a node using the keyword type:

type btnode =

  | Leaf

  | Node of string * btnode * btnode

Translated into English, this reads:

A binary tree node is either a leaf of the tree, which has no fields, or a node, which has three fields: a string, a binary tree node, and another binary tree node.

This defines what we call constructors for Leaf and Node, which we can use to construct trees. Here are a few examples of trees and their corresponding btnode value:

    "a"       Node("a",

   /   \        Node("b", Leaf, Leaf), Node("c", Leaf, Leaf))

"b"     "c"

    "a"       Node("a",

   /            Node("b", Leaf, Leaf), Leaf)

"b"

    "a"       Node("a",

   /            Node("b",

"b"               Leaf, Node("c", Leaf, Leaf)), Leaf)

   \

    "c"

A Leaf is used here where you may have seen NULL in a C++ implementation of a binary tree. Each position with no child corresponds to a Leaf, and the others correspond to uses of Node. We call Leaf and Node variants of the btnode type.

3.2 Manipulating Data with match

The next question is how to work with these values. For example, how can we construct an in-order concatenation of the strings in a btnode as we’ve defined it? That is, how do we fill in this function:

let rec inorder_str (btn : btnode) : string =

  ...

The next feature we need to introduce is match, which allows us to examine which variant of a type a particular value has, and extract the values of its fields. Here are some examples:

let m1 = match Leaf with

  | Leaf -> true

  | Node(s, left, right) -> false;;

 

(* m1 is true *)

 

let m2 = match Leaf with

  | Leaf -> 44

  | Node(s, left, right) -> 99;;

 

(* m2 is 44 *)

 

let m3 = match Node("a", Leaf, Leaf) with

  | Leaf -> "z"

  | Node(s, left, right) -> s;;

 

(* m3 is "a" *)

 

let m4 = match Node("a", Node("b", Leaf, Leaf), Leaf) with

  | Leaf -> "z"

  | Node(s, left, right) ->

    match left with

      | Leaf -> "y"

      | Node(s2, left2, right2) -> s2;;

 

(* m4 is "b" *)

From these examples, we can see how match must work. It inspects the value after the match keyword, and selects the branch that corresponds to the variant of that value. Then it extracts the fields from the value, and substitutes them for the names given in the branch. Let’s use the m4 example to make that concrete:

  match Node("a", Node("b", Leaf, Leaf), Leaf) with

    | Leaf -> "z"

    | Node(s, left, right) ->

      match left with

        | Leaf -> "y"

        | Node(s2, left2, right2) -> s2

 

(* substitute Node("b", Leaf, Leaf) for left *)

 

=> match Node("b", Leaf, Leaf) with

     | Leaf -> "y"

     | Node(s2, left2, right2) -> s2

 

(* substitute "b" for s2 *)

 

=> "b"

With match available, we can now fill in the body for inorder_str. We can start by writing out a skeleton of the match structure for a btnode, as most functions over btnodes will need to match on the node to decide what to do.

let rec inorder_str (bt : btnode) : string =

  match bt with

    | Leaf -> ...

    | Node(s, left, right) -> ...

Now we can ask what the preorder traversal should yield in the case of a leaf of the tree (or an empty tree altogether). In this case, that ought to be an empty string. So the Leaf case should be filled in with "". How about for Nodes? We know an inorder traversal should have the elements to the left in order, then the current node, then the elements to the right. We can get the elements to either side via a recursive call, and then we just need one more piece of information, which is that ^ is the operator for concatenating strings in OCaml:

let rec inorder_str (bt : btnode) : string =

  match bt with

    | Leaf -> ""

    | Node(s, left, right) ->

      (inorder_str left) ^ s ^ (inorder_str right)

3.3 Exercises
  1. This is a trick question.Write a test function t_string that’s like t_int, but tests for equality of strings. Can you write a function that produces a string form of the results like t_int did for integers?

  2. Write at least five interesting tests for inorder_str.

  3. Write out the substitution-based evaluation of inorder_str on a tree with at least 3 nodes.

  4. Write a function size that takes a btnode and produces an integer that is the number of Nodes in the tree.

  5. Write a function height that takes a btnode and produces an integer that is the height of the tree.

  6. Make sure to test the above two functions.

4 Programming in OCaml — Lists and Parametric Polymorphism
4.1 Linked Lists, By Hand

Since we’ve seen binary trees, it’s natural to think about a similar definition for the nodes of a linked list. One OCaml datatype we could write is:

type llist =

  | Empty

  | Link of string * llist

That is, a list is either Empty (the end of the list), or a Link of a string onto another list. Of course, this would require that we write additional datatype declarations for lists of numbers, lists of booleans, lists of binary trees, and so on, if we needed those shapes of data. The natural solution is to make the datatype generic over the kind of data it uses. OCaml lets us do this by defining datatypes with type variables that can be filled with any type. Type variables are written with a leading character:

This idea corresponds to the use of templates in C++.

type 'a llist =

  | Empty

  | Link of 'a * 'a llist

We won’t have much use for heterogeneous lists, which would introduce different complications. The types of the fields in Link have changed with this addition. The first field can now hold a value of the list’s type, and the second must hold a llist that contains elements of that type as well. That is, this describes a homogeneous linked list, where all elements will have the same type.

Lets say we want to write a function sum that takes a llist of numbers and adds them up. We now need to describe its type in terms of the contents, which will be an int:

let rec sum (l : int llist) : int =

  match l with

    | Empty -> 0

    | Link(first, rest) -> first + (sum rest)

When we construct llists, we do not need to provide any extra type information – OCaml figures it out for us. For example, we can write:

let l1 = Link(1, Empty);;

let l2 = Link("a", Empty);;

Here, l1 will have type int llist, and l2 will have type string llist.

4.2 Linked Lists, Built-in

It turns out that our definition of llist above is important enough that a version of it is built into OCaml, just with slightly different names and syntax. The built-in equivalent of Empty is written [], and Link(first, rest) is written first::rest. The syntax [a;b;c] is shorthand for a::b::c::[]. The type of built-in lists is a list, which can be specialized for any list contents. For example, we could rewrite sum above as:

let rec sum2 (l : int list) : int =

  match l with

    | [] -> 0

    | first::rest -> first + (sum2 rest)

And we could test it by creating tests with t_int:

(* these would go in the suite list *)

t_int "sum2_empty" (sum2 []) 0;

t_int "sum2_single" (sum2 [5]) 5;

t_int "sum2_longer" (sum2 [3; 4; 5]);

t_int "sum2_longer2" (sum2 3::4::5::[]);

Note that the last two tests mean the same thing; they are just different ways of writing the same list containing 3, 4, and 5.

Since lists are quite a fundamental structure, we will end up using them frequently; handy functions to use with lists are here, and we’ll talk about them more as we build up more experience with OCaml.

4.3 Exercises
  1. Write and test a function increment_all that takes an int list and produces a new int list with all the elements increased by 1.

  2. Write and test a function long_strings that takes a string list and an int and produces a new string list that contains all the strings that had length greater than the given int. You can get the length of a string with the function String.length. Other string functions are documented here.

  3. Write and test a function every_other that takes a a list (a list with elements of any one type), and produces a new list that contains every other element from the given list, starting with the first element.

  4. Write and test a function sum_all that takes an int list list (that is, a list of lists of integers), and returns an int list that contains the sums of the sub-lists.