OCaml Warm-Up

Module 0 — Program Analysis Bootcamp

Getting ready for program analysis with OCaml

5 exercises • ~2 hours • No tests — guided tutorials

Learning Objectives

By the end of this module, you will be able to:

  1. Write OCaml functions using let bindings, type annotations, pattern matching, and recursion
  2. Define and manipulate algebraic data types (ADTs) representing expression trees
  3. Use collection types: List.map/fold, StringMap, StringSet, and ref
  4. Build modules satisfying a signature and use functors to parameterize code
  5. Read and extend ocamllex/Menhir grammar rules for a simple parser
Why these five? Each exercise directly foreshadows a concept you'll use in Modules 2-6: AST types, dataflow sets, abstract domains, and parser grammars.

Why OCaml for Program Analysis?

Pattern Matching

Match on AST node types directly. The compiler warns you if you forget a case.

Algebraic Data Types

ASTs, lattice values, and analysis results are all naturally expressed as ADTs.

Type Safety

Strong static types catch bugs at compile time — no null pointer surprises.

Immutability by Default

Functional style means fewer side effects, easier reasoning about program state.

Module System

Signatures + functors let you write generic analyses parameterized by abstract domains.

Tooling

ocamllex and Menhir provide industrial-strength lexer/parser generators.

Industry note: Facebook's Infer, Jane Street's trading systems, and the Coq proof assistant are all built in OCaml.

OCaml Basics

Top-level bindings

(* Immutable binding *)
let x = 42
(* Function with type annotations *)
let square (n : int) : int = n * n
(* Multiple arguments *)
let add (a : int) (b : int) : int = a + b

Local bindings with let...in

let hypotenuse a b =
  let a2 = a *. a in
  let b2 = b *. b in
  Float.sqrt (a2 +. b2)

Tuples — lightweight grouping

(* A position is (line, column) *)
type pos = int * int
let origin : pos = (1, 1)
(* Destructure in function args *)
let format_pos ((line, col) : pos) =
  Printf.sprintf "line %d, col %d" line col

Tuples are positional — access by pattern matching, not by name.

Records — named fields

type assignment = {
  var_name : string;
  value : int;
  line : int;
}
let a = { var_name = "x"; value = 5; line = 1 }
let name = a.var_name
(* Functional update: builds a NEW record *)
let a' = { a with value = a.value + 3 }

Records are immutable by default. Use { r with field = v } to "update".

If / then / else (expression, not statement)

let abs x = if x >= 0 then x else -x
(* Returns a value; there is no "return" keyword *)
let classify n =
  if n > 0 then "positive"
  else if n < 0 then "negative"
  else "zero"
Key idea: Everything in OCaml is an expression that produces a value. There are no statements.
Exercise 1 builds on these: square, is_empty, greet, is_digit, classify_char.

Printf and String Formatting

String concatenation

let greeting = "Hello, " ^ "world!"
let msg = "x = " ^ string_of_int 42

Printf — type-safe formatted output

(* Print to stdout *)
let () = Printf.printf "name = %s, age = %d\n" "Alice" 30
(* Format to a string *)
let s = Printf.sprintf "[%s: %s]" "keyword" "if"

Common format specifiers

Spec   Type     Example
%d     int      42
%s     string   "hello"
%f     float    3.14
%b     bool     true

Type safety in action

(* Compile error! OCaml checks format types *)
Printf.printf "%d" "not an int"
(* Error: This expression has type string but ... expected int *)
Contrast with C: OCaml's Printf is checked at compile time. No %s-on-an-int crashes.
Why this matters: You'll use Printf.sprintf extensively for formatting analysis results — error messages, lattice value displays, and debug output.
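As a small sketch of the specifiers above, the snippet below formats a made-up analysis fact with Printf.sprintf. The helper name describe_binding and its arguments are illustrative assumptions, not part of the bootcamp code.

```ocaml
(* Hypothetical helper: format one "variable = value" fact from an analysis. *)
let describe_binding (name : string) (value : int) (is_const : bool) : string =
  Printf.sprintf "%s = %d (constant: %b)" name value is_const

(* %f prints six decimals by default; %.2f limits the output to two. *)
let ratio = Printf.sprintf "%.2f" 3.14159

let () =
  print_endline (describe_binding "x" 42 true);  (* x = 42 (constant: true) *)
  print_endline ratio                            (* 3.14 *)
```

Because sprintf returns a string instead of printing, the same helper can feed error messages, logs, or test assertions.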

What is an AST?

An Abstract Syntax Tree is a tree representation of source code — the data structure every program analysis tool operates on.

Source code is just text

x = 3 + y * 2

To a computer, this is just characters. You can't easily answer:

  • Which variables are used?
  • What operations are performed?
  • Is y * 2 computed before adding 3?
Key idea: Program analysis = tree traversal. Walk the AST, collect info at each node, propagate results.

The AST captures structure

Algebraic Data Types (ADTs)

ADTs let you define types with multiple variants, each carrying different data. They are the backbone of ASTs.

Defining variants

(* Binary operators *)
type op = Add | Sub | Mul
(* Expression tree: a mini AST *)
type expr =
  | Num of int
  | Var of string
  | BinOp of op * expr * expr

Each variant is a constructor that tags the data it carries.
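To connect these constructors back to the earlier source line x = 3 + y * 2, here is one way its tree could be written down by hand. The stmt type is an assumption added only for this sketch; the slides define expressions but not statements.

```ocaml
type op = Add | Sub | Mul

type expr =
  | Num of int
  | Var of string
  | BinOp of op * expr * expr

(* Right-hand side of  x = 3 + y * 2.
   Multiplication binds tighter, so the Mul node sits below the Add node. *)
let rhs : expr = BinOp (Add, Num 3, BinOp (Mul, Var "y", Num 2))

(* Assumption: a one-constructor statement type, introduced for illustration. *)
type stmt = Assign of string * expr

let assignment : stmt = Assign ("x", rhs)
```

The nesting answers the question "is y * 2 computed before adding 3?": the Mul node is a child of the Add node, so it must be evaluated first.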


Pattern Matching

match...with is OCaml's most powerful control structure. It destructures values and the compiler ensures you handle every case.

Basic matching

let string_of_op o =
  match o with
  | Add -> "+"
  | Sub -> "-"
  | Mul -> "*"

Recursive matching on trees

let rec string_of_expr e =
  match e with
  | Num n -> string_of_int n
  | Var x -> x
  | BinOp (o, l, r) ->
    Printf.sprintf "(%s %s %s)"
      (string_of_expr l) (string_of_op o) (string_of_expr r)

Exhaustiveness checking

(* If you forget a case: *)
let bad o =
  match o with
  | Add -> "+"
  | Sub -> "-"
(* Warning 8: this pattern-matching is not exhaustive. Case not matched: Mul *)
This is critical for analysis. When you add a new AST node type, the compiler tells you every function that needs updating.

Matching on tuples

let classify (x, y) =
  match (x, y) with
  | (0, 0) -> "origin"
  | (0, _) -> "y-axis"
  | (_, 0) -> "x-axis"
  | _ -> "other"

Recursion and the Option Type

Recursive functions with let rec

(* Count nodes in an expression tree *)
let rec count_nodes e =
  match e with
  | Num _ | Var _ -> 1
  | BinOp (_, l, r) -> 1 + count_nodes l + count_nodes r
(* Tree depth *)
let rec depth e =
  match e with
  | Num _ | Var _ -> 1
  | BinOp (_, l, r) -> 1 + max (depth l) (depth r)
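A quick way to check both functions is to run them on a small tree. The sketch below repeats the definitions so it compiles on its own and uses the tree for 3 + y * 2 as sample input.

```ocaml
type op = Add | Sub | Mul
type expr = Num of int | Var of string | BinOp of op * expr * expr

let rec count_nodes e =
  match e with
  | Num _ | Var _ -> 1
  | BinOp (_, l, r) -> 1 + count_nodes l + count_nodes r

let rec depth e =
  match e with
  | Num _ | Var _ -> 1
  | BinOp (_, l, r) -> 1 + max (depth l) (depth r)

(* 3 + y * 2 *)
let e = BinOp (Add, Num 3, BinOp (Mul, Var "y", Num 2))

let () =
  (* Two operator nodes plus three leaves; the longest path is Add, Mul, leaf. *)
  Printf.printf "nodes = %d, depth = %d\n" (count_nodes e) (depth e)
  (* nodes = 5, depth = 3 *)
```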

Option: safe "nullable" values

(* The built-in option type is defined as:
   type 'a option = None | Some of 'a *)
(* Evaluate if no variables are present.
   apply_op : op -> int -> int -> int is assumed to be defined elsewhere. *)
let rec eval e =
  match e with
  | Num n -> Some n
  | Var _ -> None (* can't evaluate *)
  | BinOp (o, l, r) ->
    (match eval l, eval r with
     | Some a, Some b -> Some (apply_op o a b)
     | _ -> None)
Foreshadow: "We might not know the exact value" is the norm in abstract interpretation (Module 4). Option is a tiny abstract domain: Some n = known, None = unknown.
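The eval function relies on a helper called apply_op that the slide does not show. The sketch below supplies one plausible definition (an assumption, not necessarily the bootcamp's version) so the evaluator can be run end to end.

```ocaml
type op = Add | Sub | Mul
type expr = Num of int | Var of string | BinOp of op * expr * expr

(* Assumed helper: interpret an operator on two known integers. *)
let apply_op (o : op) (a : int) (b : int) : int =
  match o with
  | Add -> a + b
  | Sub -> a - b
  | Mul -> a * b

let rec eval (e : expr) : int option =
  match e with
  | Num n -> Some n
  | Var _ -> None
  | BinOp (o, l, r) ->
    (match eval l, eval r with
     | Some a, Some b -> Some (apply_op o a b)
     | _ -> None)

(* 1 + 2 * 3 has no variables, so it evaluates fully. *)
let closed = eval (BinOp (Add, Num 1, BinOp (Mul, Num 2, Num 3)))  (* Some 7 *)

(* 1 + y mentions a variable, so the result is unknown. *)
let open_ = eval (BinOp (Add, Num 1, Var "y"))                     (* None *)
```

A single Var anywhere in the tree turns the whole result into None, which is the "unknown wins" behaviour the foreshadowing note describes.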

Expression Tree Transforms

Tree transformations are the core mechanic of program analysis. Watch substitution and constant folding in action.

(* substitute "x" 5 (x * (1 + y))  gives  5 * (1 + y) *)
let rec substitute var_name value e =
  match e with
  | Num _ -> e
  | Var x -> if x = var_name then Num value else e
  | BinOp (o, l, r) ->
    BinOp (o, substitute var_name value l, substitute var_name value r)
(* simplify folds subtrees whose operands are both constants: once y is
   also replaced by 3, 5 * (1 + 3) simplifies to 20.
   apply_op is assumed to be defined elsewhere. *)
let rec simplify e =
  match e with
  | Num _ | Var _ -> e
  | BinOp (o, l, r) ->
    (match simplify l, simplify r with
     | Num a, Num b -> Num (apply_op o a b)
     | l', r' -> BinOp (o, l', r'))
Substitution: replace "x" with 5 in x*(1+y)
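The two transforms compose naturally: substitute known values first, then fold constants. The sketch below walks x * (1 + y) through both steps; apply_op is an assumed helper, since the slides use it without defining it.

```ocaml
type op = Add | Sub | Mul
type expr = Num of int | Var of string | BinOp of op * expr * expr

(* Assumed helper, not shown on the slides. *)
let apply_op o a b =
  match o with Add -> a + b | Sub -> a - b | Mul -> a * b

let rec substitute var_name value e =
  match e with
  | Num _ -> e
  | Var x -> if x = var_name then Num value else e
  | BinOp (o, l, r) ->
    BinOp (o, substitute var_name value l, substitute var_name value r)

let rec simplify e =
  match e with
  | Num _ | Var _ -> e
  | BinOp (o, l, r) ->
    (match simplify l, simplify r with
     | Num a, Num b -> Num (apply_op o a b)
     | l', r' -> BinOp (o, l', r'))

(* x * (1 + y) *)
let e0 = BinOp (Mul, Var "x", BinOp (Add, Num 1, Var "y"))

(* Only x is known: 5 * (1 + y) cannot be folded because y is still a variable. *)
let e1 = simplify (substitute "x" 5 e0)

(* Both known: 5 * (1 + 3) folds all the way down to a single number. *)
let e2 = simplify (substitute "y" 3 e1)   (* Num 20 *)
```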

Lists and fold_left

Lists are OCaml's primary collection. fold_left is the universal "reduce" — you'll use it everywhere in program analysis.

List basics

let xs = [1; 2; 3; 4; 5]
let ys = 0 :: xs (* [0; 1; 2; 3; 4; 5] *)

fold_left — reduce to a single value

let sum xs = List.fold_left (fun acc x -> acc + x) 0 xs
(* sum [1; 2; 3; 4] = 10 *)

Also useful: map and filter

let doubled = List.map (fun x -> x * 2) [1; 2; 3]            (* [2; 4; 6] *)
let positives = List.filter (fun x -> x > 0) [-1; 3; 0; 5]   (* [3; 5] *)
Why this matters: You'll use fold_left to build environments from assignment lists, accumulate analysis results, and compute fixpoints.
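To see how the accumulator evolves, the sketch below records every intermediate value of fold_left (+) 0 on a short list. The function name trace_sum is made up for this illustration.

```ocaml
(* Record every intermediate accumulator of  fold_left (+) 0 xs.
   The accumulator is a pair: the running sum and the values seen so far. *)
let trace_sum (xs : int list) : int list =
  let _, steps =
    List.fold_left
      (fun (acc, seen) x ->
         let acc' = acc + x in
         (acc', acc' :: seen))
      (0, [0]) xs
  in
  List.rev steps

let steps = trace_sum [1; 2; 3; 4]   (* [0; 1; 3; 6; 10] *)
```

Carrying a tuple as the accumulator is a common trick: one fold can compute a result and collect bookkeeping information at the same time.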


Collections: Map and Set

OCaml provides immutable, balanced-tree-backed Map and Set via functors.

StringMap — variable environments

module StringMap = Map.Make(String)
let build_env pairs =
  List.fold_left
    (fun env (k, v) -> StringMap.add k v env)
    StringMap.empty pairs
let lookup env name = StringMap.find_opt name env

StringSet — variable sets

module StringSet = Set.Make(String)
let s1 = StringSet.of_list ["x"; "y"; "z"]
let s2 = StringSet.of_list ["y"; "z"; "w"]
let all = StringSet.union s1 s2    (* {w, x, y, z} *)
let both = StringSet.inter s1 s2   (* {y, z} *)
Foreshadow: Modules 3-5 use StringSet for live-variable sets and taint sets. Map stores variable→abstract-value bindings.
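As a taste of how sets and maps meet the expression tree, the sketch below collects the variables an expression uses and reports which of them have no binding in an environment. The helper names vars_of and unbound are assumptions made for this example.

```ocaml
module StringSet = Set.Make (String)
module StringMap = Map.Make (String)

type op = Add | Sub | Mul
type expr = Num of int | Var of string | BinOp of op * expr * expr

(* Hypothetical helper: every variable name mentioned in an expression. *)
let rec vars_of (e : expr) : StringSet.t =
  match e with
  | Num _ -> StringSet.empty
  | Var x -> StringSet.singleton x
  | BinOp (_, l, r) -> StringSet.union (vars_of l) (vars_of r)

(* Environment built with fold_left, as on the StringMap slide. *)
let env =
  List.fold_left
    (fun m (k, v) -> StringMap.add k v m)
    StringMap.empty
    [ ("x", 5); ("y", 3) ]

(* Hypothetical helper: variables used by e that env does not bind. *)
let unbound (e : expr) : string list =
  StringSet.elements
    (StringSet.filter (fun x -> not (StringMap.mem x env)) (vars_of e))

(* x + z * x *)
let e = BinOp (Add, Var "x", BinOp (Mul, Var "z", Var "x"))

let used = StringSet.elements (vars_of e)   (* ["x"; "z"] *)
let missing = unbound e                     (* ["z"] *)
```

The set removes the duplicate "x" automatically, and elements returns names in sorted order, which keeps test output stable.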


Records and Mutable State

Records in practice

type assignment = {
  var_name : string;
  value : int;
  line : int;
}
let a = { var_name = "x"; value = 5; line = 1 }
(* Functional update: builds a NEW record *)
let a' = { a with value = a.value + 3 }

Mutable state with ref

(* ref creates a mutable cell *)
let counter = ref 0
(* Read with ! *)
let current = !counter (* 0 *)
(* Write with := *)
let () = counter := !counter + 1 (* now 1 *)
(* Counter factory with closure *)
let make_counter () =
  let n = ref 0 in
  fun () ->
    let v = !n in
    n := v + 1;
    v
Use sparingly. You'll see ref in fixpoint loops (Modules 3-4) where a worklist updates until convergence.
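The fixpoint loops mentioned above follow a simple shape: keep applying a step function until nothing changes. The sketch below is a toy version with a single value instead of a worklist; the name fixpoint and the step function are assumptions for illustration.

```ocaml
(* Apply f repeatedly until the value stops changing.
   Two refs hold the loop state: the current value and a "keep going" flag. *)
let fixpoint (f : 'a -> 'a) (init : 'a) : 'a =
  let current = ref init in
  let changed = ref true in
  while !changed do
    let next = f !current in
    if next = !current then changed := false
    else current := next
  done;
  !current

(* Toy step function: grow by 3 but never past 10.
   The values visited are 0, 3, 6, 9, 10, and then 10 again, which stops the loop. *)
let result = fixpoint (fun x -> min (x + 3) 10) 0   (* 10 *)
```

The loop only terminates when the step function eventually stabilises. Lattices with a top element are what give real analyses that guarantee.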

Challenge A: AST & Pattern Matching

Given the ADT and function below, predict the output.

type expr =
  | Num of int
  | Var of string
  | BinOp of op * expr * expr
let rec count_vars e =
  match e with
  | Num _ -> 0
  | Var _ -> 1
  | BinOp (_, l, r) -> count_vars l + count_vars r

What does this return?

count_vars (BinOp (Add, BinOp (Mul, Var "x", Num 3), BinOp (Add, Var "y", Var "z")))
Your answer:


Why Modules?

As programs grow, you need ways to organize code, hide implementation details, and write reusable components.

Without modules: name clashes

type sign = Pos | Neg | Zero | Unknown
let sign_join a b = ...
let sign_to_string s = ...
type taint = Clean | Tainted | TUnknown
let taint_join a b = ...
let taint_to_string t = ...
(* Name clashes! Both need "join", "to_string", "Unknown" *)

With modules: clean namespaces

module Sign = struct
  type t = Pos | Neg | Zero | Unknown
  let join a b = ...
  let to_string s = ...
end
module Taint = struct
  type t = Clean | Tainted | Unknown
  let join a b = ...
end
(* No clashes! *)
Sign.join Sign.Pos Sign.Neg
Taint.join Taint.Clean Taint.Tainted
Key insight: Modules are like "super structs" — they can contain types, values, functions, and even other modules. Each module is its own namespace.

Signatures & Structures

A signature = interface (what). A structure = implementation (how). Sealing hides internals.

Signature (module type)

module type COUNTER = sig
  type t (* abstract! *)
  val create : int -> t
  val increment : t -> t
  val value : t -> int
end

Structure (sealed by signature)

module SafeCounter : COUNTER = struct
  type t = { count : int; max : int }
  let create max = { count = 0; max }
  let increment c =
    if c.count < c.max then { c with count = c.count + 1 } else c
  let value c = c.count
end

What the outside world sees

Analogy: A signature is like a Java interface. It says "you must provide these operations" without dictating implementation.
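To make sealing concrete, the sketch below uses SafeCounter only through the COUNTER signature. The definitions are repeated so the snippet compiles on its own; the usage lines at the bottom are the new part.

```ocaml
module type COUNTER = sig
  type t
  val create : int -> t
  val increment : t -> t
  val value : t -> int
end

module SafeCounter : COUNTER = struct
  type t = { count : int; max : int }
  let create max = { count = 0; max }
  let increment c =
    if c.count < c.max then { c with count = c.count + 1 } else c
  let value c = c.count
end

(* Clients can only go through create, increment, and value. *)
let c = SafeCounter.create 2
let c = SafeCounter.increment (SafeCounter.increment (SafeCounter.increment c))
let v = SafeCounter.value c   (* 2: the third increment is capped by max *)

(* Rejected by the type checker, because t is abstract outside the module:
   let bad = c.count *)
```

Since no client can read or forge the record, the invariant count <= max can only be broken by code inside the module itself.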

What is a Lattice?

A lattice models "levels of knowledge" — the theoretical foundation of all program analysis in this bootcamp.

Everyday intuition

What color is the next traffic light?
  Before driving: "I have no idea" = bottom
  After GPS hint: "Red or Yellow" = partial info
  After seeing it: "Red" = precise
  Conflicting GPS: "Could be anything" = top
Key rule: Information only flows upward. Once you say "Red or Yellow," you can't go back to "definitely Red" without new evidence.

Why program analysis needs this

Programs have branches (if/else, loops). At merge points, we must combine info from multiple paths. A lattice tells us how to combine safely.


Lattice Operations: Join

Every lattice used in this bootcamp has bottom, top, and join. Let's explore with the Sign lattice.

The Sign lattice


Why it works this way

if (cond) {
  x = 5;   // x is Pos
} else {
  x = -3;  // x is Neg
}
// What is x here?
// join(Pos, Neg) = ⊤ ("could be either")
Rule: join(a, b) = smallest value above both a and b in the diagram. Same value? Keep it. Different? Go up.
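The rule translates almost word for word into a pattern match. The sketch below assumes a flat five-element Sign lattice with constructors named Bot, Neg, Zero, Pos, and Top; the names are chosen for this example and may differ from the bootcamp's own definitions.

```ocaml
(* Assumed constructor names for a flat sign lattice:
   Bot at the bottom, Top at the top, the three signs side by side between. *)
type sign = Bot | Neg | Zero | Pos | Top

let join (a : sign) (b : sign) : sign =
  match a, b with
  | Bot, x | x, Bot -> x   (* Bot carries no information: keep the other side *)
  | _ when a = b -> a      (* same value: keep it *)
  | _ -> Top               (* different values: go up *)

(* The if/else from this slide: one branch gives Pos, the other Neg. *)
let after_merge = join Pos Neg   (* Top *)
```

Because the three signs are unordered with respect to each other, the only value above two different signs is Top, which is why the last case needs no further distinctions.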

Challenge B: Lattice Join

Given this code, predict the sign of x after the if/else using the Sign lattice.

// Program 1:
if (cond) {
  x = 0;    // x is ???
} else {
  x = 0;    // x is ???
}
// x after merge = ???
Program 1 result:
// Program 2:
if (cond) {
  x = 7;    // x is ???
} else {
  x = -2;   // x is ???
}
// x after merge = ???
Program 2 result:
// Program 3:
if (cond) {
  x = 3;    // x is ???
} else {
  x = 100;  // x is ???
}
// x after merge = ???
Program 3 result:

Sign Lattice Reference

The LATTICE Signature in OCaml

Now let's encode what we learned as an OCaml module signature. Every analysis domain implements this interface.

The signature

module type LATTICE = sig
  type t
  val bottom : t
  val top : t
  val join : t -> t -> t
  val equal : t -> t -> bool
  val to_string : t -> string
end

BoolLattice implementation

module BoolLattice : LATTICE with type t = bool = struct
  type t = bool
  let bottom = false
  let top = true
  let join a b = a || b
  let equal a b = (a = b)
  let to_string b = if b then "true" else "false"
end

What each part means

Signature item   Purpose
type t           The abstract value type
bottom           Least element (no info / unreachable)
top              Greatest element (could be anything)
join             Merge info from two analysis paths
equal            Check convergence (fixpoint test)
to_string        For debugging / display
Key insight: Different analyses use different types for t — booleans, signs, intervals, taint labels — but they all satisfy the same LATTICE interface.
Foreshadow: Modules 3-5 each define their own LATTICE implementations. The analysis framework code is generic over this interface.

Building a Lattice Module

Let's build a richer lattice — ThreeValueLattice — and test it interactively.

type three_value =
  | Bot
  | Zero
  | Positive
  | Unknown
module ThreeValueLattice : LATTICE with type t = three_value = struct
  type t = three_value
  let bottom = Bot
  let top = Unknown
  let join a b =
    if a = b then a
    else if a = Bot then b
    else if b = Bot then a
    else Unknown
  let equal a b = (a = b)
  let to_string = function
    | Bot -> "Bot"
    | Zero -> "Zero"
    | Positive -> "Positive"
    | Unknown -> "Unknown"
end
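One way to try the module without the interactive diagram is to print a few joins. The sketch below repeats the LATTICE signature and ThreeValueLattice so it runs on its own; the list of test pairs is an arbitrary choice.

```ocaml
module type LATTICE = sig
  type t
  val bottom : t
  val top : t
  val join : t -> t -> t
  val equal : t -> t -> bool
  val to_string : t -> string
end

type three_value = Bot | Zero | Positive | Unknown

module ThreeValueLattice : LATTICE with type t = three_value = struct
  type t = three_value
  let bottom = Bot
  let top = Unknown
  let join a b =
    if a = b then a
    else if a = Bot then b
    else if b = Bot then a
    else Unknown
  let equal a b = (a = b)
  let to_string = function
    | Bot -> "Bot"
    | Zero -> "Zero"
    | Positive -> "Positive"
    | Unknown -> "Unknown"
end

let () =
  let open ThreeValueLattice in
  List.iter
    (fun (a, b) ->
       Printf.printf "join %s %s = %s\n"
         (to_string a) (to_string b) (to_string (join a b)))
    [ (Zero, Positive); (Bot, Zero); (Positive, Positive) ]
(* join Zero Positive = Unknown
   join Bot Zero = Zero
   join Positive Positive = Positive *)
```

The constructors are usable directly because the module is sealed with "with type t = three_value", which keeps the type visible to callers.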


Functors: The Big Picture

A functor = function from modules to modules. Write generic analysis code once, plug in different domains.

The code duplication problem

(* Copy-paste for every domain! Environments are assumed to be StringMap values. *)
module SignEnv = struct
  let lookup env x =
    match StringMap.find_opt x env with
    | Some v -> v
    | None -> SignLattice.bottom
end

The functor solution

(* Write once! Environments are assumed to be StringMap values. *)
module MakeEnv (L : LATTICE) = struct
  let lookup env x =
    match StringMap.find_opt x env with
    | Some v -> v
    | None -> L.bottom
  let join env1 env2 =
    StringMap.union (fun _k v1 v2 -> Some (L.join v1 v2)) env1 env2
end
(* Instantiate for each domain *)
module SignEnv = MakeEnv(SignLattice)
module TaintEnv = MakeEnv(TaintLattice)
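For a version that compiles as written, the sketch below fills in the missing pieces under two assumptions: environments are StringMap values, and BoolLattice stands in for the sign and taint domains, which this deck does not define.

```ocaml
module StringMap = Map.Make (String)

module type LATTICE = sig
  type t
  val bottom : t
  val top : t
  val join : t -> t -> t
  val equal : t -> t -> bool
  val to_string : t -> string
end

(* Stand-in domain for this sketch; any LATTICE implementation would do. *)
module BoolLattice : LATTICE with type t = bool = struct
  type t = bool
  let bottom = false
  let top = true
  let join a b = a || b
  let equal a b = (a = b)
  let to_string b = if b then "true" else "false"
end

module MakeEnv (L : LATTICE) = struct
  type t = L.t StringMap.t

  (* Unbound variables default to bottom. *)
  let lookup (env : t) (x : string) : L.t =
    match StringMap.find_opt x env with
    | Some v -> v
    | None -> L.bottom

  (* Pointwise join; a key present on only one side keeps its value. *)
  let join (env1 : t) (env2 : t) : t =
    StringMap.union (fun _k v1 v2 -> Some (L.join v1 v2)) env1 env2
end

module BoolEnv = MakeEnv (BoolLattice)

(* Environments at the end of two branches, merged at the join point. *)
let then_branch = StringMap.(empty |> add "x" true |> add "y" false)
let else_branch = StringMap.(empty |> add "x" false)
let merged = BoolEnv.join then_branch else_branch
```

Here lookup merged "x" is true (true joined with false), "y" stays false, and an unbound name such as "z" falls back to bottom. Swapping in another domain only changes the functor argument.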


You Already Use Functors!

Map.Make and Set.Make from OCaml's standard library are functors. You used them on the Collections slide.

Map.Make — the functor

(* Standard library interface, simplified: *)
module type OrderedType = sig
  type t
  val compare : t -> t -> int
end
module Make (Ord : OrderedType) : sig
  type key = Ord.t
  type 'a t (* balanced tree *)
  val find_opt : key -> 'a t -> 'a option
  val add : key -> 'a -> 'a t -> 'a t
  (* ... *)
end

Instantiation

module StringMap = Map.Make(String) module IntMap = Map.Make(Int)
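Any module with a type t and a compare function can be passed to Map.Make, not only the built-in String and Int. The sketch below uses (line, column) positions as keys; the module names Pos and PosMap and the sample messages are made up for this example.

```ocaml
(* Keys are (line, column) source positions. *)
module Pos = struct
  type t = int * int
  (* Stdlib.compare orders tuples lexicographically: line first, then column. *)
  let compare (a : t) (b : t) : int = Stdlib.compare a b
end

module PosMap = Map.Make (Pos)

let notes =
  PosMap.(empty
          |> add (2, 5) "unused variable"
          |> add (1, 1) "start of file")

let first = PosMap.min_binding notes   (* ((1, 1), "start of file") *)
let hit = PosMap.find_opt (2, 5) notes (* Some "unused variable" *)
let miss = PosMap.find_opt (9, 9) notes (* None *)
```

The map stays sorted by key regardless of insertion order, which is why min_binding returns the position (1, 1) even though it was added last.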

Module system summary

Concept     What it is                    Analogy
Module      Namespace + implementation    Java class
Signature   Interface / contract          Java interface
Functor     Module → Module function      Java generics
The bootcamp pattern:
1. Define a LATTICE signature
2. Implement it for each analysis domain
3. Use MakeEnv functor to get environments
4. Write analysis code generic over the interface
Foreshadow: This is exactly lib/abstract_domains/abstract_env.ml. Modules 3-5 plug in sign, interval, and taint domains.

Module System: Quick Reference

Everything you need to know before Exercise 4.

Defining a signature

module type SIG_NAME = sig
  type t (* abstract type *)
  val operation : t -> t -> t
end

Implementing (with type exposed)

module Impl : SIG_NAME with type t = my_type = struct
  type t = my_type
  let operation a b = ...
end

Implementing (type hidden)

module Impl : SIG_NAME = struct
  type t = my_secret_type
  let operation a b = ...
end

Defining a functor

module MakeThing (M : SIG_NAME) = struct
  (* Use M.t, M.operation here *)
  let do_stuff x = M.operation x M.???
end

Using a functor

module ConcreteThing = MakeThing(Impl) (* Now use ConcreteThing.do_stuff *)
Common mistake: Forgetting with type t = ... makes the type abstract. If callers need to create values of type t directly, you must expose it.
Think of it as: Signature = plug shape. Structure = the plug. Functor = an appliance that accepts any plug of that shape.

What is Parsing?

Transforming raw source code into a structured tree — in two stages

Two-Stage Pipeline
1. Lexing — characters → tokens (words)
2. Parsing — tokens → AST (tree)
Analogy: Reading a sentence
Lexing = recognizing individual words
Parsing = understanding grammar structure

Lexing: Character by Character

The lexer scans left-to-right, grouping characters into tokens

Key Insight: Whitespace is consumed but not emitted as a token — it just separates things.
rule token = parse
| [' ' '\t'] { skip }
| ['0'-'9']+ { INT }
| ['a'-'z']+ { IDENT }
| '+' { PLUS }
| '=' { EQUALS }
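To show what those rules do character by character, here is a hand-written tokenizer in plain OCaml that mirrors them. It is a teaching sketch, not what ocamllex generates; unlike the rules above, INT and IDENT carry their matched text, which is an assumption made so the output is informative.

```ocaml
type token = INT of int | IDENT of string | PLUS | EQUALS

let is_digit c = c >= '0' && c <= '9'
let is_lower c = c >= 'a' && c <= 'z'

(* Scan left to right, grouping characters into tokens. *)
let tokenize (s : string) : token list =
  let n = String.length s in
  (* Index just past the longest run of characters satisfying p, from i. *)
  let rec span p i = if i < n && p s.[i] then span p (i + 1) else i in
  let rec go i acc =
    if i >= n then List.rev acc
    else
      match s.[i] with
      | ' ' | '\t' -> go (i + 1) acc            (* consumed, no token emitted *)
      | '+' -> go (i + 1) (PLUS :: acc)
      | '=' -> go (i + 1) (EQUALS :: acc)
      | c when is_digit c ->
        let j = span is_digit i in
        go j (INT (int_of_string (String.sub s i (j - i))) :: acc)
      | c when is_lower c ->
        let j = span is_lower i in
        go j (IDENT (String.sub s i (j - i)) :: acc)
      | c -> failwith (Printf.sprintf "unexpected character %C" c)
  in
  go 0 []

let tokens = tokenize "x = 3 + y"
(* [IDENT "x"; EQUALS; INT 3; PLUS; IDENT "y"] *)
```

The span helper implements the "longest match" idea: a run of digits such as 123 becomes one INT token rather than three.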

ocamllex & Menhir

OCaml's standard tools for writing lexers and parsers

Lexer — lexer.mll

{ open Parser }
 
rule token = parse
| [' ' '\t' '\n'] { token lexbuf }
| ['0'-'9']+ as n
{ INT(int_of_string n) }
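(* Keyword rule comes first: "let" also matches the identifier pattern,
   and ocamllex breaks equal-length ties by picking the earlier rule. *)
| "let" { LET }
| ['a'-'z' '_']['a'-'z' '0'-'9' '_']*
as s { IDENT(s) }
| '+' { PLUS }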
| '=' { EQUALS }
| eof { EOF }

Parser — parser.mly

%token<int> INT
%token<string> IDENT
%token PLUS LET EQUALS EOF
%left PLUS
%start<Ast.expr> prog
%%
 
prog: e=expr EOF { e }
 
expr:
| n=INT { IntLit n }
| x=IDENT { Var x }
| e1=expr PLUS e2=expr
{ BinOp(Add,e1,e2) }
.mll files: regex rules → OCaml lexer code
.mly files: grammar rules → OCaml parser code
Both generate .ml files, and Dune runs the generators automatically.

Parsing Ambiguity & Precedence

How does 1 + 2 * 3 parse? It depends on the rules.

Without precedence rules:
1 + 2 * 3 could mean (1 + 2) * 3 = 9 or 1 + (2 * 3) = 7
The parser doesn't know which operator to group first!
Menhir precedence directives:
(* lowest precedence first *)
%left PLUS MINUS
%left TIMES DIV
(* highest precedence last *)
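The two readings of 1 + 2 * 3 are simply two different trees. The sketch below builds both by hand and evaluates them; the cut-down expr type (numbers and two operators only) is an assumption to keep the example short.

```ocaml
type op = Add | Mul
type expr = Num of int | BinOp of op * expr * expr

let rec eval = function
  | Num n -> n
  | BinOp (Add, l, r) -> eval l + eval r
  | BinOp (Mul, l, r) -> eval l * eval r

(* (1 + 2) * 3: what a parser could produce if PLUS were grouped first. *)
let plus_first = BinOp (Mul, BinOp (Add, Num 1, Num 2), Num 3)

(* 1 + (2 * 3): what the directives above select, since TIMES is declared
   after PLUS and therefore binds tighter. *)
let times_first = BinOp (Add, Num 1, BinOp (Mul, Num 2, Num 3))

let () =
  Printf.printf "%d vs %d\n" (eval plus_first) (eval times_first)
  (* 9 vs 7 *)
```

Precedence directives never change the token stream. They only decide which of these trees the parser builds.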

Challenge C: Parse This!

Given this grammar and input, predict the token stream and AST

(* Input: *) let z = x + 1



Warm-Up Exercises

Hands-on practice to prepare you for the bootcamp

# Exercise Topics Tests
1 ocaml-basics Pattern matching, recursion, lists, options 20
2 ast-construction Building AST nodes, pretty-printing 15
3 ast-interpreter Tree walking, environments, evaluation 20
4 simple-parser Lexing, parsing, ocamllex/Menhir 15
Workflow
cd exercises/N-name/starter
dune build
dune runtest
Fill in TODO stubs → tests pass ✓
Tip: Start with Exercise 1. Each builds on skills from the previous one. Run dune runtest often — incremental progress is key!

Bootcamp Roadmap

Where this warm-up fits in the bigger picture

Key Takeaways & Next Steps

Everything you need to succeed in the bootcamp

What You Learned

OCaml Basics
Types, pattern matching, recursion, options
ASTs & ADTs
Tree representation of code, variant types
Module System
Signatures, structures, functors
Lattices
Partial orders, join, bottom/top — the math of analysis
Parsing Pipeline
Lexing → Parsing → AST via ocamllex/Menhir

Next Steps

  1. Fork the repo
     Clone your fork, run opam install . --deps-only
  2. Complete the warm-up exercises
     4 exercises, ~70 tests total. dune runtest to check.
  3. Submit your repo URL
     Log in to the course site and enter your fork URL
You're ready for Module 1!
Parsing & lexing in depth, building on what you learned today

Quiz: OCaml & ASTs

Test your understanding of OCaml basics and abstract syntax trees

Q1: What does this return?

match [1;2;3] with
| [] -> 0
| x :: _ -> x

Q2: AST for x + 1?

Q3: What does Option replace?

Quiz: Lattices & Modules

Trace through lattice operations and module concepts

Given the Sign lattice:

(* Top = any sign *)
(* Pos | Zero | Neg *)
(* Bot = unreachable *)

What is join Pos Neg?

What is join Bot Neg?

What does a functor take as input?

Quiz: Predict the Token Stream

Given source code, predict what the lexer produces

(* Source: *)
let f = x + 10

Type the token stream (comma-separated):

How many tokens (excluding EOF)?