Write OCaml functions using let bindings, type annotations, pattern matching, and recursion
Define and manipulate algebraic data types (ADTs) representing expression trees
Use collection types — List.map/fold, StringMap, StringSet, and ref
Build modules satisfying a signature and use functors to parameterize code
Read and extend ocamllex/Menhir grammar rules for a simple parser
Why these five? Each exercise directly foreshadows a concept you'll use in Modules 2-6: AST types, dataflow sets, abstract domains, and parser grammars.
Why OCaml for Program Analysis?
Pattern Matching
Match on AST node types directly. The compiler warns you if you forget a case.
Algebraic Data Types
ASTs, lattice values, and analysis results are all naturally expressed as ADTs.
Type Safety
Strong static types catch bugs at compile time — no null pointer surprises.
Immutability by Default
Functional style means fewer side effects, easier reasoning about program state.
Module System
Signatures + functors let you write generic analyses parameterized by abstract domains.
Tooling
ocamllex and Menhir provide industrial-strength lexer/parser generators.
Industry note: Facebook's Infer, Jane Street's trading systems, and the Coq proof assistant are all built in OCaml.
OCaml Basics
Top-level bindings
(* Immutable binding *)
let x = 42

(* Function with type annotations *)
let square (n : int) : int = n * n

(* Multiple arguments *)
let add (a : int) (b : int) : int = a + b
Local bindings with let...in
let hypotenuse a b =
  let a2 = a *. a in
  let b2 = b *. b in
  Float.sqrt (a2 +. b2)
Tuples — lightweight grouping
(* A position is (line, column) *)
type pos = int * int

let origin : pos = (1, 1)

(* Destructure in function args *)
let format_pos ((line, col) : pos) =
  Printf.sprintf "line %d, col %d" line col
Tuples are positional — access by pattern matching, not by name.
Records — named fields
type assignment = {
  var_name : string;
  value : int;
  line : int;
}

let a = { var_name = "x"; value = 5; line = 1 }
let name = a.var_name

(* Functional update: builds a NEW record *)
let a' = { a with value = a.value + 3 }
Records are immutable by default. Use { r with field = v } to "update".
If / then / else (expression, not statement)
let abs x =
  if x >= 0 then x else -x

(* Returns a value; there is no "return" keyword *)
let classify n =
  if n > 0 then "positive"
  else if n < 0 then "negative"
  else "zero"
Key idea: Everything in OCaml is an expression that produces a value. There are no statements.
Exercise 1 builds on these: square, is_empty, greet, is_digit, classify_char.
(* Print to stdout *)
Printf.printf "name = %s, age = %d\n" "Alice" 30

(* Format to a string *)
let s = Printf.sprintf "[%s: %s]" "keyword" "if"
Common format specifiers
| Spec | Type   | Example |
|------|--------|---------|
| %d   | int    | 42      |
| %s   | string | "hello" |
| %f   | float  | 3.14    |
| %b   | bool   | true    |
Type safety in action
(* Compile error! OCaml checks format types *)
Printf.printf "%d" "not an int"
(* Error: This expression has type string but ... expected int *)
Contrast with C: OCaml's Printf is checked at compile time. No %s-on-an-int crashes.
Why this matters: You'll use Printf.sprintf extensively for formatting analysis results — error messages, lattice value displays, and debug output.
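As a small illustration of that use, here is a sketch of a formatter for one analysis finding. The function name `report` and its three arguments are hypothetical, chosen only for this example.

```ocaml
(* Hypothetical helper: format one analysis finding as a single line. *)
let report (line : int) (var : string) (msg : string) : string =
  Printf.sprintf "line %d: %s: %s" line var msg

let () = print_endline (report 3 "x" "possibly uninitialized")
```

Because the format string is type-checked, passing the arguments in the wrong order (say, the variable name where `%d` expects an int) is rejected at compile time.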
What is an AST?
An Abstract Syntax Tree is a tree representation of source code — the data structure every program analysis tool operates on.
Source code is just text
x = 3 + y * 2
To a computer, this is just characters. You can't easily answer:
Which variables are used?
What operations are performed?
Is y * 2 computed before adding 3?
Key idea: Program analysis = tree traversal. Walk the AST, collect info at each node, propagate results.
The AST captures structure
Algebraic Data Types (ADTs)
ADTs let you define types with multiple variants, each carrying different data. They are the backbone of ASTs.
Defining variants
(* Binary operators *)
type op = Add | Sub | Mul

(* Expression tree: a mini AST *)
type expr =
  | Num of int
  | Var of string
  | BinOp of op * expr * expr
Each variant is a constructor that tags the data it carries.
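To connect the constructors back to the earlier source line, the sketch below encodes the right-hand side `3 + y * 2` as an `expr` value and answers "which variables are used?" with a short traversal. The names `rhs` and `vars` are illustrative and not part of the bootcamp code.

```ocaml
type op = Add | Sub | Mul

type expr =
  | Num of int
  | Var of string
  | BinOp of op * expr * expr

(* 3 + y * 2: multiplication binds tighter, so it is the deeper node. *)
let rhs : expr = BinOp (Add, Num 3, BinOp (Mul, Var "y", Num 2))

(* Walk the tree and collect every variable name, left to right. *)
let rec vars (e : expr) : string list =
  match e with
  | Num _ -> []
  | Var x -> [ x ]
  | BinOp (_, l, r) -> vars l @ vars r
```

The nesting of `BinOp` nodes records evaluation order directly, which is exactly the information the raw text hides.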
Pattern Matching
match...with is OCaml's most powerful control structure. It destructures values and the compiler ensures you handle every case.
Basic matching
let string_of_op o =
  match o with
  | Add -> "+"
  | Sub -> "-"
  | Mul -> "*"
Recursive matching on trees
let rec string_of_expr e =
  match e with
  | Num n -> string_of_int n
  | Var x -> x
  | BinOp (o, l, r) ->
    Printf.sprintf "(%s %s %s)"
      (string_of_expr l) (string_of_op o) (string_of_expr r)
Exhaustiveness checking
(* If you forget a case: *)
let bad o =
  match o with
  | Add -> "+"
  | Sub -> "-"
(* Warning 8: this pattern-matching is not exhaustive.
   Case not matched: Mul *)
This is critical for analysis. When you add a new AST node type, the compiler tells you every function that needs updating.
Matching on tuples
let classify (x, y) =
  match (x, y) with
  | (0, 0) -> "origin"
  | (0, _) -> "y-axis"
  | (_, 0) -> "x-axis"
  | _ -> "other"
Recursion and the Option Type
Recursive functions with let rec
(* Count nodes in an expression tree *)
let rec count_nodes e =
  match e with
  | Num _ | Var _ -> 1
  | BinOp (_, l, r) -> 1 + count_nodes l + count_nodes r

(* Tree depth *)
let rec depth e =
  match e with
  | Num _ | Var _ -> 1
  | BinOp (_, l, r) -> 1 + max (depth l) (depth r)
Option: safe "nullable" values
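(* Option type: Some x or None.
   Predefined in the standard library; shown here for reference. *)
type 'a option = Some of 'a | None

(* Evaluate if no variables are present.
   apply_op : op -> int -> int -> int is assumed to be defined elsewhere. *)
let rec eval e =
  match e with
  | Num n -> Some n
  | Var _ -> None (* can't evaluate *)
  | BinOp (o, l, r) ->
    (match eval l, eval r with
     | Some a, Some b -> Some (apply_op o a b)
     | _ -> None)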
Foreshadow: "We might not know the exact value" is the norm in abstract interpretation (Module 4). Option is a tiny abstract domain: Some n = known, None = unknown.
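The evaluator above relies on `apply_op`, which the slides do not show. The sketch below supplies one plausible definition (an assumption, not the bootcamp's own code) so that `eval` can be run on a known and an unknown expression.

```ocaml
type op = Add | Sub | Mul
type expr = Num of int | Var of string | BinOp of op * expr * expr

(* Assumed definition: the slides use apply_op without showing it. *)
let apply_op (o : op) (a : int) (b : int) : int =
  match o with
  | Add -> a + b
  | Sub -> a - b
  | Mul -> a * b

let rec eval (e : expr) : int option =
  match e with
  | Num n -> Some n
  | Var _ -> None
  | BinOp (o, l, r) -> (
      match (eval l, eval r) with
      | Some a, Some b -> Some (apply_op o a b)
      | _ -> None)

(* 3 + 4 * 2 has no variables, so its value is known. *)
let known = eval (BinOp (Add, Num 3, BinOp (Mul, Num 4, Num 2)))

(* 3 + y mentions a variable, so the result is "unknown". *)
let unknown = eval (BinOp (Add, Num 3, Var "y"))
```

Note how a single `None` anywhere in the tree makes the whole result `None`: lack of knowledge propagates upward.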
Expression Tree Transforms
Tree transformations are the core mechanic of program analysis. Watch substitution and constant folding in action.
(* substitute "x" 5 (x * (1 + y)) *)
let rec substitute var_name value e =
  match e with
  | Num _ -> e
  | Var x -> if x = var_name then Num value else e
  | BinOp (o, l, r) ->
    BinOp (o, substitute var_name value l, substitute var_name value r)

(* simplify (5 * (1 + y)) when y=3 *)
let rec simplify e =
  match e with
  | Num _ | Var _ -> e
  | BinOp (o, l, r) ->
    (match simplify l, simplify r with
     | Num a, Num b -> Num (apply_op o a b)
     | l', r' -> BinOp (o, l', r'))
Substitution: replace "x" with 5 in x*(1+y)
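The two transforms compose. The sketch below runs them on `x * (1 + y)`, using an assumed `apply_op` (not shown in the slides) and illustrative names `step1` through `step3`.

```ocaml
type op = Add | Sub | Mul
type expr = Num of int | Var of string | BinOp of op * expr * expr

(* Assumed definition; the slides reference apply_op without defining it. *)
let apply_op o a b =
  match o with Add -> a + b | Sub -> a - b | Mul -> a * b

let rec substitute var_name value e =
  match e with
  | Num _ -> e
  | Var x -> if x = var_name then Num value else e
  | BinOp (o, l, r) ->
      BinOp (o, substitute var_name value l, substitute var_name value r)

let rec simplify e =
  match e with
  | Num _ | Var _ -> e
  | BinOp (o, l, r) -> (
      match (simplify l, simplify r) with
      | Num a, Num b -> Num (apply_op o a b)
      | l', r' -> BinOp (o, l', r'))

let e = BinOp (Mul, Var "x", BinOp (Add, Num 1, Var "y"))

(* Replace x by 5: the tree becomes 5 * (1 + y). *)
let step1 = substitute "x" 5 e

(* Folding alone cannot make progress while y is still a variable. *)
let step2 = simplify step1

(* Once y is also known, the whole tree folds to one number. *)
let step3 = simplify (substitute "y" 3 step1)
```

Here `step2` equals `step1`, while `step3` collapses to `Num 20`: constant folding only fires where both operands are already numbers.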
Lists and fold_left
Lists are OCaml's primary collection. fold_left is the universal "reduce" — you'll use it everywhere in program analysis.
Foreshadow: Modules 3-5 use StringSet for live-variable sets and taint sets. Map stores variable→abstract-value bindings.
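A minimal sketch of these collection idioms, using only the standard library. The module names `StringSet` and `StringMap` follow the slides; the values `used`, `env`, `sum`, and `doubled` are illustrative.

```ocaml
module StringSet = Set.Make (String)
module StringMap = Map.Make (String)

(* fold_left threads an accumulator through the list, left to right. *)
let sum (xs : int list) : int = List.fold_left (fun acc x -> acc + x) 0 xs

(* List.map applies a function to every element. *)
let doubled = List.map (fun n -> n * 2) [ 1; 2; 3 ]

(* Collect variable names into a set; duplicates collapse. *)
let used : StringSet.t =
  List.fold_left
    (fun acc v -> StringSet.add v acc)
    StringSet.empty [ "x"; "y"; "x" ]

(* Build an environment from (name, value) pairs. *)
let env : int StringMap.t =
  List.fold_left
    (fun m (k, v) -> StringMap.add k v m)
    StringMap.empty
    [ ("x", 5); ("y", 3) ]

(* find_opt returns an option instead of raising on a missing key. *)
let x_value = StringMap.find_opt "x" env
let z_value = StringMap.find_opt "z" env
```

Both `StringSet.add` and `StringMap.add` return a new collection and leave the old one untouched, which is why they fit naturally as the step function of a fold.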
Records and Mutable State
Records in practice
type assignment = {
  var_name : string;
  value : int;
  line : int;
}

let a = { var_name = "x"; value = 5; line = 1 }

(* Functional update: builds a NEW record *)
let a' = { a with value = a.value + 3 }
Mutable state with ref
(* ref creates a mutable cell *)
let counter = ref 0

(* Read with ! *)
let current = !counter (* 0 *)

(* Write with := (wrapped in let () so it parses as its own top-level item) *)
let () = counter := !counter + 1 (* now 1 *)

(* Counter factory with closure *)
let make_counter () =
  let n = ref 0 in
  fun () ->
    let v = !n in
    n := v + 1;
    v
Use sparingly. You'll see ref in fixpoint loops (Modules 3-4) where a worklist updates until convergence.
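To give a feel for that pattern, here is a deliberately simplified sketch: it iterates a function until the value stops changing, using two `ref` cells. It is not a worklist algorithm, and the name `fixpoint` and the toy step function are assumptions made for this example.

```ocaml
(* Iterate f from start until the result stops changing. *)
let fixpoint (f : int -> int) (start : int) : int =
  let current = ref start in
  let changed = ref true in
  while !changed do
    let next = f !current in
    changed := next <> !current;
    current := next
  done;
  !current

(* Toy step function: climb by 3, but never past 10. *)
let result = fixpoint (fun n -> min 10 (n + 3)) 0
```

The loop visits 0, 3, 6, 9, 10 and then sees 10 again, at which point `changed` becomes false and it stops. The mutable state stays local to `fixpoint`, so callers still see a pure function.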
Challenge A: AST & Pattern Matching
Given the ADT and function below, predict the output.
type expr =
  | Num of int
  | Var of string
  | BinOp of op * expr * expr

let rec count_vars e =
  match e with
  | Num _ -> 0
  | Var _ -> 1
  | BinOp (_, l, r) -> count_vars l + count_vars r
What does this return?
count_vars (BinOp (Add, BinOp (Mul, Var "x", Num 3), BinOp (Add, Var "y", Var "z")))
Your answer:
Why Modules?
As programs grow, you need ways to organize code, hide implementation details, and write reusable components.
Without modules: name clashes
type sign = Pos | Neg | Zero | Unknown
let sign_join a b = ...
let sign_to_string s = ...

type taint = Clean | Tainted | TUnknown
let taint_join a b = ...
let taint_to_string t = ...

(* Name clashes! Both need "join", "to_string", "Unknown" *)
With modules: clean namespaces
module Sign = struct
  type t = Pos | Neg | Zero | Unknown
  let join a b = ...
  let to_string s = ...
end

module Taint = struct
  type t = Clean | Tainted | Unknown
  let join a b = ...
end

(* No clashes! *)
Sign.join Sign.Pos Sign.Neg
Taint.join Taint.Clean Taint.Tainted
Key insight: Modules are like "super structs" — they can contain types, values, functions, and even other modules. Each module is its own namespace.
Signatures & Structures
A signature = interface (what). A structure = implementation (how). Sealing hides internals.
Signature (module type)
module type COUNTER = sig
  type t (* abstract! *)
  val create : int -> t
  val increment : t -> t
  val value : t -> int
end
Structure (sealed by signature)
module SafeCounter : COUNTER = struct
  type t = { count : int; max : int }
  let create max = { count = 0; max }
  let increment c =
    if c.count < c.max
    then { c with count = c.count + 1 }
    else c
  let value c = c.count
end
What the outside world sees
Analogy: A signature is like a Java interface. It says "you must provide these operations" without dictating implementation.
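A short sketch of what sealing means for client code, reusing the `COUNTER` signature and `SafeCounter` structure from above. The bindings `c` and `v` are illustrative.

```ocaml
module type COUNTER = sig
  type t
  val create : int -> t
  val increment : t -> t
  val value : t -> int
end

module SafeCounter : COUNTER = struct
  type t = { count : int; max : int }
  let create max = { count = 0; max }
  let increment c =
    if c.count < c.max then { c with count = c.count + 1 } else c
  let value c = c.count
end

(* Clients can only go through the three operations in the signature. *)
let c = SafeCounter.create 2
let c = SafeCounter.increment (SafeCounter.increment (SafeCounter.increment c))

(* The third increment hits the cap, so the value stays at the maximum. *)
let v = SafeCounter.value c

(* Rejected by the type checker, because t is abstract outside the module:
   let bad = c.count *)
```

Since no client can read or forge the `count` and `max` fields, the invariant `count <= max` only has to be checked inside the module.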
What is a Lattice?
A lattice models "levels of knowledge" — the theoretical foundation of all program analysis in this bootcamp.
Everyday intuition
What color is the next traffic light?
Before driving: "I have no idea" = bottom
After GPS hint: "Red or Yellow" = partial info
After seeing it: "Red" = precise
Conflicting GPS: "Could be anything" = top
Key rule: Information only flows upward. Once you say "Red or Yellow," you can't go back to "definitely Red" without new evidence.
Why program analysis needs this
Programs have branches (if/else, loops). At merge points, we must combine info from multiple paths. A lattice tells us how to combine safely.
Hasse diagram — click nodes to explore
Lattice Operations: Join
Every lattice has bottom, top, and join. Let's explore with the Sign lattice.
The Sign lattice
Why it works this way
if (cond) {
  x = 5;   // x is Pos
} else {
  x = -3;  // x is Neg
}
// What is x here?
// join(Pos, Neg) = ⊤ ("could be either")
Rule: join(a, b) = smallest value above both a and b in the diagram. Same value? Keep it. Different? Go up.
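The rule can be sketched in OCaml. The diagram itself is not reproduced in this text, so the code assumes a flat five-element shape: `Bot` at the bottom, `Top` at the top, and the three signs side by side in between. The constructor names and the helper `sign_of_int` are assumptions made for this example.

```ocaml
(* Assumed shape: Bot below, Top above, three incomparable signs between. *)
type sign = Bot | Neg | Zero | Pos | Top

(* Least upper bound: Bot is the identity, equal values stay put,
   and two different non-bottom values can only meet at Top. *)
let join (a : sign) (b : sign) : sign =
  match (a, b) with
  | Bot, x | x, Bot -> x
  | _ when a = b -> a
  | _ -> Top

(* Abstract a concrete integer constant to its sign. *)
let sign_of_int (n : int) : sign =
  if n > 0 then Pos else if n < 0 then Neg else Zero

(* The branch example above: x = 5 on one path, x = -3 on the other. *)
let merged = join (sign_of_int 5) (sign_of_int (-3))
```

With this definition `merged` is `Top`, matching the comment in the branch example.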
Challenge B: Lattice Join
Given this code, predict the sign of x after the if/else using the Sign lattice.
// Program 1:
if (cond) {
  x = 0;  // x is ???
} else {
  x = 0;  // x is ???
}
// x after merge = ???
Program 1 result:
// Program 2:
if (cond) {
  x = 7;   // x is ???
} else {
  x = -2;  // x is ???
}
// x after merge = ???
Program 2 result:
// Program 3:
if (cond) {
  x = 3;    // x is ???
} else {
  x = 100;  // x is ???
}
// x after merge = ???
Program 3 result:
Sign Lattice Reference
The LATTICE Signature in OCaml
Now let's encode what we learned as an OCaml module signature. Every analysis domain implements this interface.
The signature
module type LATTICE = sig
  type t
  val bottom : t
  val top : t
  val join : t -> t -> t
  val equal : t -> t -> bool
  val to_string : t -> string
end
BoolLattice implementation
module BoolLattice : LATTICE with type t = bool = struct
  type t = bool
  let bottom = false
  let top = true
  let join a b = a || b
  let equal a b = (a = b)
  let to_string b = if b then "true" else "false"
end
What each part means
| Signature item | Purpose                               |
|----------------|---------------------------------------|
| type t         | The abstract value type               |
| bottom         | Least element (no info / unreachable) |
| top            | Greatest element (could be anything)  |
| join           | Merge info from two analysis paths    |
| equal          | Check convergence (fixpoint test)     |
| to_string      | For debugging / display               |
Key insight: Different analyses use different types for t — booleans, signs, intervals, taint labels — but they all satisfy the same LATTICE interface.
Foreshadow: Modules 3-5 each define their own LATTICE implementations. The analysis framework code is generic over this interface.
Building a Lattice Module
Let's build a richer lattice — ThreeValueLattice — and test it interactively.
type three_value =
  | Bot
  | Zero
  | Positive
  | Unknown

module ThreeValueLattice : LATTICE with type t = three_value = struct
  type t = three_value
  let bottom = Bot
  let top = Unknown
  let join a b =
    if a = b then a
    else if a = Bot then b
    else if b = Bot then a
    else Unknown
  let equal a b = (a = b)
  let to_string = function
    | Bot -> "Bot"
    | Zero -> "Zero"
    | Positive -> "Positive"
    | Unknown -> "Unknown"
end
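One way to test such a module without a diagram is to check the expected join laws over every pair of elements. The sketch below repeats the definitions so it is self-contained; the names `all`, `commutative`, `idempotent`, `bottom_is_identity`, and `top_absorbs` are illustrative.

```ocaml
module type LATTICE = sig
  type t
  val bottom : t
  val top : t
  val join : t -> t -> t
  val equal : t -> t -> bool
  val to_string : t -> string
end

type three_value = Bot | Zero | Positive | Unknown

module ThreeValueLattice : LATTICE with type t = three_value = struct
  type t = three_value
  let bottom = Bot
  let top = Unknown
  let join a b =
    if a = b then a
    else if a = Bot then b
    else if b = Bot then a
    else Unknown
  let equal a b = a = b
  let to_string = function
    | Bot -> "Bot"
    | Zero -> "Zero"
    | Positive -> "Positive"
    | Unknown -> "Unknown"
end

module L = ThreeValueLattice

(* The domain is finite, so every law can be checked exhaustively. *)
let all = [ Bot; Zero; Positive; Unknown ]
let for_all_pairs p = List.for_all (fun a -> List.for_all (p a) all) all

let commutative = for_all_pairs (fun a b -> L.equal (L.join a b) (L.join b a))
let idempotent = List.for_all (fun a -> L.equal (L.join a a) a) all
let bottom_is_identity = List.for_all (fun a -> L.equal (L.join L.bottom a) a) all
let top_absorbs = List.for_all (fun a -> L.equal (L.join L.top a) L.top) all
```

All four checks evaluate to `true` for this module. The same checks apply to any finite `LATTICE` implementation by swapping the module bound to `L` and the element list.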
Functors: The Big Picture
A functor = function from modules to modules. Write generic analysis code once, plug in different domains.
The code duplication problem
(* Copy-paste for every domain! *)
(* Assumes: module StringMap = Map.Make (String) *)
module SignEnv = struct
  let lookup env x =
    match StringMap.find_opt x env with
    | Some v -> v
    | None -> SignLattice.bottom
end
The functor solution
(* Write once! *)
(* Assumes: module StringMap = Map.Make (String) *)
module MakeEnv (L : LATTICE) = struct
  let lookup env x =
    match StringMap.find_opt x env with
    | Some v -> v
    | None -> L.bottom

  let join env1 env2 =
    StringMap.union (fun _k v1 v2 -> Some (L.join v1 v2)) env1 env2
end

(* Instantiate for each domain *)
module SignEnv = MakeEnv(SignLattice)
module TaintEnv = MakeEnv(TaintLattice)
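For a version that compiles on its own, the sketch below instantiates the functor with `BoolLattice` from the earlier slide, since `SignLattice` and `TaintLattice` are not defined in this text. It assumes environments are `StringMap` values; `BoolEnv`, `then_branch`, `else_branch`, and `merged` are illustrative names.

```ocaml
module StringMap = Map.Make (String)

module type LATTICE = sig
  type t
  val bottom : t
  val top : t
  val join : t -> t -> t
  val equal : t -> t -> bool
  val to_string : t -> string
end

(* Stand-in domain; any module satisfying LATTICE would work. *)
module BoolLattice : LATTICE with type t = bool = struct
  type t = bool
  let bottom = false
  let top = true
  let join a b = a || b
  let equal a b = a = b
  let to_string b = if b then "true" else "false"
end

module MakeEnv (L : LATTICE) = struct
  type t = L.t StringMap.t

  (* Unbound variables default to the least element. *)
  let lookup (env : t) (x : string) : L.t =
    match StringMap.find_opt x env with
    | Some v -> v
    | None -> L.bottom

  (* Pointwise join; keys present on one side only are kept unchanged. *)
  let join (env1 : t) (env2 : t) : t =
    StringMap.union (fun _k v1 v2 -> Some (L.join v1 v2)) env1 env2
end

module BoolEnv = MakeEnv (BoolLattice)

(* Two environments, as if from the two branches of an if/else. *)
let then_branch = StringMap.(empty |> add "x" true |> add "y" false)
let else_branch = StringMap.(empty |> add "x" false)
let merged = BoolEnv.join then_branch else_branch
```

After the merge, `x` maps to `true` (the join of `true` and `false`), `y` keeps `false`, and looking up an unbound name such as `z` falls back to `BoolLattice.bottom`.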
You Already Use Functors!
Map.Make and Set.Make from OCaml's standard library are functors. You used them on the Collections slide.
Map.Make — the functor
(* Standard library, simplified interface of Map: *)

(* OrderedType signature: *)
module type OrderedType = sig
  type t
  val compare : t -> t -> int
end

(* The functor: takes an ordered key type, returns a map module *)
module Make (Ord : OrderedType) : sig
  type key = Ord.t
  type 'a t (* abstract; a balanced tree internally *)
  val find_opt : key -> 'a t -> 'a option
  val add : key -> 'a -> 'a t -> 'a t
  (* ... *)
end
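Any module with a type `t` and a `compare` function can be passed to `Map.Make`. As a sketch, the hypothetical `Pos` module below orders (line, column) pairs so that they can serve as map keys; `PosMap` and `notes` are likewise illustrative.

```ocaml
(* Keys are (line, column) positions, ordered by line and then column. *)
module Pos = struct
  type t = int * int

  let compare (l1, c1) (l2, c2) =
    match Int.compare l1 l2 with
    | 0 -> Int.compare c1 c2
    | n -> n
end

(* Pos has a type t and a compare function, so it fits OrderedType. *)
module PosMap = Map.Make (Pos)

let notes = PosMap.(empty |> add (1, 1) "start" |> add (2, 5) "use of x")
let hit = PosMap.find_opt (2, 5) notes
let miss = PosMap.find_opt (9, 9) notes
```

`Map.Make (String)` works the same way, because the standard `String` module already provides `t` and `compare`.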
The bootcamp pattern:
1. Define a LATTICE signature
2. Implement it for each analysis domain
3. Use MakeEnv functor to get environments
4. Write analysis code generic over the interface
Foreshadow: This is exactly lib/abstract_domains/abstract_env.ml. Modules 3-5 plug in sign, interval, and taint domains.
Module System: Quick Reference
Everything you need to know before Exercise 4.
Defining a signature
module type SIG_NAME = sig
  type t (* abstract type *)
  val operation : t -> t -> t
end
Implementing (with type exposed)
module Impl : SIG_NAME with type t = my_type = struct
  type t = my_type
  let operation a b = ...
end
Implementing (type hidden)
module Impl : SIG_NAME = struct
  type t = my_secret_type
  let operation a b = ...
end
Defining a functor
module MakeThing (M : SIG_NAME) = struct
  (* Use M.t, M.operation here *)
  let do_stuff x = M.operation x M.???
end
Using a functor
module ConcreteThing = MakeThing(Impl)
(* Now use ConcreteThing.do_stuff *)
Common mistake: Forgetting with type t = ... makes the type abstract. If callers need to create values of type t directly, you must expose it.
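The difference is easiest to see side by side. In this sketch the signature `SHOW` and the modules `Hidden` and `Exposed` are hypothetical names; both structures are identical and differ only in how they are sealed.

```ocaml
module type SHOW = sig
  type t
  val show : t -> string
end

(* Type hidden: outside the module, nothing can produce a Hidden.t. *)
module Hidden : SHOW = struct
  type t = int
  let show = string_of_int
end

(* Type exposed: callers know t = int and may pass plain integers. *)
module Exposed : SHOW with type t = int = struct
  type t = int
  let show = string_of_int
end

let ok = Exposed.show 42

(* Rejected by the type checker:
     Hidden.show 42
   because 42 has type int while Hidden.t is abstract. *)
```

Hiding the type is the right choice when the module also exports constructors (as `SafeCounter.create` does); exposing it is the right choice when callers must supply values themselves.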
Think of it as: Signature = plug shape. Structure = the plug. Functor = an appliance that accepts any plug of that shape.
What is Parsing?
Transforming raw source code into a structured tree — in two stages