Apply AST transformations (constant folding, renaming, dead code elimination)
Key Idea: Every program analysis tool — linters, compilers, security scanners — starts by converting source code into an AST. This module teaches you to build and manipulate that core data structure yourself.
Why ASTs? From Text to Structure
Raw source code is flat text. Click Step to see how it becomes a structured tree.
source code
(2 + 3) * 4
Abstract Syntax Tree:
Each node = a programming construct
Edges = containment/composition
Syntactic noise (parens, semicolons) removed
Semantic structure preserved
Analogy: Source code is like a sentence. The AST is like a sentence diagram — it shows the grammatical structure, not the raw characters.
Precedence in the Tree
Toggle between two expressions to see how parentheses and operator precedence change the tree shape.
Key Insight: The root operator is evaluated last. Higher-precedence operators appear lower in the tree. Parentheses override precedence by restructuring the tree.
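The effect of precedence on tree shape can be sketched directly as OCaml values. This is a minimal illustration, assuming the two toggled expressions are `2 + 3 * 4` and `(2 + 3) * 4`; the `op` and `expr` types and the `eval` function are hypothetical names introduced only for this example.

```ocaml
(* Hypothetical minimal expression type for this sketch. *)
type op = Add | Mul

type expr =
  | IntLit of int
  | BinOp of op * expr * expr

(* 2 + 3 * 4: "*" binds tighter, so it sits below the root "+". *)
let without_parens = BinOp (Add, IntLit 2, BinOp (Mul, IntLit 3, IntLit 4))

(* (2 + 3) * 4: the parentheses push "+" below the root "*". *)
let with_parens = BinOp (Mul, BinOp (Add, IntLit 2, IntLit 3), IntLit 4)

(* Evaluate bottom-up; the root operator is applied last. *)
let rec eval = function
  | IntLit n -> n
  | BinOp (Add, l, r) -> eval l + eval r
  | BinOp (Mul, l, r) -> eval l * eval r

let () =
  (* prints "14 20" *)
  Printf.printf "%d %d\n" (eval without_parens) (eval with_parens)
```

The same three literals and two operators give different results only because the trees are nested differently.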
AST Node Categories
Click each category to see its node types and examples.
AST nodes fall into three main categories, each serving a different role in the program structure.
AST vs Parse Tree
Toggle to see how the same expression looks as a parse tree vs an AST.
Parse Tree (Concrete Syntax Tree)
Abstract Syntax Tree
Feature | Parse Tree | AST
Parentheses | Included | Removed
Grammar rules | All intermediate nodes | Only semantic nodes
Size | Larger (11 nodes) | Compact (5 nodes)
Used for | Parsing stage | Analysis & transformation
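One way to picture the difference is a conversion that drops parenthesis nodes. The sketch below is an assumption-laden simplification: the `cst` type models only parentheses as extra nodes, not every intermediate grammar rule, so its node count is smaller than the 11 quoted in the table. All type and function names here are hypothetical.

```ocaml
(* Hypothetical concrete-syntax type: parentheses are kept as nodes. *)
type cst =
  | CNum of int
  | CParen of cst
  | CBin of cst * char * cst

(* Hypothetical AST type: only semantic nodes remain. *)
type ast =
  | Num of int
  | Bin of char * ast * ast

(* Drop CParen nodes; grouping is already encoded by nesting. *)
let rec to_ast = function
  | CNum n -> Num n
  | CParen inner -> to_ast inner
  | CBin (l, op, r) -> Bin (op, to_ast l, to_ast r)

let rec cst_size = function
  | CNum _ -> 1
  | CParen c -> 1 + cst_size c
  | CBin (l, _, r) -> 1 + cst_size l + cst_size r

let rec ast_size = function
  | Num _ -> 1
  | Bin (_, l, r) -> 1 + ast_size l + ast_size r

(* (2 + 3) * 4 *)
let example = CBin (CParen (CBin (CNum 2, '+', CNum 3)), '*', CNum 4)
```

Here `ast_size (to_ast example)` is 5, matching the AST column of the table, while `cst_size example` is 6 because this simplified tree keeps only the parenthesis node.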
Building ASTs in OCaml
Step through the OCaml type definitions and see how they map to tree nodes.
ast_types.ml
type expr =
  | IntLit of int
  | BoolLit of bool
  | Var of string
  | BinOp of op * expr * expr
  | UnaryOp of uop * expr
  | Call of string * expr list

type stmt =
  | Assign of string * expr
  | If of expr * stmt list * stmt list
  | While of expr * stmt list
  | Return of expr option
Step through: if x > 0 then y = x + 1
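As a concrete illustration, the statement above could be written as a value of the `stmt` type. The slide leaves `op` and `uop` undefined, so the constructors `Add`, `Sub`, `Mul`, `Gt`, `Neg`, and `Not` below are assumptions made for this sketch.

```ocaml
(* Operator types are assumptions; the slide leaves op and uop undefined. *)
type op = Add | Sub | Mul | Gt
type uop = Neg | Not

type expr =
  | IntLit of int
  | BoolLit of bool
  | Var of string
  | BinOp of op * expr * expr
  | UnaryOp of uop * expr
  | Call of string * expr list

type stmt =
  | Assign of string * expr
  | If of expr * stmt list * stmt list
  | While of expr * stmt list
  | Return of expr option

(* if x > 0 then y = x + 1   (no else branch, so the last list is empty) *)
let example : stmt =
  If
    ( BinOp (Gt, Var "x", IntLit 0),
      [ Assign ("y", BinOp (Add, Var "x", IntLit 1)) ],
      [] )
```

Each constructor application corresponds to one tree node: the `If` node has a condition subtree and two statement lists as children.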
Pre-order Traversal (Top-Down)
Visit node FIRST, then children. Step through to see the order on (2 + 3) * 4.
pre_order.ml
let rec pre_order node =
  node.value :: (* visit node FIRST *)
  List.concat_map pre_order
    node.children (* then children *)
Visit order on (2 + 3) * 4: *, +, 2, 3, 4
Pre-order: visit self, then recurse left, then right...
Use case: Top-down analyses — type checking (check parent type before children), scope entry (enter scope before visiting body).
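A runnable version of the pre-order function might look like the following. The `node` record type and the `leaf` helper are assumptions chosen to match the `node.value` and `node.children` fields used on the slide.

```ocaml
(* Hypothetical generic tree record matching node.value / node.children. *)
type node = { value : string; children : node list }

let leaf v = { value = v; children = [] }

(* Visit the node first, then each child from left to right. *)
let rec pre_order node =
  node.value :: List.concat_map pre_order node.children

(* (2 + 3) * 4 *)
let tree =
  { value = "*";
    children = [ { value = "+"; children = [ leaf "2"; leaf "3" ] }; leaf "4" ] }

let () =
  (* prints "* + 2 3 4" *)
  print_endline (String.concat " " (pre_order tree))
```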
Challenge: Predict the Traversal
Given this AST for x + y * z, type the pre-order visit sequence.
Pre-order rule: Visit node, then left child, then right child.
Pre-order trace:
1. Visit + (root)
2. Go left → Visit x (leaf)
3. Go right → Visit *
4. Go left → Visit y (leaf)
5. Go right → Visit z (leaf)
Result: +, x, *, y, z
Post-order Traversal (Bottom-Up)
Visit children FIRST, then node. Step through to see bottom-up evaluation order.
post_order.ml
let rec post_order node =
  List.concat_map post_order
    node.children (* children FIRST *)
  @ [node.value] (* then self *)
Visit order on (2 + 3) * 4: 2, 3, +, 4, *
Post-order: recurse left, recurse right, then visit self...
Use case: Bottom-up analyses — expression evaluation (compute children before parent), code generation, computing expression types.
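The link between post-order and evaluation can be sketched as follows. The `node` record, the `leaf` helper, and the `eval` function are hypothetical; `eval` handles only `+` and `*` over integer leaves.

```ocaml
(* Hypothetical generic tree record, as in the slide's node.value / node.children. *)
type node = { value : string; children : node list }

let leaf v = { value = v; children = [] }

(* Visit every child first, then the node itself. *)
let rec post_order node =
  List.concat_map post_order node.children @ [ node.value ]

(* Bottom-up evaluation: child results exist before the parent's
   operator is applied. Only "+" and "*" are supported here. *)
let rec eval node =
  match node.value, List.map eval node.children with
  | "+", [ a; b ] -> a + b
  | "*", [ a; b ] -> a * b
  | lit, [] -> int_of_string lit
  | _ -> invalid_arg "eval: unsupported node"

(* (2 + 3) * 4 *)
let tree =
  { value = "*";
    children = [ { value = "+"; children = [ leaf "2"; leaf "3" ] }; leaf "4" ] }
```

For this tree, `post_order tree` yields `2, 3, +, 4, *` and `eval tree` yields 20; the evaluator computes values in exactly the order that post-order visits the nodes.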
BFS / Level-order Traversal
Visit all nodes at depth d before depth d+1. Watch the queue drive the traversal.
bfs.ml
let bfs root =
  let q = Queue.create () in
  Queue.push root q;
  while not (Queue.is_empty q) do
    let cur = Queue.pop q in
    visit cur; (* process *)
    (* Queue.push takes the element first, then the queue *)
    List.iter (fun child -> Queue.push child q) cur.children
  done
Use case: Level-based analysis, finding the shallowest occurrence of a pattern, pretty-printing by depth.
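A self-contained variant that returns the visit order, rather than calling a `visit` callback, could look like this. The `node` record and the choice to collect values in a list are assumptions for the sake of a runnable example.

```ocaml
(* Hypothetical generic tree record. *)
type node = { value : string; children : node list }

let leaf v = { value = v; children = [] }

(* Returns node values in level order. *)
let bfs root =
  let q = Queue.create () in
  let visited = ref [] in
  Queue.push root q;
  while not (Queue.is_empty q) do
    let cur = Queue.pop q in
    visited := cur.value :: !visited;
    (* Children join the back of the queue, so deeper nodes wait. *)
    List.iter (fun child -> Queue.push child q) cur.children
  done;
  List.rev !visited

(* (2 + 3) * 4 *)
let tree =
  { value = "*";
    children = [ { value = "+"; children = [ leaf "2"; leaf "3" ] }; leaf "4" ] }
```

On this tree the result is `*, +, 4, 2, 3`: both depth-1 nodes are visited before either depth-2 leaf.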
Pattern Matching Over ASTs
In OCaml, pattern matching replaces the visitor pattern. Step through a node counter.
node_counter.ml
let rec count_expr = function
  | IntLit _ -> inc "IntLit"
  | Var _ -> inc "Var"
  | BinOp (_, l, r) ->
    inc "BinOp";
    count_expr l; count_expr r
  | _ -> ()
Why pattern matching? Add new analyses by writing new match functions — no class hierarchy needed. This is OCaml's superpower for program analysis.
Counting nodes in: x + (2 * y)
Final counts: BinOp 2, IntLit 1, Var 2
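The counter can be made runnable with a small amount of scaffolding. The slide does not define `inc`, so the hash-table-backed `inc` and the `get` helper below are assumptions, as is the reduced `expr` type.

```ocaml
type op = Add | Mul

type expr =
  | IntLit of int
  | Var of string
  | BinOp of op * expr * expr

(* Assumed helper: the slide uses inc without defining it. *)
let counts : (string, int) Hashtbl.t = Hashtbl.create 8

let inc name =
  let n = Option.value (Hashtbl.find_opt counts name) ~default:0 in
  Hashtbl.replace counts name (n + 1)

let get name = Option.value (Hashtbl.find_opt counts name) ~default:0

let rec count_expr = function
  | IntLit _ -> inc "IntLit"
  | Var _ -> inc "Var"
  | BinOp (_, l, r) ->
      inc "BinOp";
      count_expr l;
      count_expr r

(* x + (2 * y) *)
let () = count_expr (BinOp (Add, Var "x", BinOp (Mul, IntLit 2, Var "y")))
```

After the call, `get "BinOp"` is 2, `get "IntLit"` is 1, and `get "Var"` is 2.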
Traversal State Management
Complex analyses pass context (like scope) through parameters as they traverse.
scope_analysis.ml
let rec analyze_stmt scope = function
  | Assign (name, expr) ->
    let scope' = define scope name in
    analyze_expr scope' expr
  | If (cond, then_b, else_b) ->
    let inner = enter_scope scope in
    List.iter (analyze_stmt inner) then_b;
    List.iter (analyze_stmt inner) else_b
Scope Stack
Global: { }
Analyzing: x = 5; if x > 0 then y = x + 1
Key pattern: The scope is threaded through as a parameter — never mutated globally. This makes the analysis safe and composable.
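One possible way to thread scope as an immutable value is sketched below as an undefined-variable check. This is not the slide's exact analysis: the `Scope` set, `undefined_in`, and `analyze_block` are hypothetical, and in this sketch the right-hand side of an assignment is checked before the assigned name is added.

```ocaml
module Scope = Set.Make (String)

type expr =
  | IntLit of int
  | Var of string
  | BinOp of string * expr * expr (* operator kept as a string for brevity *)

type stmt =
  | Assign of string * expr
  | If of expr * stmt list * stmt list

(* Names used in an expression that are absent from the scope. *)
let rec undefined_in scope = function
  | IntLit _ -> []
  | Var x -> if Scope.mem x scope then [] else [ x ]
  | BinOp (_, l, r) -> undefined_in scope l @ undefined_in scope r

(* Returns the scope visible after the statement, plus undefined names. *)
let rec analyze_stmt scope = function
  | Assign (name, e) -> (Scope.add name scope, undefined_in scope e)
  | If (cond, then_b, else_b) ->
      let errs_c = undefined_in scope cond in
      let _, errs_t = analyze_block scope then_b in
      let _, errs_e = analyze_block scope else_b in
      (* Bindings made inside the branches do not escape. *)
      (scope, errs_c @ errs_t @ errs_e)

and analyze_block scope stmts =
  List.fold_left
    (fun (sc, errs) s ->
      let sc', errs' = analyze_stmt sc s in
      (sc', errs @ errs'))
    (scope, []) stmts

(* x = 5; if x > 0 then y = x + 1 *)
let program =
  [ Assign ("x", IntLit 5);
    If (BinOp (">", Var "x", IntLit 0),
        [ Assign ("y", BinOp ("+", Var "x", IntLit 1)) ],
        []) ]
```

Because every call receives its scope as an argument and returns a new one, analyzing one branch cannot leak bindings into another.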
Challenge: Which Traversal?
Match each use case to the best traversal strategy.
1. Evaluate an arithmetic expression tree — need child values before computing parent.
2. Enter a scope before analyzing the statements inside a function body.
3. Find the shallowest node matching a pattern (e.g., first return at minimum depth).
Explanations:
1. Post-order: to compute `2+3`, you need the values of 2 and 3 first. Bottom-up evaluation computes children before parents.
2. Pre-order: you must set up the scope BEFORE visiting the body. Top-down processing handles parent context first.
3. BFS: level-order visits all nodes at depth d before d+1, guaranteeing the first match is the shallowest.
Symbol Tables & Scope Chains
Step through nested scopes. Watch the scope chain grow and shrink as we enter/exit blocks.
scoping.ml
let x = 10 (* global *)
let foo () =
  let y = 20 in (* foo scope *)
  let bar () =
    let z = 30 in (* bar scope *)
    x + y + z (* lookup! *)
  in bar ()
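The lookup in `x + y + z` can be modeled with an explicit scope chain. The representation below (a list of frames, innermost first, each an association list) and the names `enter_scope`, `define`, and `lookup` are assumptions made for this sketch, not a prescribed symbol-table design.

```ocaml
(* A scope chain as a list of frames, innermost first. *)
type frame = (string * int) list
type chain = frame list

(* Entering a block pushes an empty frame. *)
let enter_scope (c : chain) : chain = [] :: c

(* Add a binding to the innermost frame. *)
let define name v (c : chain) : chain =
  match c with
  | frame :: rest -> ((name, v) :: frame) :: rest
  | [] -> [ [ (name, v) ] ]

(* Walk outward until some frame binds the name. *)
let rec lookup name (c : chain) =
  match c with
  | [] -> None
  | frame :: rest -> (
      match List.assoc_opt name frame with
      | Some v -> Some v
      | None -> lookup name rest)

let global = define "x" 10 [ [] ]
let foo_scope = define "y" 20 (enter_scope global)
let bar_scope = define "z" 30 (enter_scope foo_scope)
```

From `bar_scope`, all of `x`, `y`, and `z` resolve; from `foo_scope`, `lookup "z"` returns `None` because inner frames are not visible to outer ones. Exiting a block corresponds to dropping the head of the list.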
Shadowing & Scope Resolution
When an inner scope declares a name that exists in an outer scope, the inner one shadows the outer.
shadowing.ml
let x = 10 (* global *)
let foo () =
  let x = 20 in (* shadows! *)
  print_int x (* which x? *)
let () = print_int x (* which x? *)
Common confusion: Shadowing does NOT modify the outer variable. It creates a new binding that hides the old one within the inner scope.
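A small variation on the snippet makes the two resolutions observable as values. The names `inside` and `outside` are introduced here only for illustration.

```ocaml
let x = 10 (* global binding *)

let foo () =
  let x = 20 in (* new binding; hides the global x inside foo *)
  x

let inside = foo () (* resolved to the inner binding *)
let outside = x (* the global binding was never modified *)

let () =
  (* prints "inside=20 outside=10" *)
  Printf.printf "inside=%d outside=%d\n" inside outside
```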
AST Transformation: Constant Folding
Replace constant sub-expressions with their computed values. Watch the tree shrink.
constant_fold.ml
let rec fold = function
  | BinOp (op, l, r) ->
    (* parenthesize the inner match so the final case belongs to the outer one *)
    (match fold l, fold r with
     | IntLit a, IntLit b ->
       IntLit (eval op a b)
     | l', r' -> BinOp (op, l', r'))
  | e -> e (* leaves unchanged *)
Folding: (2 + 3) * 4
Why post-order? Constant folding uses bottom-up traversal — fold children first, then check if the parent can be collapsed. This is why we learned post-order!
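A complete, runnable version of the folder might look like this. The slide calls `eval` without defining it, so the `eval` helper and the three-operator `op` type are assumptions; integer overflow and division are deliberately left out.

```ocaml
type op = Add | Sub | Mul

type expr =
  | IntLit of int
  | Var of string
  | BinOp of op * expr * expr

(* Assumed helper: applies one operator to two known integers. *)
let eval op a b =
  match op with
  | Add -> a + b
  | Sub -> a - b
  | Mul -> a * b

let rec fold = function
  | BinOp (op, l, r) -> (
      (* Fold children first (post-order), then try to collapse this node. *)
      match fold l, fold r with
      | IntLit a, IntLit b -> IntLit (eval op a b)
      | l', r' -> BinOp (op, l', r'))
  | e -> e

(* (2 + 3) * 4 collapses completely. *)
let folded = fold (BinOp (Mul, BinOp (Add, IntLit 2, IntLit 3), IntLit 4))

(* x + (2 * 3) collapses only the constant subtree. *)
let partial = fold (BinOp (Add, Var "x", BinOp (Mul, IntLit 2, IntLit 3)))
```

Here `folded` is `IntLit 20`, while `partial` is `BinOp (Add, Var "x", IntLit 6)`: the variable blocks folding at the root but not in the right subtree.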
Renaming & Dead Code Elimination
Two more essential transformations. Toggle to see each in action.
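Both transformations can be sketched as recursive match functions over a reduced AST. Everything below is a simplified assumption: the renamer replaces every occurrence of a name without checking for clashes with existing names, and the dead code eliminator only removes `If` branches whose condition is a literal boolean.

```ocaml
type expr =
  | IntLit of int
  | BoolLit of bool
  | Var of string
  | BinOp of string * expr * expr

type stmt =
  | Assign of string * expr
  | If of expr * stmt list * stmt list

(* Renaming: rewrite uses of old_name inside expressions. *)
let rec rename_expr old_name new_name = function
  | Var v when v = old_name -> Var new_name
  | BinOp (op, l, r) ->
      BinOp (op, rename_expr old_name new_name l, rename_expr old_name new_name r)
  | e -> e

(* Renaming: rewrite both the assigned name and the uses. *)
let rec rename_stmt old_name new_name = function
  | Assign (v, e) ->
      let v' = if v = old_name then new_name else v in
      Assign (v', rename_expr old_name new_name e)
  | If (c, t, e) ->
      If ( rename_expr old_name new_name c,
           List.map (rename_stmt old_name new_name) t,
           List.map (rename_stmt old_name new_name) e )

(* Dead code elimination: keep only the live branch of a constant If. *)
let rec dce stmts = List.concat_map dce_stmt stmts

and dce_stmt = function
  | If (BoolLit true, t, _) -> dce t
  | If (BoolLit false, _, e) -> dce e
  | If (c, t, e) -> [ If (c, dce t, dce e) ]
  | s -> [ s ]
```

For example, `rename_stmt "x" "y" (Assign ("x", Var "x"))` gives `Assign ("y", Var "y")`, and `dce` applied to an `If (BoolLit false, ...)` statement returns just the statements of its else-branch.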
Challenge: Apply Constant Folding
Given this AST, apply constant folding and type the final result.
Expression: (10 - 3) + (2 * 4)
After folding all constants, what single IntLit value remains?
1. ASTs provide structured representations of code — converting flat text into a tree that captures the program's logical organization without syntactic noise.
2. Different traversals for different needs: Pre-order (top-down: scope entry, type checking), Post-order (bottom-up: evaluation, folding), BFS (level-based: shallowest match).
3. Symbol tables track identifiers across nested scopes using scope chains. Lookup walks up the chain. Shadowing creates new bindings without modifying outer scopes.
5. OCaml's pattern matching is well suited to AST manipulation: each analysis is just a new recursive match function, with no visitor classes needed.
6. Transformation safety: Always preserve semantics. Use immutable updates, respect scope boundaries, and apply transformations in the right order.
Next Module Preview
Module 3: Static Analysis Fundamentals
Control Flow Graphs (CFGs) — how programs branch and loop
Dataflow analysis framework — tracking information through code
Reaching definitions & live variables
Building your first static analyzer
Prep: Review set theory (union, intersection) and basic graph theory. If you know what a directed graph is, you're ready.
From ASTs to CFGs
Module 2 gave you the tree. Module 3 flattens it into a graph that shows how execution flows through the program.
AST (tree):
If
├── condition
├── then-body
└── else-body
Office Hours: By appointment, HH227
Resources: Course GitHub repository
Quiz 1: Core Concepts
Select one answer per question, then click Check Answers.
Q1. What does an AST remove compared to a parse tree?
Q2. Which traversal evaluates an expression tree correctly?
Q3. When looking up a variable, the symbol table:
Quiz 2: Build the AST
Given source code, select the correct AST representation.
expression
a * (b + 1)
Which OCaml AST is correct?
Quiz 3: Transformation Reasoning
For each scenario, predict what happens after the transformation.
1. After constant folding BinOp(Add, IntLit 5, BinOp(Mul, IntLit 0, Var "x")), what's left?
2. Renaming "x"→"y" in [Assign("x", IntLit 1); Print [Var "x"]]. How many nodes change?
3. Dead code elim on [If(BoolLit false, [s1], [s2])]. What remains?
Explanations:
1. Unchanged: 0 * Var "x" can't fold because Var "x" is not an IntLit. The folder only collapses IntLit op IntLit. A smarter optimizer could recognize 0 * anything = 0, but basic constant folding doesn't.
2. Two nodes: the Assign("x", ...) has the string "x" in it (declaration), and Var "x" has a reference. Both must be renamed to "y".
3. Just s2: the condition is false, so the then-branch (s1) is dead. Only the else-branch (s2) survives.