🌳

Code Representation and ASTs

Module 2 — Program Analysis Bootcamp

Instructor: Weihao  |  Office Hours: By appointment, HH227


Learning Objectives

By the end of this module, you will be able to:

Key Idea: Most source-level program analysis tools (linters, compilers, security scanners) start by converting source code into an AST. This module teaches you to build and manipulate that core data structure yourself.

Why ASTs? From Text to Structure

Raw source code is flat text. Parsing turns it into a structured tree.

source code
(2 + 3) * 4

Abstract Syntax Tree:

  • Each node = a programming construct
  • Edges = containment/composition
  • Syntactic noise (parens, semicolons) removed
  • Semantic structure preserved
Analogy: Source code is like a sentence. The AST is like a sentence diagram — it shows the grammatical structure, not the raw characters.
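To make the (2 + 3) * 4 example concrete, here is a minimal OCaml sketch of its AST. It assumes a trimmed-down expr type (integer literals and binary operators only) and a hypothetical printer, to_string. Printing every node with full parentheses shows that the grouping the source parentheses expressed survives as the shape of the tree.

```ocaml
(* Trimmed-down expression type: just enough for arithmetic examples. *)
type op = Add | Sub | Mul | Div

type expr =
  | IntLit of int
  | BinOp of op * expr * expr

(* (2 + 3) * 4: no parenthesis node; nesting alone records the grouping. *)
let ast = BinOp (Mul, BinOp (Add, IntLit 2, IntLit 3), IntLit 4)

let op_symbol = function Add -> "+" | Sub -> "-" | Mul -> "*" | Div -> "/"

(* Hypothetical printer: wraps every BinOp in parentheses, so the output
   shows the grouping implied by the tree's shape. *)
let rec to_string = function
  | IntLit n -> string_of_int n
  | BinOp (op, l, r) ->
      "(" ^ to_string l ^ " " ^ op_symbol op ^ " " ^ to_string r ^ ")"

let () = print_endline (to_string ast)
```

Running it prints ((2 + 3) * 4): the parentheses are regenerated from structure, not stored.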

Precedence in the Tree

Parentheses and operator precedence together determine the shape of the tree.

Key Insight: The root operator is evaluated last. Higher-precedence operators appear lower in the tree. Parentheses override precedence by restructuring the tree.
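A small sketch of the key insight, using 2 + 3 * 4 and (2 + 3) * 4 as an assumed pair of expressions. The two trees contain the same leaves but a different root, and a recursive evaluator (which applies the root last) gives different results.

```ocaml
type op = Add | Mul
type expr = IntLit of int | BinOp of op * expr * expr

(* 2 + 3 * 4: * binds tighter, so it sits LOWER in the tree and + is the root. *)
let no_parens = BinOp (Add, IntLit 2, BinOp (Mul, IntLit 3, IntLit 4))

(* (2 + 3) * 4: the parentheses push + below *, so * becomes the root. *)
let with_parens = BinOp (Mul, BinOp (Add, IntLit 2, IntLit 3), IntLit 4)

(* The root operator is applied last, after both subtrees are evaluated. *)
let rec eval = function
  | IntLit n -> n
  | BinOp (Add, l, r) -> eval l + eval r
  | BinOp (Mul, l, r) -> eval l * eval r

let () =
  Printf.printf "2 + 3 * 4 = %d\n(2 + 3) * 4 = %d\n" (eval no_parens) (eval with_parens)
```

This prints 14 for the first tree and 20 for the second.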

AST Node Categories


AST nodes fall into three main categories, each serving a different role in the program structure.

AST vs Parse Tree

The same expression looks quite different as a parse tree and as an AST.

Parse Tree (Concrete Syntax Tree)

Abstract Syntax Tree

Feature       | Parse Tree             | AST
------------- | ---------------------- | -------------------------
Parentheses   | Included               | Removed
Grammar rules | All intermediate nodes | Only semantic nodes
Size          | Larger (11 nodes)      | Compact (5 nodes)
Used for      | Parsing stage          | Analysis & transformation
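The sketch below illustrates the "Parentheses: Included vs Removed" row. The cst type is a hypothetical, heavily simplified concrete syntax tree that keeps only a Paren node; a real parse tree also keeps grammar nonterminals (such as Expr, Term, Factor), which is why the table's parse tree is larger than this one. Lowering to the AST drops Paren nodes while keeping their grouping as nesting.

```ocaml
(* Hypothetical simplified CST: keeps parentheses, but not every grammar rule. *)
type cst =
  | Num of int
  | Paren of cst
  | Plus of cst * cst
  | Times of cst * cst

type op = Add | Mul
type expr = IntLit of int | BinOp of op * expr * expr

(* CST -> AST: parentheses disappear; their grouping survives as nesting. *)
let rec to_ast = function
  | Num n -> IntLit n
  | Paren inner -> to_ast inner
  | Plus (l, r) -> BinOp (Add, to_ast l, to_ast r)
  | Times (l, r) -> BinOp (Mul, to_ast l, to_ast r)

let rec cst_size = function
  | Num _ -> 1
  | Paren c -> 1 + cst_size c
  | Plus (l, r) | Times (l, r) -> 1 + cst_size l + cst_size r

let rec ast_size = function
  | IntLit _ -> 1
  | BinOp (_, l, r) -> 1 + ast_size l + ast_size r

(* (2 + 3) * 4 as this toy parser would produce it *)
let parsed = Times (Paren (Plus (Num 2, Num 3)), Num 4)

let () =
  Printf.printf "CST nodes: %d, AST nodes: %d\n"
    (cst_size parsed) (ast_size (to_ast parsed))
```

For this toy grammar the CST has 6 nodes and the AST has 5; a full grammar would widen the gap.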

Building ASTs in OCaml

The OCaml type definitions map directly onto tree nodes: each constructor is a node kind, and each expr or stmt argument is a child.

ast_types.ml
(* Operator sets assumed for these examples *)
type op = Add | Sub | Mul | Div | Lt | Gt | Eq   (* binary operators *)
type uop = Neg | Not                             (* unary operators *)

type expr =
  | IntLit of int
  | BoolLit of bool
  | Var of string
  | BinOp of op * expr * expr
  | UnaryOp of uop * expr
  | Call of string * expr list

type stmt =
  | Assign of string * expr
  | If of expr * stmt list * stmt list
  | While of expr * stmt list
  | Return of expr option
Example statement: if x > 0 then y = x + 1
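A sketch of how that statement becomes a value of these types. The type definitions are repeated so the block compiles on its own; the operator sets (Add | Sub | Mul | Div | Lt | Gt | Eq and Neg | Not) are assumptions, and stmt_size is a hypothetical helper that counts nodes so you can see how much structure one line of source produces.

```ocaml
type op = Add | Sub | Mul | Div | Lt | Gt | Eq
type uop = Neg | Not

type expr =
  | IntLit of int
  | BoolLit of bool
  | Var of string
  | BinOp of op * expr * expr
  | UnaryOp of uop * expr
  | Call of string * expr list

type stmt =
  | Assign of string * expr
  | If of expr * stmt list * stmt list
  | While of expr * stmt list
  | Return of expr option

(* if x > 0 then y = x + 1  (no else branch, so the else list is empty) *)
let example =
  If (BinOp (Gt, Var "x", IntLit 0),
      [ Assign ("y", BinOp (Add, Var "x", IntLit 1)) ],
      [])

(* Hypothetical helpers: count every node in the tree. *)
let rec expr_size = function
  | IntLit _ | BoolLit _ | Var _ -> 1
  | BinOp (_, l, r) -> 1 + expr_size l + expr_size r
  | UnaryOp (_, e) -> 1 + expr_size e
  | Call (_, args) -> 1 + List.fold_left (fun acc a -> acc + expr_size a) 0 args

let rec stmt_size = function
  | Assign (_, e) -> 1 + expr_size e
  | If (c, t, f) -> 1 + expr_size c + stmts_size t + stmts_size f
  | While (c, body) -> 1 + expr_size c + stmts_size body
  | Return None -> 1
  | Return (Some e) -> 1 + expr_size e
and stmts_size ss = List.fold_left (fun acc s -> acc + stmt_size s) 0 ss
```

The statement has 8 nodes: If, the three nodes of x > 0, Assign, and the three nodes of x + 1.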

Pre-order Traversal (Top-Down)

Visit the node FIRST, then its children.

pre_order.ml
type 'a tree = { value : 'a; children : 'a tree list }

let rec pre_order node =
  node.value ::                  (* visit node FIRST *)
  List.concat_map pre_order
    node.children                (* then children *)
Visit order on (2 + 3) * 4: *, +, 2, 3, 4 (self, then the left subtree, then the right subtree).
Use case: Top-down analyses, such as type checking that pushes an expected type from a parent down to its children, or scope entry (enter a scope before visiting its body).
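The generic pre_order above works on any record-based tree. As a sketch, here is the same traversal specialized to a trimmed-down expr type (an assumption for this example), with a hypothetical label function that names each node.

```ocaml
type op = Add | Mul
type expr = IntLit of int | BinOp of op * expr * expr

(* Hypothetical helper: a printable name for each node. *)
let label = function
  | IntLit n -> string_of_int n
  | BinOp (Add, _, _) -> "+"
  | BinOp (Mul, _, _) -> "*"

(* Pre-order: emit this node, then the left subtree, then the right subtree. *)
let rec pre_order e =
  match e with
  | IntLit _ -> [ label e ]
  | BinOp (_, l, r) -> (label e :: pre_order l) @ pre_order r

let example = BinOp (Mul, BinOp (Add, IntLit 2, IntLit 3), IntLit 4)

let () = print_endline (String.concat " " (pre_order example))
```

This prints * + 2 3 4, the prefix (Polish) form of the expression.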

Challenge: Predict the Traversal

Given this AST for x + y * z, type the pre-order visit sequence.

Pre-order rule: Visit node, then left child, then right child.

Post-order Traversal (Bottom-Up)

Visit the children FIRST, then the node. This is the bottom-up order in which an expression is evaluated.

post_order.ml
type 'a tree = { value : 'a; children : 'a tree list }

let rec post_order node =
  List.concat_map post_order
    node.children                (* children FIRST *)
  @ [ node.value ]               (* then self *)
Visit order on (2 + 3) * 4: 2, 3, +, 4, * (left subtree, right subtree, then self).
Use case: Bottom-up analyses — expression evaluation (compute children before parent), code generation, computing expression types.
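A minimal sketch showing that post-order traversal and expression evaluation share the same shape. It assumes the same trimmed-down expr type and hypothetical label helper as the pre-order example; eval computes both child values before applying the operator at the current node.

```ocaml
type op = Add | Mul
type expr = IntLit of int | BinOp of op * expr * expr

let label = function
  | IntLit n -> string_of_int n
  | BinOp (Add, _, _) -> "+"
  | BinOp (Mul, _, _) -> "*"

(* Post-order: left subtree, right subtree, then this node. *)
let rec post_order e =
  match e with
  | IntLit _ -> [ label e ]
  | BinOp (_, l, r) -> post_order l @ post_order r @ [ label e ]

(* Evaluation follows the same order: children first, then the node. *)
let rec eval = function
  | IntLit n -> n
  | BinOp (op, l, r) ->
      let a = eval l in
      let b = eval r in
      (match op with Add -> a + b | Mul -> a * b)

let example = BinOp (Mul, BinOp (Add, IntLit 2, IntLit 3), IntLit 4)

let () =
  Printf.printf "%s = %d\n" (String.concat " " (post_order example)) (eval example)
```

This prints 2 3 + 4 * = 20. The post-order sequence is exactly the reverse Polish notation a stack machine would execute.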

BFS / Level-order Traversal

Visit all nodes at depth d before any node at depth d+1. A FIFO queue drives the traversal.

Visit order on (2 + 3) * 4: *, +, 4, 2, 3 (depth 0, then depth 1, then depth 2).
bfs.ml
type 'a tree = { value : 'a; children : 'a tree list }

let bfs visit root =
  let q = Queue.create () in
  Queue.push root q;
  while not (Queue.is_empty q) do
    let cur = Queue.pop q in
    visit cur;                                        (* process *)
    List.iter (fun c -> Queue.push c q) cur.children  (* enqueue children *)
  done
Use case: Level-based analysis, finding the shallowest occurrence of a pattern, pretty-printing by depth.
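A sketch of BFS on the trimmed-down expr type, plus a hypothetical find_shallowest that stops at the first node satisfying a predicate. Because the queue releases nodes level by level, the first match is one at minimum depth. Note that Queue.push takes the element first and the queue second.

```ocaml
type op = Add | Mul
type expr = IntLit of int | BinOp of op * expr * expr

let label = function
  | IntLit n -> string_of_int n
  | BinOp (Add, _, _) -> "+"
  | BinOp (Mul, _, _) -> "*"

let children = function
  | IntLit _ -> []
  | BinOp (_, l, r) -> [ l; r ]

(* Level-order: the FIFO queue releases every depth-d node before depth d+1. *)
let bfs_order root =
  let q = Queue.create () in
  let order = ref [] in
  Queue.push root q;
  while not (Queue.is_empty q) do
    let cur = Queue.pop q in
    order := label cur :: !order;
    List.iter (fun c -> Queue.push c q) (children cur)
  done;
  List.rev !order

(* Stop at the first match; with BFS that match is at minimum depth. *)
let find_shallowest p root =
  let q = Queue.create () in
  Queue.push root q;
  let rec loop () =
    if Queue.is_empty q then None
    else
      let cur = Queue.pop q in
      if p cur then Some cur
      else (List.iter (fun c -> Queue.push c q) (children cur); loop ())
  in
  loop ()

let example = BinOp (Mul, BinOp (Add, IntLit 2, IntLit 3), IntLit 4)
let is_lit = function IntLit _ -> true | BinOp _ -> false
```

On (2 + 3) * 4, find_shallowest is_lit returns IntLit 4 (depth 1), whereas a pre-order search would reach IntLit 2 (depth 2) first.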

Pattern Matching Over ASTs

In OCaml, pattern matching replaces the visitor pattern. Here is a node counter written as a single match function:

node_counter.ml
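let counts : (string, int) Hashtbl.t = Hashtbl.create 8   (* tally per node kind *)
let inc kind =
  let n = Option.value ~default:0 (Hashtbl.find_opt counts kind) in
  Hashtbl.replace counts kind (n + 1)

let rec count_expr = function
  | IntLit _ -> inc "IntLit"
  | Var _ -> inc "Var"
  | BinOp (_, l, r) ->
      inc "BinOp";
      count_expr l; count_expr r
  | UnaryOp (_, e) -> count_expr e             (* not counted, but recurse *)
  | Call (_, args) -> List.iter count_expr args
  | BoolLit _ -> ()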
Why pattern matching? Each new analysis is just a new match function, with no class hierarchy needed, and the compiler warns when a match misses a node type. This makes OCaml a natural fit for program analysis.
Counting nodes in x + (2 * y) gives BinOp = 2, IntLit = 1, Var = 2.
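As a variation, here is a sketch of the same counter without mutable state: the tally is a record (a hypothetical counts type) threaded through the recursion as an accumulator. It is still just one match function over a trimmed-down expr type.

```ocaml
type op = Add | Mul
type expr = IntLit of int | Var of string | BinOp of op * expr * expr

type counts = { binops : int; intlits : int; vars : int }
let zero = { binops = 0; intlits = 0; vars = 0 }

(* Pure counter: returns an updated record instead of mutating a table. *)
let rec count acc = function
  | IntLit _ -> { acc with intlits = acc.intlits + 1 }
  | Var _ -> { acc with vars = acc.vars + 1 }
  | BinOp (_, l, r) ->
      let acc = { acc with binops = acc.binops + 1 } in
      count (count acc l) r            (* left subtree, then right subtree *)

(* x + (2 * y) *)
let example = BinOp (Add, Var "x", BinOp (Mul, IntLit 2, Var "y"))

let () =
  let c = count zero example in
  Printf.printf "BinOp=%d IntLit=%d Var=%d\n" c.binops c.intlits c.vars
```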

Traversal State Management

Complex analyses pass context (like scope) through parameters as they traverse.

scope_analysis.ml
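(* define, enter_scope and analyze_expr are assumed helpers *)
let rec analyze_stmt scope = function
  | Assign (name, e) ->
      analyze_expr scope e;       (* the RHS sees the scope BEFORE name *)
      define scope name           (* return the extended scope *)
  | If (cond, then_b, else_b) ->
      analyze_expr scope cond;
      let inner = enter_scope scope in
      ignore (analyze_block inner then_b);
      ignore (analyze_block inner else_b);
      scope                       (* inner bindings do not escape *)
  | While (cond, body) ->
      analyze_expr scope cond;
      ignore (analyze_block (enter_scope scope) body); scope
  | Return e -> Option.iter (analyze_expr scope) e; scope
and analyze_block scope stmts = List.fold_left analyze_stmt scope stmts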

Scope Stack

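Example: x = 5; if x > 0 then y = x + 1. The global scope starts empty ({ }) and gains x after the assignment; the if pushes an inner scope holding y, which is popped when the branch ends.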
Key pattern: The scope is threaded through as a parameter — never mutated globally. This makes the analysis safe and composable.
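A runnable sketch of the threading pattern. Assumptions: the scope chain is a list of name lists (innermost first), if-branches get their own block scope as in the slide's enter_scope call, and the analysis reports names used before they are defined. The hypothetical check_stmt returns the updated scope together with the accumulated errors, and List.fold_left threads that pair through a block.

```ocaml
type op = Add | Gt
type expr = IntLit of int | Var of string | BinOp of op * expr * expr
type stmt =
  | Assign of string * expr
  | If of expr * stmt list * stmt list

(* Scope chain: innermost scope first; each scope is a list of names. *)
type scope = string list list

let define (scope : scope) name : scope =
  match scope with
  | inner :: outer -> (name :: inner) :: outer
  | [] -> [ [ name ] ]

let enter_scope (scope : scope) : scope = [] :: scope
let is_defined (scope : scope) name = List.exists (List.mem name) scope

(* Names used but not defined anywhere in the chain. *)
let rec check_expr scope = function
  | IntLit _ -> []
  | Var x -> if is_defined scope x then [] else [ x ]
  | BinOp (_, l, r) -> check_expr scope l @ check_expr scope r

(* Returns (new scope, errors); the input scope is never mutated. *)
let rec check_stmt (scope, errs) = function
  | Assign (name, e) -> (define scope name, errs @ check_expr scope e)
  | If (c, t, f) ->
      let inner = enter_scope scope in
      let _, errs_t = check_block (inner, []) t in
      let _, errs_f = check_block (inner, []) f in
      (scope, errs @ check_expr scope c @ errs_t @ errs_f)  (* inner scope dropped *)
and check_block acc stmts = List.fold_left check_stmt acc stmts

(* x = 5; if x > 0 then y = x + 1; z = y   (y is out of scope for z) *)
let program =
  [ Assign ("x", IntLit 5);
    If (BinOp (Gt, Var "x", IntLit 0),
        [ Assign ("y", BinOp (Add, Var "x", IntLit 1)) ], []);
    Assign ("z", Var "y") ]

let () =
  let _, errs = check_block ([ [] ], []) program in
  List.iter (Printf.printf "undefined: %s\n") errs
```

This prints undefined: y, because y lived only in the if's inner scope. A language with function-level scoping (such as Python) would make a different choice here.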

Challenge: Which Traversal?

Match each use case to the best traversal strategy.

1. Evaluate an arithmetic expression tree — need child values before computing parent.

2. Enter a scope before analyzing the statements inside a function body.

3. Find the shallowest node matching a pattern (e.g., first return at minimum depth).

Symbol Tables & Scope Chains

The scope chain grows as we enter nested scopes and shrinks as we leave them.

scoping.ml
let x = 10                    (* global *)
let foo () =
  let y = 20 in               (* foo scope *)
  let bar () =
    let z = 30 in             (* bar scope *)
    x + y + z                 (* lookup! *)
  in
  bar ()
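Lookup of x + y + z inside bar: z is found in bar's scope, y in foo's scope, and x in the global scope, so bar () returns 60.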
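A minimal symbol-table sketch of that lookup, assuming each scope is an association list from names to values and the chain is a list of scopes, innermost first. The hypothetical lookup walks outward until some scope binds the name. Because bar is defined inside foo, its chain is bar, then foo, then global.

```ocaml
(* Each scope maps names to values; the chain lists the innermost scope first. *)
type scope = (string * int) list
type chain = scope list

(* Walk outward through the chain until some scope binds the name. *)
let rec lookup (chain : chain) name =
  match chain with
  | [] -> None
  | scope :: outer ->
      (match List.assoc_opt name scope with
       | Some v -> Some v
       | None -> lookup outer name)

let global = [ ("x", 10) ]
let foo_scope = [ ("y", 20) ]
let bar_scope = [ ("z", 30) ]

(* Inside bar, the chain is bar -> foo -> global. *)
let in_bar = [ bar_scope; foo_scope; global ]

let sum =
  match lookup in_bar "x", lookup in_bar "y", lookup in_bar "z" with
  | Some x, Some y, Some z -> x + y + z
  | _ -> failwith "unbound variable"
```

Looking up z from the global scope alone returns None: names declared in inner scopes are invisible from outside.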

Shadowing & Scope Resolution

When an inner scope declares a name that exists in an outer scope, the inner one shadows the outer.

shadowing.ml
let x = 10                    (* global *)
let foo () =
  let x = 20 in               (* shadows! *)
  print_int x                 (* which x? *)
let () = print_int x          (* which x? *)
Common confusion: Shadowing does NOT modify the outer variable. It creates a new binding that hides the old one within the inner scope.
Resolution: inside foo, x refers to the inner binding (20); at top level, print_int x still refers to the global binding (10).
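Another way to see why shadowing leaves the outer binding intact: model environments with OCaml's persistent maps (Map.Make (String)), a representation chosen here for illustration rather than the scope chain used above. Adding x to an environment returns a new map; the old one is untouched, so leaving the inner scope simply means going back to the old map.

```ocaml
module Env = Map.Make (String)

(* let x = 10 at top level *)
let global = Env.add "x" 10 Env.empty

(* inside foo: let x = 20 builds a NEW environment on top of global *)
let foo_env = Env.add "x" 20 global

let () =
  Printf.printf "inside foo: x = %d\n" (Env.find "x" foo_env);  (* 20 *)
  Printf.printf "top level:  x = %d\n" (Env.find "x" global)    (* 10 *)
```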

AST Transformation: Constant Folding

Replace constant sub-expressions with their computed values, shrinking the tree.

constant_fold.ml
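(* Assumes op includes Add | Sub | Mul | Div; comparisons are left unfolded *)
let eval op a b = match op with
  | Add -> Some (a + b) | Sub -> Some (a - b) | Mul -> Some (a * b)
  | Div when b <> 0 -> Some (a / b)
  | _ -> None                          (* x / 0 or a comparison: don't fold *)
let rec fold = function
  | BinOp (op, l, r) ->
      (match fold l, fold r with       (* fold children first *)
       | IntLit a, IntLit b ->
           (match eval op a b with Some v -> IntLit v | None -> BinOp (op, IntLit a, IntLit b))
       | l', r' -> BinOp (op, l', r'))
  | UnaryOp (u, e) -> UnaryOp (u, fold e)
  | Call (f, args) -> Call (f, List.map fold args)
  | e -> e                             (* leaves unchanged *)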
Folding (2 + 3) * 4: the + node collapses to IntLit 5 first, then the * node collapses to IntLit 20.
Why post-order? Constant folding uses bottom-up traversal — fold children first, then check if the parent can be collapsed. This is why we learned post-order!
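A self-contained sketch of the folder on a trimmed-down expr type with variables. It shows three cases: a fully constant tree collapses to one literal, a tree containing a variable folds only its constant subtree, and division by zero is deliberately left unfolded so the runtime behavior is preserved.

```ocaml
type op = Add | Sub | Mul | Div
type expr = IntLit of int | Var of string | BinOp of op * expr * expr

(* None means "do not fold": x / 0 must stay a runtime error. *)
let eval op a b =
  match op with
  | Add -> Some (a + b)
  | Sub -> Some (a - b)
  | Mul -> Some (a * b)
  | Div -> if b = 0 then None else Some (a / b)

(* Post-order: fold both children first, then try to collapse this node. *)
let rec fold = function
  | BinOp (op, l, r) -> (
      match fold l, fold r with
      | IntLit a, IntLit b -> (
          match eval op a b with
          | Some v -> IntLit v
          | None -> BinOp (op, IntLit a, IntLit b))
      | l', r' -> BinOp (op, l', r'))
  | e -> e                                  (* IntLit and Var are leaves *)

let all_const = fold (BinOp (Mul, BinOp (Add, IntLit 2, IntLit 3), IntLit 4))
let with_var = fold (BinOp (Mul, Var "x", BinOp (Add, IntLit 2, IntLit 3)))
let div_zero = fold (BinOp (Div, IntLit 1, IntLit 0))
```

Here all_const is IntLit 20, with_var is BinOp (Mul, Var "x", IntLit 5), and div_zero is unchanged.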

Renaming & Dead Code Elimination

Two more essential transformations: variable renaming (update every occurrence of a name consistently) and dead code elimination (remove code that can never run).
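A sketch of both transformations over a small statement type. Two assumptions, stated plainly: the renamer is naive (it rewrites every occurrence, including definitions, and assumes the new name is not already in use, so it ignores shadowing and capture), and the dead code eliminator assumes function-level scoping, so splicing the taken branch of a constant if into the parent block cannot change name resolution. With block scopes, the branch would need to stay in its own block.

```ocaml
type op = Add | Mul
type expr = IntLit of int | BoolLit of bool | Var of string | BinOp of op * expr * expr
type stmt =
  | Assign of string * expr
  | If of expr * stmt list * stmt list
  | Return of expr option

(* Naive renaming: replaces every occurrence of old_ with new_. *)
let rec rename_expr old_ new_ = function
  | Var x when x = old_ -> Var new_
  | BinOp (op, l, r) -> BinOp (op, rename_expr old_ new_ l, rename_expr old_ new_ r)
  | e -> e

let rec rename_stmt old_ new_ = function
  | Assign (x, e) -> Assign ((if x = old_ then new_ else x), rename_expr old_ new_ e)
  | If (c, t, f) ->
      If (rename_expr old_ new_ c,
          List.map (rename_stmt old_ new_) t, List.map (rename_stmt old_ new_) f)
  | Return e -> Return (Option.map (rename_expr old_ new_) e)

(* DCE: keep only the branch a constant condition selects, and drop
   statements that follow a Return in the same block. *)
let rec dce = function
  | [] -> []
  | If (BoolLit true, t, _) :: rest -> dce (t @ rest)
  | If (BoolLit false, _, f) :: rest -> dce (f @ rest)
  | If (c, t, f) :: rest -> If (c, dce t, dce f) :: dce rest
  | (Return _ as r) :: _ -> [ r ]
  | s :: rest -> s :: dce rest
```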


Challenge: Apply Constant Folding

Given this AST, apply constant folding and type the final result.

Expression: (10 - 3) + (2 * 4)

After folding all constants, what single IntLit value remains?

Key Takeaways

1. ASTs provide structured representations of code — converting flat text into a tree that captures the program's logical organization without syntactic noise.
2. Different traversals for different needs: Pre-order (top-down: scope entry, type checking), Post-order (bottom-up: evaluation, folding), BFS (level-based: shallowest match).
3. Symbol tables track identifiers across nested scopes using scope chains. Lookup walks up the chain. Shadowing creates new bindings without modifying outer scopes.
4. AST transformations enable automated optimization and refactoring: constant folding (simplify expressions), variable renaming (consistent updates), dead code elimination (remove unreachable code).
5. OCaml's pattern matching is well suited to AST manipulation: each analysis is just a new recursive match function, with no visitor classes needed.
6. Transformation safety: Always preserve semantics. Use immutable updates, respect scope boundaries, and apply transformations in the right order.

Next Module Preview

Module 3: Static Analysis Fundamentals

  • Control Flow Graphs (CFGs) — how programs branch and loop
  • Dataflow analysis framework — tracking information through code
  • Reaching definitions & live variables
  • Building your first static analyzer
Prep: Review set theory (union, intersection) and basic graph theory. If you know what a directed graph is, you're ready.

From ASTs to CFGs

Module 2 gave you the tree. Module 3 flattens it into a graph that shows how execution flows through the program.

AST (tree):
  If
  ├── condition
  ├── then-body
  └── else-body

CFG (graph):
  [cond] ──T──→ [then] ──→ [join]
    └──F──→ [else] ──→ [join]

Resources: Course GitHub repository

Quiz 1: Core Concepts

Select one answer per question.

Q1. What does an AST remove compared to a parse tree?

Q2. Which traversal evaluates an expression tree correctly?

Q3. When looking up a variable, the symbol table:

Quiz 2: Build the AST

Given source code, select the correct AST representation.

expression
a * (b + 1)

Which OCaml AST is correct?

Quiz 3: Transformation Reasoning

For each scenario, predict what happens after the transformation.

1. After constant folding BinOp(Add, IntLit 5, BinOp(Mul, IntLit 0, Var "x")), what's left?

2. Renaming "x"→"y" in [Assign("x", IntLit 1); Print [Var "x"]]. How many nodes change?

3. Dead code elim on [If(BoolLit false, [s1], [s2])]. What remains?