Why dominance matters: In SSA form, φ-functions are placed at dominance frontiers, which are computed from the immediate-dominator tree. Dominance is also used to identify natural loops: an edge is a back edge when its target dominates its source.
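To make the dominance relation concrete, here is a minimal sketch of the classic iterative dominator computation, followed by back-edge detection. The CFG encoding (a dict of successor lists), the block names, and the helper names `dominators` and `back_edges` are assumptions made for illustration; the sketch also assumes every block appears as a key and is reachable from the entry.

```python
def dominators(succ, entry):
    """Return {block: set of blocks that dominate it}."""
    blocks = set(succ)
    pred = {b: set() for b in blocks}
    for b, targets in succ.items():
        for t in targets:
            pred[t].add(b)

    # Start from "every block dominates every block" and shrink to a fixpoint.
    dom = {b: set(blocks) for b in blocks}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in blocks - {entry}:
            pred_doms = [dom[p] for p in pred[b]]
            # A block is dominated by itself plus whatever dominates ALL its predecessors.
            new = {b} | (set.intersection(*pred_doms) if pred_doms else set())
            if new != dom[b]:
                dom[b] = new
                changed = True
    return dom


def back_edges(succ, entry):
    """An edge src -> dst is a back edge when dst dominates src."""
    dom = dominators(succ, entry)
    return [(src, dst) for src in succ for dst in succ[src] if dst in dom[src]]


# Hypothetical CFG of a simple while loop.
cfg = {
    "ENTRY": ["HEAD"],
    "HEAD": ["BODY", "EXIT"],
    "BODY": ["HEAD"],
    "EXIT": [],
}
dom = dominators(cfg, "ENTRY")
loops = back_edges(cfg, "ENTRY")
print(sorted(dom["BODY"]), loops)
```

Here `HEAD` dominates `BODY`, so the edge `BODY -> HEAD` is reported as the back edge of a natural loop.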
Challenge A: Build the CFG
Given this code, select the correct CFG structure for each question.
x = input()
while x > 0:
    if x % 2 == 0:
        x = x / 2
    else:
        x = 3*x + 1
print("done")
Q1: How many basic blocks?
Q2: Which block has the back edge?
Q3: What are the predecessors of the print block?
Reachability Analysis — Which Blocks Execute?
Forward flood from ENTRY: mark each reachable block. Unreachable blocks = dead code.
Algorithm
reachable = {ENTRY}
worklist = [ENTRY]
while worklist ≠ ∅:
    b = worklist.pop()
    for s in succ(b):
        if s ∉ reachable:
            reachable.add(s)
            worklist.push(s)
Key insight: Reachability is the simplest dataflow analysis — the "domain" is just {reachable, unreachable}. It's also the foundation: unreachable blocks can be pruned before running any other analysis.
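The worklist algorithm above translates almost line for line into Python. This is a sketch under assumptions: the CFG is a dict mapping each block name to its successor list, and the block names (including the deliberately disconnected `DEAD` block) are made up for the example.

```python
def reachable_blocks(succ, entry):
    """Forward flood from the entry block over successor edges."""
    reachable = {entry}
    worklist = [entry]
    while worklist:
        b = worklist.pop()
        for s in succ.get(b, ()):
            if s not in reachable:
                reachable.add(s)
                worklist.append(s)
    return reachable


# Hypothetical CFG: DEAD has an outgoing edge but no path from ENTRY.
cfg = {
    "ENTRY": ["A"],
    "A": ["B", "EXIT"],
    "B": ["A"],
    "DEAD": ["EXIT"],
    "EXIT": [],
}
live_blocks = reachable_blocks(cfg, "ENTRY")
dead_blocks = set(cfg) - live_blocks
print(sorted(dead_blocks))
```

Each block enters the worklist at most once, so the flood visits every edge once and runs in time linear in the size of the CFG.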
Dead Code Detection Patterns
Four common patterns of dead code, each with its own reason for being unreachable.
Forward Dataflow — Reaching Definitions
Which definitions might reach each point? Forward, may-analysis using union at merge points.
IN / OUT Sets
Equations
IN[B] = ∪ OUT[p] for p ∈ pred(B)
OUT[B] = gen[B] ∪ (IN[B] − kill[B])
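The two equations can be iterated directly until nothing changes. The sketch below is one simple round-robin way to do that, not necessarily how the course's solver is written; the CFG, the definition labels `d1` and `d2`, and the function name `reaching_definitions` are assumptions for illustration.

```python
def reaching_definitions(succ, gen, kill):
    """Iterate IN/OUT for every block until a fixpoint is reached."""
    pred = {b: [] for b in succ}
    for b, targets in succ.items():
        for t in targets:
            pred[t].append(b)

    IN = {b: set() for b in succ}
    OUT = {b: set() for b in succ}
    changed = True
    while changed:
        changed = False
        for b in succ:
            # IN[B] = union of OUT[p] over predecessors p
            IN[b] = set().union(*(OUT[p] for p in pred[b]))
            # OUT[B] = gen[B] ∪ (IN[B] − kill[B])
            new_out = gen[b] | (IN[b] - kill[b])
            if new_out != OUT[b]:
                OUT[b] = new_out
                changed = True
    return IN, OUT


# Hypothetical loop: B1 holds d1 (x = 0); B3 holds d2 (x = x + 1) and jumps back to B2.
cfg = {"B1": ["B2"], "B2": ["B3", "B4"], "B3": ["B2"], "B4": []}
gen = {"B1": {"d1"}, "B2": set(), "B3": {"d2"}, "B4": set()}
kill = {"B1": {"d2"}, "B2": set(), "B3": {"d1"}, "B4": set()}
IN, OUT = reaching_definitions(cfg, gen, kill)
print(sorted(IN["B2"]), sorted(OUT["B3"]))
```

At the loop head `B2`, both `d1` (from the entry) and `d2` (around the back edge) reach, which is what the union at the merge point expresses.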
Backward Dataflow — Live Variables
Which variables might be read before redefinition? Backward, may-analysis using union at successors.
IN / OUT Sets (Backward!)
Equations (note: successors!)
OUT[B] = ∪ IN[s] for s ∈ succ(B)
IN[B] = use[B] ∪ (OUT[B] − def[B])
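The backward equations can be solved with the same kind of loop, reading from successors instead of predecessors. This is a sketch under assumptions: the CFG, the per-block `use` and `defs` sets, and the function name `live_variables` are invented for the example, and `use[B]` is taken to mean variables read in B before any write in B.

```python
def live_variables(succ, use, defs):
    """Iterate the backward liveness equations to a fixpoint."""
    IN = {b: set() for b in succ}
    OUT = {b: set() for b in succ}
    changed = True
    while changed:
        changed = False
        # Visiting blocks in reverse tends to converge faster for a backward analysis.
        for b in reversed(list(succ)):
            # OUT[B] = union of IN[s] over successors s
            OUT[b] = set().union(*(IN[s] for s in succ[b]))
            # IN[B] = use[B] ∪ (OUT[B] − def[B])
            new_in = use[b] | (OUT[b] - defs[b])
            if new_in != IN[b]:
                IN[b] = new_in
                changed = True
    return IN, OUT


# Hypothetical program:
#   B1: x = input(); y = 0
#   B2: if x > 0 goto B3 else B4
#   B3: y = y + x; x = x - 1; goto B2
#   B4: print(y)
cfg = {"B1": ["B2"], "B2": ["B3", "B4"], "B3": ["B2"], "B4": []}
use = {"B1": set(), "B2": {"x"}, "B3": {"x", "y"}, "B4": {"y"}}
defs = {"B1": {"x", "y"}, "B2": set(), "B3": {"x", "y"}, "B4": set()}
IN, OUT = live_variables(cfg, use, defs)
print(sorted(OUT["B1"]), sorted(IN["B4"]))
```

Nothing is live on entry to `B1` because it defines both variables before anything reads them, while both are live on exit from it.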
Forward vs Backward — Side by Side
Same CFG, two analyses running simultaneously. Compare how information flows.
Reaching Defs (Forward →)
Live Variables (← Backward)
Forward: IN = ∪ OUT[pred]
Info flows ENTRY→EXIT. Merge = union of predecessors. Tells us what definitions reach each point.
Backward: OUT = ∪ IN[succ]
Info flows EXIT→ENTRY. Merge = union of successors. Tells us what variables are needed later.
The Dataflow Framework — Unifying Pattern
All four classic analyses (reaching definitions, live variables, available expressions, very busy expressions) follow the same skeleton; only the parameters differ.
The framework is domain-agnostic. Swap the 5 parameters (direction, merge, init, gen, kill) and the same fixpoint engine solves any analysis. This is exactly how your OCaml bootcamp code works!
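A rough Python sketch of such a parameterized engine is shown below; it is an illustration of the idea, not the bootcamp's OCaml code. The function name `solve`, the set-based fact representation, and the example CFG are assumptions. One further assumption: blocks with no incoming edges in the chosen direction (ENTRY for forward, EXIT for backward) start from the empty set.

```python
def solve(succ, direction, merge, init, gen, kill):
    """Generic gen/kill fixpoint solver.

    Returns (merged, transferred). Forward: merged = IN, transferred = OUT.
    Backward: merged = OUT, transferred = IN.
    """
    pred = {b: [] for b in succ}
    for b, targets in succ.items():
        for t in targets:
            pred[t].append(b)
    # Facts arrive over predecessor edges (forward) or successor edges (backward).
    sources = pred if direction == "forward" else succ

    merged = {b: set() for b in succ}
    transferred = {b: set(init) for b in succ}
    changed = True
    while changed:
        changed = False
        for b in succ:
            incoming = [transferred[n] for n in sources[b]]
            # Assumption: boundary blocks start from the empty set.
            merged[b] = merge(*incoming) if incoming else set()
            new = gen[b] | (merged[b] - kill[b])
            if new != transferred[b]:
                transferred[b] = new
                changed = True
    return merged, transferred


# Hypothetical loop CFG: B1 -> B2 -> {B3, B4}, B3 -> B2.
cfg = {"B1": ["B2"], "B2": ["B3", "B4"], "B3": ["B2"], "B4": []}

# Reaching definitions: forward, union, init = empty set.
rd_in, rd_out = solve(
    cfg, "forward", set.union, set(),
    gen={"B1": {"d1"}, "B2": set(), "B3": {"d2"}, "B4": set()},
    kill={"B1": {"d2"}, "B2": set(), "B3": {"d1"}, "B4": set()},
)

# Live variables: backward, union, init = empty set; gen = use, kill = def.
lv_out, lv_in = solve(
    cfg, "backward", set.union, set(),
    gen={"B1": set(), "B2": {"x"}, "B3": {"x", "y"}, "B4": {"y"}},
    kill={"B1": {"x", "y"}, "B2": set(), "B3": {"x", "y"}, "B4": set()},
)
print(sorted(rd_in["B2"]), sorted(lv_out["B1"]))
```

Under the same assumptions, a must-analysis would pass `set.intersection` as `merge` and the full universe of facts as `init`.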
Challenge B: Trace the Dataflow
Given this CFG with gen/kill sets, predict IN and OUT at iteration 2.
CFG: B1→B2, B1→B3, B2→B4, B3→B4
B1: gen={d1,d2}, kill={}
B2: gen={d3}, kill={d1}
B3: gen={d4}, kill={d2}
B4: gen={}, kill={}
After iteration 1: OUT[B1]={d1,d2}
Q1: What is IN[B4] at iteration 2?
Hint: IN[B4] = OUT[B2] ∪ OUT[B3]
Q2: Why does IN[B4] contain both d2 and d3?
Interprocedural: Call Graphs
When analysis crosses function boundaries, we need a call graph layered on top of the CFG.
Key Idea: A call graph edge A→B means "function A may call function B." Combining call graphs with per-function CFGs enables interprocedural dataflow analysis — tracking facts across function boundaries.
Analogy: If each function's CFG is a building floor plan, the call graph is the elevator directory connecting floors.
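As a small illustration of the "may call" relation, a call graph can be stored as a dict from caller to callees and walked transitively. The function names in the graph and the helper `may_reach` are hypothetical, chosen only for this sketch.

```python
def may_reach(call_graph, root):
    """Return every function that may be called, directly or indirectly, from root."""
    seen = set()
    stack = [root]
    while stack:
        f = stack.pop()
        for callee in call_graph.get(f, ()):
            if callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen


# Hypothetical program: an edge A -> B means "A may call B".
call_graph = {
    "main": ["parse", "run"],
    "parse": ["read_token"],
    "run": ["step"],
    "step": ["step", "log"],  # step is recursive
    "read_token": [],
    "log": [],
    "unused_helper": ["log"],
}
callees = may_reach(call_graph, "main")
never_called = set(call_graph) - callees - {"main"}
print(sorted(callees), sorted(never_called))
```

This is the same flood used for block reachability, applied one level up: functions never reached from `main` are dead at the whole-program level.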
From CFG to SSA Form
SSA = Static Single Assignment: every variable has exactly one definition in the program text. Where several definitions of the same variable meet at a merge point, a φ (phi) function selects among them.
B1: x = 5
    if (x > 0) goto B2 else B3
B2: x = x + 1
B3: x = x - 1
B4: print(x)   ← which x?
↓ After SSA conversion:
B1: x₁ = 5
    if (x₁ > 0) goto B2 else B3
B2: x₂ = x₁ + 1
B3: x₃ = x₁ - 1
B4: x₄ = φ(x₂, x₃)
    print(x₄)
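One standard way to decide which blocks need a φ for a variable is to take the iterated dominance frontier of the blocks that define it. The sketch below applies that to the diamond above. It assumes the immediate dominators are already known and hand-writes them in `idom`; the helper names `dominance_frontiers` and `phi_blocks` are also assumptions for illustration.

```python
def dominance_frontiers(succ, idom):
    """For each join block, walk up the dominator tree from each predecessor."""
    pred = {b: [] for b in succ}
    for b, targets in succ.items():
        for t in targets:
            pred[t].append(b)

    df = {b: set() for b in succ}
    for b in succ:
        if len(pred[b]) >= 2:  # only merge points can be in a frontier
            for p in pred[b]:
                runner = p
                while runner != idom[b]:
                    df[runner].add(b)
                    runner = idom[runner]
    return df


def phi_blocks(df, def_blocks):
    """Blocks needing a φ: the iterated dominance frontier of the defining blocks."""
    placed = set()
    work = list(def_blocks)
    while work:
        b = work.pop()
        for f in df[b]:
            if f not in placed:
                placed.add(f)
                work.append(f)  # a φ is itself a new definition
    return placed


# The diamond from the example: B1 -> {B2, B3} -> B4.
cfg = {"B1": ["B2", "B3"], "B2": ["B4"], "B3": ["B4"], "B4": []}
# Hand-written immediate dominators (assumed, not computed here).
idom = {"B1": None, "B2": "B1", "B3": "B1", "B4": "B1"}

df = dominance_frontiers(cfg, idom)
needs_phi = phi_blocks(df, def_blocks={"B1", "B2", "B3"})  # blocks that assign x
print(df, needs_phi)
```

The result is `B4`, matching the `x₄ = φ(x₂, x₃)` in the converted code.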
Key Takeaways
1. CFGs Make Flow Explicit
Code has hidden control flow (loops, branches, exceptions). The CFG makes every possible execution path visible as edges between basic blocks. No more guessing "can this line reach that line?"
2. Dataflow = Lattice + Transfer + Iteration
Every dataflow analysis follows one pattern: pick a lattice of facts, define transfer functions for each block, and iterate to a fixpoint. Only the lattice, the transfer functions, and the direction change between analyses.
3. Reachability Is the First Analysis
Before any optimization, ask: "Can execution even reach this point?" Forward reachability from entry is the simplest CFG analysis and the foundation for dead code elimination.
4. Dominance Unlocks SSA
If block A dominates block B, every path from ENTRY to B goes through A. This relationship determines where φ-functions go in SSA form (at dominance frontiers), the backbone of modern compiler optimizations.
The City Analogy: A CFG is like a city map. Basic blocks are city blocks. Edges are one-way streets. Reachability asks "can I drive from downtown to the airport?" Dominance asks "must I pass through the toll booth?" Dataflow analysis tracks what's in your trunk as you drive every possible route.
CFGs Across the Bootcamp
M3: Dataflow
CFG construction, reaching definitions, live variables
M4: Abstract Interp
Abstract domains evaluated over CFG edges
M5: Taint Analysis
Taint propagation along CFG paths
M6: Constraint-Based
Constraints generated from CFG structure
Labs
Hands-on CFG building, worklist implementation
Advanced
SSA form, interprocedural analysis
Challenge C: Find the Dead Code
Each snippet has dead code. Identify the dead line and the reason.
Snippet 1
def foo(x):
    if x > 0:
        return x
    else:
        return -x
    print("done") # line 6
Snippet 2
x = 10
while x > 100:
    x = x - 1
    print(x)
print("end")
Snippet 3
def bar():
    x = compute()
    y = x + 1
    return x
Snippet 4
def baz(x):
    if x > 0:
        y = 1
        if x > 0:
            print(y)
        else:
            print(0)
Quiz: CFG Properties
Q1: Basic Blocks
A basic block has the property that:
Q2: Back Edges
A back edge in a CFG indicates:
Q3: Dominance
If block A dominates block B, then:
Quiz: Dataflow Trace
Given this CFG and reaching definitions state after iteration 1, predict IN[B3] after iteration 2.
After iteration 1:
OUT[B1] = {d1: x=5}
OUT[B2] = {d1: x=5, d2: y=x+1}
OUT[B3] = {d1: x=5, d3: x=y*2}
OUT[B4] = {d1: x=5, d3: x=y*2, d4: y=0}
B3 preds: B2, B4 (back edge)
What is IN[B3] after iteration 2?
Quiz: Analysis Design
For each scenario, choose the correct dataflow analysis type.
Scenario 1
"Which variable assignments might reach this point?"
Scenario 2
"Which variables might still be used after this point on some path?"
Scenario 3
"Which expressions are guaranteed already computed at this point?"
Scenario 4
"Which expressions are guaranteed to be used on all paths from this point?"