Control Flow Graphs & Reachability Analysis

Fundamentals Deep Dive • PA Bootcamp

Why CFGs? From Code to Graph

Source code is linear text, but execution is a graph. CFGs make the flow explicit.

Source Code
x = input()
if x > 0:
    y = x * 2
else:
    y = -x
print(y)
Analogy: Code is like driving directions (turn left, then right...). A CFG is like a road map — it shows ALL possible routes at once.
Path Explosion:
This tiny program has 2 paths. Chain 10 independent if-statements → 2¹⁰ = 1024 paths. CFGs + dataflow let us reason about ALL paths simultaneously.

Basic Blocks — The Building Units

A basic block is a maximal sequence of consecutive statements with one entry point and one exit point.

Rules for Block Boundaries
Start a new block at:
• The first statement of the program
• Any target of a branch (jump destination)
• Any statement immediately after a branch
End a block at:
• A branch instruction (if, while, goto)
• A return statement
• The last statement before a branch target
Key property: If the first statement of a block executes, ALL statements in that block execute, in order. No surprises inside a basic block.
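
The boundary rules above can be sketched as a "leader-finding" pass over a flat instruction list. The tuple format used here (`("op", text)`, `("if", target, cond)`, `("goto", target)`, `("ret",)`) is an assumption made for illustration, not a format taken from the bootcamp code.

```python
def find_leaders(instrs):
    """Return the sorted indices of instructions that start a basic block."""
    leaders = {0} if instrs else set()       # first statement of the program
    for i, ins in enumerate(instrs):
        kind = ins[0]
        if kind in ("goto", "if"):
            leaders.add(ins[1])              # any branch target starts a block
        if kind in ("goto", "if", "ret") and i + 1 < len(instrs):
            leaders.add(i + 1)               # statement right after a branch or return
    return sorted(leaders)

def split_blocks(instrs):
    """Cut the instruction list at every leader."""
    bounds = find_leaders(instrs) + [len(instrs)]
    return [instrs[a:b] for a, b in zip(bounds, bounds[1:])]

# Hypothetical lowering of a small counting loop into jumps.
program = [
    ("op", "i = 0"),              # 0
    ("if", 5, "not (i < 10)"),    # 1: jump to 5 when the loop condition fails
    ("op", "sum = sum + i"),      # 2
    ("op", "i = i + 1"),          # 3
    ("goto", 1),                  # 4: jump back to the loop test
    ("op", "print(sum)"),         # 5
]
blocks = split_blocks(program)
print(find_leaders(program))  # [0, 1, 2, 5]
print(len(blocks))            # 4
```

Each resulting slice has one entry (its leader) and one exit (its last instruction), which is the key property stated above.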

CFG Patterns: Sequential & If-Else

The two simplest patterns: straight-line code collapses into a single block, and an if-else forms a diamond (one condition block, two branch blocks, one merge block).

CFG Patterns: While Loops & Back Edges

Loops create back edges — edges that go "upstream" in the CFG, creating cycles.

i = 0                  # B1
sum = 0
while i < 10:          # B2
    sum = sum + i      # B3
    i = i + 1
print(sum)             # B4
Back edge B3→B2 creates a cycle. This is why dataflow needs iterative fixpoint — information loops back until it stabilizes.
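
A back edge can be found mechanically with a depth-first search: an edge u→v is flagged when v is still on the current DFS path. This is a minimal sketch; the dictionary is a hand-built CFG for the loop above, with block names B1..B4 assumed from the B3→B2 labelling in the text. For reducible CFGs this DFS test agrees with the dominance-based definition of a back edge.

```python
def find_back_edges(succ, entry):
    """Return edges (u, v) where v is on the current DFS path when u -> v is explored."""
    WHITE, GRAY, BLACK = 0, 1, 2             # unvisited, on the DFS path, finished
    color = {b: WHITE for b in succ}
    back_edges = []

    def visit(u):
        color[u] = GRAY
        for v in succ[u]:
            if color[v] == GRAY:             # v is an ancestor on the path: cycle
                back_edges.append((u, v))
            elif color[v] == WHITE:
                visit(v)
        color[u] = BLACK

    visit(entry)
    return back_edges

loop_cfg = {
    "B1": ["B2"],          # initialisation
    "B2": ["B3", "B4"],    # loop test
    "B3": ["B2"],          # loop body, jumps back to the test
    "B4": [],              # code after the loop
}
print(find_back_edges(loop_cfg, "B1"))  # [('B3', 'B2')]
```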

CFG Patterns: Nested & Complex

Real code nests these patterns.

x = input()            # B1
if x > 0:              # B2
    while x > 1:       # B3
        x = x / 2      # B4
    print("pos")       # B5
else:
    print("neg")       # B6
return x               # B7
Nested pattern: The while loop (B3↔B4) is inside the if-else (B2→B3/B6). The merge point B7 collects both branches.

Predecessors & Successors

The foundation of dataflow: information flows along edges. pred(B) = where info comes from, succ(B) = where it goes.

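Pred/Succ Table (for the nested B1..B7 example)
Block | pred(B) | succ(B)
B1    | (none)  | B2
B2    | B1      | B3, B6
B3    | B2, B4  | B4, B5
B4    | B3      | B3
B5    | B3      | B7
B6    | B2      | B7
B7    | B5, B6  | (none)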
Forward analysis: IN[B] = ⊔ { OUT[p] | p ∈ pred(B) }
Merge information from ALL predecessors.
Backward analysis: OUT[B] = ⊔ { IN[s] | s ∈ succ(B) }
Merge information from ALL successors.
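
Only one of the two maps needs to be stored: pred can be derived by inverting the edges in succ. The sketch below does this for the nested B1..B7 example; the edge list is read off that code by hand, and the function name `invert_edges` is illustrative.

```python
def invert_edges(succ):
    """Build pred(B) for every block from the succ(B) map."""
    pred = {b: [] for b in succ}
    for b, targets in succ.items():
        for s in targets:
            pred[s].append(b)
    return pred

nested_succ = {
    "B1": ["B2"],
    "B2": ["B3", "B6"],    # if: true branch, else branch
    "B3": ["B4", "B5"],    # while: body, loop exit
    "B4": ["B3"],          # back edge to the loop test
    "B5": ["B7"],
    "B6": ["B7"],
    "B7": [],
}
nested_pred = invert_edges(nested_succ)
print(nested_pred["B3"])  # ['B2', 'B4']
print(nested_pred["B7"])  # ['B5', 'B6']
```

A forward analysis reads `nested_pred` at each merge; a backward analysis reads `nested_succ`.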

Dominance — Who Controls Whom?

Block A dominates block B if every path from ENTRY to B passes through A.

Dominator Tree
ENTRY
├── B1
│   ├── B2
│   ├── B3
│   └── B4
│       └── EXIT
Why dominance matters: Dominance frontiers, which are computed from the dominator tree, determine where φ-functions go in SSA form. Dominance is also used to identify natural loops (an edge is a back edge when its target dominates its source).
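
Dominator sets can be computed with the same iterate-until-stable idea used for dataflow. This is a minimal sketch of the simple set-based algorithm, not an optimized one. The diamond-shaped CFG is an assumption chosen to be consistent with the dominator tree shown above; the original graph is not reproduced in the text.

```python
def dominators(succ, entry):
    """Solve Dom(b) = {b} | intersection of Dom(p) over predecessors p."""
    nodes = set(succ)
    pred = {b: set() for b in nodes}
    for b, targets in succ.items():
        for s in targets:
            pred[s].add(b)

    dom = {b: set(nodes) for b in nodes}     # start from "everything dominates b"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in nodes - {entry}:
            pred_doms = [dom[p] for p in pred[b]]
            new = {b} | (set.intersection(*pred_doms) if pred_doms else set())
            if new != dom[b]:
                dom[b] = new
                changed = True
    return dom

# Assumed CFG: ENTRY -> B1, B1 branches to B2 and B3, both merge at B4, then EXIT.
diamond = {
    "ENTRY": ["B1"], "B1": ["B2", "B3"], "B2": ["B4"],
    "B3": ["B4"], "B4": ["EXIT"], "EXIT": [],
}
dom = dominators(diamond, "ENTRY")
print(sorted(dom["B4"]))  # ['B1', 'B4', 'ENTRY']
```

Neither B2 nor B3 dominates B4, because each can be bypassed through the other branch. That is why B4 hangs directly under B1 in the tree.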

Challenge A: Build the CFG

Given this code, select the correct CFG structure for each question.

x = input()
while x > 0:
    if x % 2 == 0:
        x = x / 2
    else:
        x = 3*x + 1
print("done")
Q1: How many basic blocks?
Q2: Which block has the back edge?
Q3: What are the predecessors of the print block?

Reachability Analysis — Which Blocks Execute?

Forward flood from ENTRY: mark each reachable block. Unreachable blocks = dead code.

Algorithm
reachable = {ENTRY}
worklist = [ENTRY]
while worklist ≠ ∅:
    b = worklist.pop()
    for s in succ(b):
        if s ∉ reachable:
            reachable.add(s)
            worklist.push(s)
Key insight: Reachability is the simplest dataflow analysis — the "domain" is just {reachable, unreachable}. It's also the foundation: unreachable blocks can be pruned before running any other analysis.
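
The worklist algorithm translates almost line for line into runnable Python. The CFG below is a made-up example in which B3 has no incoming edge (for instance, code placed after a return); the block names are illustrative.

```python
def reachable_blocks(succ, entry):
    """Forward flood from entry; returns the set of reachable block names."""
    reachable = {entry}
    worklist = [entry]
    while worklist:
        b = worklist.pop()
        for s in succ.get(b, []):
            if s not in reachable:
                reachable.add(s)
                worklist.append(s)
    return reachable

cfg = {
    "ENTRY": ["B1"],
    "B1": ["B2"],
    "B2": ["EXIT"],
    "B3": ["EXIT"],    # never the target of any edge
    "EXIT": [],
}
dead = set(cfg) - reachable_blocks(cfg, "ENTRY")
print(dead)  # {'B3'}
```

Each block enters the worklist at most once, so the pass is linear in the number of blocks plus edges.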

Dead Code Detection Patterns

Four common patterns of dead code.

Forward Dataflow — Reaching Definitions

Which definitions might reach each point? Forward, may-analysis using union at merge points.

IN / OUT Sets
Equations
IN[B] = ∪ OUT[p] for p ∈ pred(B)
OUT[B] = gen[B] ∪ (IN[B] − kill[B])
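
The two equations can be solved with a worklist. This is a minimal sketch, assuming a tiny loop CFG with hand-written gen/kill sets: d1 stands for `i = 0` in B1 and d2 for `i = i + 1` in B3. These names are illustrative and not taken from the slides.

```python
from collections import deque

def reaching_definitions(succ, gen, kill):
    """Iterate IN/OUT to a fixpoint; returns (IN, OUT) as dicts of sets."""
    pred = {b: [] for b in succ}
    for b, targets in succ.items():
        for s in targets:
            pred[s].append(b)

    IN = {b: set() for b in succ}
    OUT = {b: set() for b in succ}
    worklist = deque(succ)
    while worklist:
        b = worklist.popleft()
        IN[b] = set().union(*(OUT[p] for p in pred[b]))    # merge = union
        new_out = gen[b] | (IN[b] - kill[b])               # transfer function
        if new_out != OUT[b]:
            OUT[b] = new_out
            worklist.extend(succ[b])    # successors must be re-examined
    return IN, OUT

succ = {"B1": ["B2"], "B2": ["B3", "B4"], "B3": ["B2"], "B4": []}
gen  = {"B1": {"d1"}, "B2": set(), "B3": {"d2"}, "B4": set()}
kill = {"B1": {"d2"}, "B2": set(), "B3": {"d1"}, "B4": set()}

IN, OUT = reaching_definitions(succ, gen, kill)
print(sorted(IN["B2"]))   # ['d1', 'd2']
print(sorted(OUT["B3"]))  # ['d2']
```

Both definitions of `i` reach the loop test B2: d1 on the first pass and d2 through the back edge.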

Backward Dataflow — Live Variables

Which variables might be read before redefinition? Backward, may-analysis using union at successors.

IN / OUT Sets (Backward!)
Equations (note: successors!)
OUT[B] = ∪ IN[s] for s ∈ succ(B)
IN[B] = use[B] ∪ (OUT[B] − def[B])
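
The backward equations can be solved by sweeping all blocks repeatedly until nothing changes. This is a sketch under assumptions: the use/def sets are written by hand for a small loop that sums into a variable called `total`, and use[B] contains only variables read before being written inside B.

```python
def live_variables(succ, use, defs):
    """Round-robin iteration of the backward equations; returns (IN, OUT)."""
    IN = {b: set() for b in succ}
    OUT = {b: set() for b in succ}
    changed = True
    while changed:
        changed = False
        for b in succ:
            out_b = set().union(*(IN[s] for s in succ[b]))   # merge over successors
            in_b = use[b] | (out_b - defs[b])
            if out_b != OUT[b] or in_b != IN[b]:
                OUT[b], IN[b] = out_b, in_b
                changed = True
    return IN, OUT

succ = {"B1": ["B2"], "B2": ["B3", "B4"], "B3": ["B2"], "B4": []}
use  = {"B1": set(),            # i = 0; total = 0
        "B2": {"i"},            # while i < 10
        "B3": {"i", "total"},   # total = total + i; i = i + 1
        "B4": {"total"}}        # print(total)
defs = {"B1": {"i", "total"}, "B2": set(), "B3": {"i", "total"}, "B4": set()}

IN, OUT = live_variables(succ, use, defs)
print(sorted(IN["B4"]))   # ['total']
print(sorted(OUT["B1"]))  # ['i', 'total']
```

`i` is not live on entry to B4, so a register holding it could be reused once the loop exits.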

Forward vs Backward — Side by Side

Same CFG, two analyses running simultaneously. Compare how information flows.

Reaching Defs (Forward →)
Live Variables (← Backward)
Forward: IN = ∪ OUT[pred]
Info flows ENTRY→EXIT. Merge = union of predecessors. Tells us what definitions reach each point.
Backward: OUT = ∪ IN[succ]
Info flows EXIT→ENTRY. Merge = union of successors. Tells us what variables are needed later.

The Dataflow Framework — Unifying Pattern

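All four classic analyses (reaching definitions, live variables, available expressions, very busy expressions) follow the same skeleton; only the parameters differ.

Analysis              | Direction | Merge | Initial value
Reaching definitions  | forward   | ∪     | ∅
Live variables        | backward  | ∪     | ∅
Available expressions | forward   | ∩     | all expressions (∅ at ENTRY)
Very busy expressions | backward  | ∩     | all expressions (∅ at EXIT)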
The framework is domain-agnostic. Swap the 5 parameters (direction, merge, init, gen, kill) and the same fixpoint engine solves any analysis. This is exactly how your OCaml bootcamp code works!
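
To make the parameter swap concrete, here is a small Python sketch of a generic gen/kill engine. It is not the bootcamp's OCaml code. The function name `solve` and the extra `boundary` argument (the value used at blocks with nothing to merge, such as ENTRY in a forward analysis) are assumptions added for illustration.

```python
def solve(succ, direction, merge, init, gen, kill, boundary=frozenset()):
    """Generic fixpoint engine. Returns (before, after): the facts entering and
    leaving each block in the chosen direction (IN/OUT forward, OUT/IN backward)."""
    pred = {b: [] for b in succ}
    for b, targets in succ.items():
        for s in targets:
            pred[s].append(b)
    sources = pred if direction == "forward" else succ

    after = {b: set(init) for b in succ}
    before = {b: set() for b in succ}
    changed = True
    while changed:
        changed = False
        for b in succ:
            inputs = [after[x] for x in sources[b]]
            before[b] = merge(inputs) if inputs else set(boundary)
            new_after = gen[b] | (before[b] - kill[b])
            if new_after != after[b]:
                after[b] = new_after
                changed = True
    return before, after

def union(sets):
    return set().union(*sets)

def intersection(sets):
    return set.intersection(*sets)

# Diamond CFG: a+b is computed in B1, a*b only on the B2 branch.
succ = {"B1": ["B2", "B3"], "B2": ["B4"], "B3": ["B4"], "B4": []}
gen  = {"B1": {"a+b"}, "B2": {"a*b"}, "B3": set(), "B4": set()}
kill = {b: set() for b in succ}
universe = {"a+b", "a*b"}

must_in, _ = solve(succ, "forward", intersection, universe, gen, kill)
may_in, _ = solve(succ, "forward", union, set(), gen, kill)
print(sorted(must_in["B4"]))  # ['a+b']
print(sorted(may_in["B4"]))   # ['a*b', 'a+b']
```

With the same gen/kill sets, changing only merge and init turns a "must" answer (available on every path) into a "may" answer (available on some path).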

Challenge B: Trace the Dataflow

Given this CFG with gen/kill sets, predict IN and OUT at iteration 2.

CFG: B1→B2, B1→B3, B2→B4, B3→B4
B1: gen={d1,d2}, kill={}
B2: gen={d3}, kill={d1}
B3: gen={d4}, kill={d2}
B4: gen={}, kill={}
After iteration 1: OUT[B1]={d1,d2}
Q1: What is IN[B4] at iteration 2?
Hint: IN[B4] = OUT[B2] ∪ OUT[B3]
Q2: Why does IN[B4] contain both d2 and d3?

Interprocedural: Call Graphs

When analysis crosses function boundaries, we need a call graph layered on top of the CFG.

Key Idea: A call graph edge A→B means "function A may call function B." Combining call graphs with per-function CFGs enables interprocedural dataflow analysis — tracking facts across function boundaries.
Analogy: If each function's CFG is a building floor plan, the call graph is the elevator directory connecting floors.
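
One way to picture the layering: keep a CFG per function, record the call sites inside each block, and derive the call graph from those call sites. The program below is entirely hypothetical (function names, block names, and the `succ`/`calls` record layout are all illustrative).

```python
# Each function maps block names to a successor list and the callees in that block.
program = {
    "main":   {"B1": {"succ": ["B2"], "calls": ["parse"]},
               "B2": {"succ": [],     "calls": ["report"]}},
    "parse":  {"B1": {"succ": [],     "calls": ["report"]}},
    "report": {"B1": {"succ": [],     "calls": []}},
}

def build_call_graph(program):
    """Edge A -> B whenever some block of A contains a call to B."""
    return {
        fn: sorted({callee for block in blocks.values() for callee in block["calls"]})
        for fn, blocks in program.items()
    }

call_graph = build_call_graph(program)
print(call_graph["main"])    # ['parse', 'report']
print(call_graph["report"])  # []
```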

From CFG to SSA Form

SSA = Static Single Assignment — every variable is defined exactly once. Merge points get φ (phi) functions.

B1: x = 5
    if (x > 0) goto B2 else B3
B2: x = x + 1
B3: x = x - 1
B4: print(x) ← which x?
↓ After SSA conversion:
B1: x₁ = 5
    if (x₁ > 0) goto B2 else B3
B2: x₂ = x₁ + 1
B3: x₃ = x₁ - 1
B4: x₄ = φ(x₂, x₃)
    print(x₄)
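
Why does the φ land in B4? A φ is needed at the dominance frontier of every block that defines x. This is a minimal sketch of the frontier computation for the example above, assuming B2 and B3 both fall through to B4 and taking the immediate dominators as given. A full SSA construction iterates this step, because an inserted φ is itself a new definition; a single round is enough here.

```python
def dominance_frontiers(pred, idom):
    """For each join point b, walk up from every predecessor until idom(b)."""
    df = {b: set() for b in pred}
    for b, preds in pred.items():
        if len(preds) >= 2:                  # only join points are considered
            for p in preds:
                runner = p
                while runner != idom[b]:
                    df[runner].add(b)
                    runner = idom[runner]
    return df

pred = {"B1": [], "B2": ["B1"], "B3": ["B1"], "B4": ["B2", "B3"]}
idom = {"B1": None, "B2": "B1", "B3": "B1", "B4": "B1"}    # assumed, not computed here

df = dominance_frontiers(pred, idom)
blocks_defining_x = {"B1", "B2", "B3"}
phi_blocks = set().union(*(df[b] for b in blocks_defining_x))
print(phi_blocks)  # {'B4'}
```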

Key Takeaways

1. CFGs Make Flow Explicit
Code has hidden control flow (loops, branches, exceptions). The CFG makes every possible execution path visible as edges between basic blocks. No more guessing "can this line reach that line?"
2. Dataflow = Lattice + Transfer + Iteration
Every dataflow analysis follows one pattern: pick a lattice of facts, define transfer functions for each block, and iterate to a fixpoint. Only the lattice, the direction, and the transfer functions change between analyses.
3. Reachability Is the First Analysis
Before any optimization, ask: "Can execution even reach this point?" Forward reachability from entry is the simplest CFG analysis and the foundation for dead code elimination.
4. Dominance Unlocks SSA
If block A dominates block B, every path from ENTRY to B goes through A. Dominance frontiers, which are derived from this relationship, determine where φ-functions go in SSA form, the backbone of modern compiler optimizations.
The City Analogy: A CFG is like a city map. Basic blocks are city blocks. Edges are one-way streets. Reachability asks "can I drive from downtown to the airport?" Dominance asks "must I pass through the toll booth?" Dataflow analysis tracks what's in your trunk as you drive every possible route.

CFGs Across the Bootcamp

M3: Dataflow
CFG construction, reaching definitions, live variables
M4: Abstract Interp
Abstract domains evaluated over CFG edges
M5: Taint Analysis
Taint propagation along CFG paths
M6: Constraint-Based
Constraints generated from CFG structure
Labs
Hands-on CFG building, worklist implementation
Advanced
SSA form, interprocedural analysis

Challenge C: Find the Dead Code

Each snippet has dead code. Identify the dead line and the reason.

Snippet 1
def foo(x):
    if x > 0:
        return x
    else:
        return -x
    print("done") # line 6
Snippet 2
x = 10
while x > 100:
    x = x - 1
    print(x)
print("end")
Snippet 3
def bar():
    x = compute()
    y = x + 1
    return x
Snippet 4
def baz(x):
    if x > 0:
        y = 1
    if x > 0:
        print(y)
    else:
        print(0)

Quiz: CFG Properties

Q1: Basic Blocks

A basic block has the property that:

Q2: Back Edges

A back edge in a CFG indicates:

Q3: Dominance

If block A dominates block B, then:

Quiz: Dataflow Trace

Given this CFG and reaching definitions state after iteration 1, predict IN[B3] after iteration 2.

After iteration 1:
OUT[B1] = {d1: x=5}
OUT[B2] = {d1: x=5, d2: y=x+1}
OUT[B3] = {d1: x=5, d3: x=y*2}
B3 preds: B2, B4 (back edge)
OUT[B4] = {d1: x=5, d3: x=y*2, d4: y=0}
What is IN[B3] after iteration 2?

Quiz: Analysis Design

For each scenario, choose the correct dataflow analysis type.

Scenario 1

"Which variable assignments might reach this point?"

Scenario 2

"Which variables might be used after this point on some path?"

Scenario 3

"Which expressions are guaranteed already computed at this point?"

Scenario 4

"Which expressions are guaranteed to be used on all paths from this point?"