Fundamentals Deep Dive

Worklist Algorithms

The engine that drives every dataflow analysis to its fixed point

Round-Robin Worklist Queue Reverse Postorder Convergence

The Problem: Fixed Points on CFGs

We need to compute IN/OUT sets for every block in a CFG. The challenge: blocks depend on each other, especially in loops.

The dependency problem:

• OUT[B2] depends on IN[B2]
• IN[B2] = OUT[B1] ⊔ OUT[B4]
• OUT[B4] depends on IN[B4]
• IN[B4] = OUT[B3]
• OUT[B3] depends on IN[B3]
• IN[B3] = OUT[B2] ← circular!

Analogy: Imagine a spreadsheet where cell A1 references B1, B1 references C1, and C1 references A1. You can't compute any cell in one pass — you need to iterate until all cells stabilize.

Solution: Start with ⊥ everywhere, then iteratively recompute until nothing changes. The question is: in what order do we process blocks?

Naive Round-Robin Iteration

The simplest approach: scan all blocks every round. Stop when no OUT set changes. Simple but wasteful.


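let round_robin cfg =
  init_all_to_bottom cfg;
  let changed = ref true in
  while !changed do
    changed := false;
    List.iter (fun b ->
      let new_in = merge (preds b) in
      let new_out = transfer b new_in in
      if not (equal new_out (Hashtbl.find out b)) then begin
        changed := true;
        Hashtbl.replace out b new_out
      end
    ) cfg.blocks
  done
(* init_all_to_bottom, merge, preds, transfer, equal and the out table are assumed helpers *)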

Wasted work: In each round, we recompute blocks whose inputs didn't change. If only B3's output changed, why reprocess B1 and B2?
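To make the cost concrete, here is a minimal runnable sketch of round-robin reaching definitions that also counts block visits. The 4-block CFG, the definition ids d1..d4 and the gen/kill sets are hypothetical and chosen only for illustration.

```ocaml
module DS = Set.Make (Int)   (* sets of definition ids *)

(* Hypothetical CFG: B0 -> B1 -> B2 -> B3, plus a back edge B3 -> B1.
   d1: x in B0, d2: y in B1, d3: x in B2, d4: z in B3. *)
let preds = [| []; [0; 3]; [1]; [2] |]
let gen   = Array.map DS.of_list [| [1]; [2]; [3]; [4] |]
let kill  = Array.map DS.of_list [| [3]; []; [1]; [] |]

let transfer b inp = DS.union gen.(b) (DS.diff inp kill.(b))

let round_robin () =
  let n = Array.length preds in
  let out = Array.make n DS.empty in            (* bottom = empty set *)
  let visits = ref 0 and changed = ref true in
  while !changed do
    changed := false;
    for b = 0 to n - 1 do
      incr visits;
      (* IN[b] is the join (union) of all predecessor OUT sets *)
      let inp =
        List.fold_left (fun acc p -> DS.union acc out.(p)) DS.empty preds.(b) in
      let new_out = transfer b inp in
      if not (DS.equal new_out out.(b)) then begin
        out.(b) <- new_out;
        changed := true
      end
    done
  done;
  (out, !visits)

let () =
  let out, visits = round_robin () in
  Array.iteri (fun b s ->
    Printf.printf "OUT[B%d] = {%s}\n" b
      (String.concat "," (List.map string_of_int (DS.elements s)))) out;
  Printf.printf "block visits: %d\n" visits
```

On this CFG the loop needs three rounds, so the solver makes 12 block visits. The last round changes nothing and only confirms the fixed point.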

The Worklist Idea

Instead of scanning all blocks, maintain a queue of "dirty" blocks — blocks whose inputs may have changed. Only process what's needed.

Round-Robin

Process ALL blocks every round
Many blocks unchanged = wasted
O(n) block visits per round, up to h × n rounds worst case
Simple to implement

Worklist

Process only "dirty" blocks
Skip stable blocks entirely
O(h × |E|) total work
Slightly more complex

Analogy: Round-robin is like a teacher grading ALL exams every day. Worklist is like only re-grading exams that students resubmitted. Same final grades, much less work.

Core invariant: Every block whose inputs may have changed since it was last processed is on the worklist. Equivalently, any block that is not on the worklist has an OUT set consistent with its predecessors' current OUT sets. When the worklist is empty, we've reached the fixed point.

Worklist Algorithm in Code

Initialize worklist with all blocks. Pop a block, process it, and if its OUT changed, add its successors to the worklist.

let worklist_solve cfg =
  init_all_to_bottom cfg;
  let wl = Queue.create () in
  List.iter (fun b -> Queue.push b wl) cfg.blocks;
  while not (Queue.is_empty wl) do
    let b = Queue.pop wl in
    let new_in = merge (preds b) in
    let new_out = transfer b new_in in
    if not (equal new_out (Hashtbl.find out b)) then begin
      Hashtbl.replace out b new_out;
      List.iter (fun s -> Queue.push s wl)
        (succs b)
    end
  done
(* init_all_to_bottom, merge, preds, succs, transfer, equal and the out table are assumed helpers *)

Key differences from round-robin:

Line 6: Pop one block (not iterate all)
Line 9: Only act if OUT actually changed
Lines 11-12: Only add successors (the blocks affected by this change)
Line 5: Empty worklist = fixed point (no more dirty blocks)

Why add successors? If OUT[B] changed, then any block C where B → C has a new input. C needs to be reprocessed. Blocks NOT downstream of B are unaffected — skip them.

Duplicate prevention: Many implementations check if the successor is already on the worklist before adding it. This avoids redundant processing.
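One common way to prevent duplicates is to keep a membership flag per block next to the queue. The sketch below assumes blocks are numbered 0..n-1; the record and function names are illustrative, not taken from any particular tool.

```ocaml
(* A FIFO worklist over block ids 0..n-1 that refuses duplicates. *)
type worklist = { queue : int Queue.t; queued : bool array }

let create n = { queue = Queue.create (); queued = Array.make n false }

(* Enqueue b only if it is not already waiting. *)
let push wl b =
  if not wl.queued.(b) then begin
    wl.queued.(b) <- true;
    Queue.push b wl.queue
  end

(* Dequeue the oldest block and clear its flag so it can be re-added later. *)
let pop wl =
  let b = Queue.pop wl.queue in
  wl.queued.(b) <- false;
  b

let is_empty wl = Queue.is_empty wl.queue

let () =
  let wl = create 4 in
  List.iter (push wl) [1; 2; 1; 3; 2];      (* the second 1 and 2 are ignored *)
  Printf.printf "pending blocks: %d\n" (Queue.length wl.queue)
```

The flag array costs O(n) memory but makes the membership test O(1), which matters when a block has many predecessors that all change in the same sweep.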

Forward vs Backward on the Worklist

The worklist algorithm works for both directions — just swap which neighbors get added when a block changes.

Forward (Reaching Defs)

• IN[B] = ⊔ { OUT[p] | p ∈ preds(B) }
• OUT[B] = transfer(B, IN[B])
• If OUT changed → add successors to WL

Backward (Live Variables)

• OUT[B] = ⊔ { IN[s] | s ∈ succs(B) }
• IN[B] = transfer(B, OUT[B])
• If IN changed → add predecessors to WL

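(* Forward: add succs *)
if not (equal new_out (Hashtbl.find out b)) then begin
  Hashtbl.replace out b new_out;
  List.iter add_to_worklist (succs b)
end

(* Backward: add preds *)
if not (equal new_in (Hashtbl.find in_ b)) then begin
  Hashtbl.replace in_ b new_in;
  List.iter add_to_worklist (preds b)
end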

Worked Example: Reaching Defs Worklist

Watch the worklist algorithm compute reaching definitions. Compare the work done vs round-robin.
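A small runnable sketch of such a trace follows. The CFG, the definition ids and the gen/kill sets are hypothetical; the solver is a FIFO worklist with duplicate prevention and prints one line per processed block.

```ocaml
module DS = Set.Make (Int)

(* Hypothetical CFG: B0 -> B1 -> B2 -> B3, plus a back edge B3 -> B1.
   d1: x in B0, d2: y in B1, d3: x in B2, d4: z in B3. *)
let preds = [| []; [0; 3]; [1]; [2] |]
let succs = [| [1]; [2]; [3]; [1] |]
let gen   = Array.map DS.of_list [| [1]; [2]; [3]; [4] |]
let kill  = Array.map DS.of_list [| [3]; []; [1]; [] |]

let show s =
  "{" ^ String.concat ","
          (List.map (fun d -> "d" ^ string_of_int d) (DS.elements s)) ^ "}"

let solve () =
  let n = Array.length preds in
  let out = Array.make n DS.empty in
  (* every block starts on the worklist, so every flag starts true *)
  let wl = Queue.create () and queued = Array.make n true in
  for b = 0 to n - 1 do Queue.push b wl done;
  let steps = ref 0 in
  while not (Queue.is_empty wl) do
    let b = Queue.pop wl in
    queued.(b) <- false;
    incr steps;
    let inp =
      List.fold_left (fun a p -> DS.union a out.(p)) DS.empty preds.(b) in
    let new_out = DS.union gen.(b) (DS.diff inp kill.(b)) in
    let changed = not (DS.equal new_out out.(b)) in
    Printf.printf "step %d: B%d IN=%s OUT=%s%s\n" !steps b
      (show inp) (show new_out) (if changed then " (changed)" else "");
    if changed then begin
      out.(b) <- new_out;
      (* only successors see a new input *)
      List.iter (fun s ->
        if not queued.(s) then begin queued.(s) <- true; Queue.push s wl end)
        succs.(b)
    end
  done;
  (out, !steps)

let () = ignore (solve ())
```

For this CFG the run takes 7 steps, and only the last one leaves OUT unchanged. That final step is the check that confirms the fixed point.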


Tracking Convergence

How many blocks does each approach process? The worklist avoids wasted work on stable blocks.

Comparison on a typical 6-block CFG with 1 loop:

Metric	Round-Robin	Worklist
Blocks processed	18	8
Unchanged (wasted)	10	0
Rounds	3	—
Wasted work	56%	0%

Key Insight: The worklist only revisits blocks whose inputs may have changed and skips everything else. On large CFGs (thousands of blocks), this difference is dramatic.

Complexity:
Round-robin: O(h × n²) worst case
Worklist: O(h × |E|) where |E| = edges
h = lattice height, n = blocks
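A short counting argument shows where these bounds come from. It is a sketch that counts block evaluations only and assumes monotone transfer functions, so each OUT set can strictly grow at most h times. For the worklist, every block is pushed once at the start, and each growth of OUT[B] pushes the successors of B. For round-robin, every round except the last changes at least one OUT set, so there are at most h·n + 1 rounds of n evaluations each.

```latex
\begin{align*}
\text{worklist pushes} &\le n + \sum_{B} h \cdot |\mathrm{succs}(B)| = n + h\,|E| \\
\text{round-robin visits} &\le n \cdot (h\,n + 1) = O(h\,n^{2})
\end{align*}
```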

🎯 Challenge A: Predict the Worklist

Given the CFG below, answer each question about what happens during worklist iteration.

Forward analysis. Edges: B1→B2, B1→B3, B2→B4, B3→B4, B4→B2 (back)

Q1: OUT[B1] changes. Which blocks get added to the worklist?

Q2: OUT[B4] changes. Which blocks get added?

Q3: We process B3 and its OUT does NOT change. What happens?

Worklist Order Matters

Same algorithm, same result — but different processing orders lead to different amounts of work. Compare FIFO vs LIFO.

FIFO (Queue) — process in order added


LIFO (Stack) — process most recent first


Observation: FIFO tends to process blocks in a breadth-first order — natural for forward analysis. LIFO goes depth-first — can propagate information deeper faster but may revisit blocks. Neither is universally better — the optimal order depends on the CFG shape.
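The comparison can be sketched with one solver and a pluggable removal policy. The diamond-with-back-edge CFG and the gen sets are hypothetical, kill sets are left empty to keep the example short, and the list-based worklist is chosen for clarity rather than speed.

```ocaml
module DS = Set.Make (Int)

(* Hypothetical diamond with a back edge: 0->1, 0->2, 1->3, 2->3, 3->1. *)
let preds = [| []; [0; 3]; [0]; [1; 2] |]
let succs = [| [1; 2]; [3]; [3]; [1] |]
let gen   = Array.map DS.of_list [| [1]; [2]; [3]; [4] |]

type policy = Fifo | Lifo

(* New blocks are appended at the end of the list.
   Fifo removes the oldest entry, Lifo the most recent one. *)
let take policy wl =
  match policy, wl with
  | _, [] -> invalid_arg "take: empty worklist"
  | Fifo, b :: rest -> (b, rest)
  | Lifo, _ ->
    let rev = List.rev wl in
    (List.hd rev, List.rev (List.tl rev))

let solve policy =
  let n = Array.length preds in
  let out = Array.make n DS.empty in
  let steps = ref 0 in
  let rec loop wl =
    if wl <> [] then begin
      let b, rest = take policy wl in
      incr steps;
      let inp =
        List.fold_left (fun a p -> DS.union a out.(p)) DS.empty preds.(b) in
      let new_out = DS.union gen.(b) inp in     (* no kills in this example *)
      if DS.equal new_out out.(b) then loop rest
      else begin
        out.(b) <- new_out;
        (* skip successors that are already waiting *)
        let fresh = List.filter (fun s -> not (List.mem s rest)) succs.(b) in
        loop (rest @ fresh)
      end
    end
  in
  loop (List.init n (fun b -> b));
  (Array.map DS.elements out, !steps)

let () =
  let _, fifo_steps = solve Fifo and _, lifo_steps = solve Lifo in
  Printf.printf "FIFO steps: %d, LIFO steps: %d\n" fifo_steps lifo_steps
```

Both policies return the same OUT sets. On this particular CFG and seeding order, FIFO takes 6 steps and LIFO takes 11; other CFG shapes can favor LIFO.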

Reverse Postorder (RPO)

The standard traversal order for forward analysis. RPO visits each node after all of its predecessors, ignoring back edges, so a block's inputs are usually ready before the block itself is processed.

How to compute RPO:

1. Run DFS from the entry node
2. Record post-order: when a node finishes (all children done)
3. Reverse the post-order list
4. Use this order for the worklist


Why RPO? For acyclic parts of the CFG, RPO processes a block only after all its inputs are computed — one pass suffices. Only loops require re-iteration.
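The DFS recipe above can be sketched as follows. The CFG is a hypothetical diamond with a back edge, given as successor lists with block 0 as the entry.

```ocaml
(* Hypothetical CFG: 0->1, 0->2, 1->3, 2->3, 3->1 (back edge). *)
let succs = [| [1; 2]; [3]; [3]; [1] |]

(* DFS from the entry. A node is recorded when it finishes, i.e. after
   all of its unvisited successors have been explored. Prepending on
   finish builds the list in reverse post-order. *)
let reverse_postorder succs entry =
  let visited = Array.make (Array.length succs) false in
  let order = ref [] in
  let rec dfs b =
    if not visited.(b) then begin
      visited.(b) <- true;
      List.iter dfs succs.(b);
      order := b :: !order
    end
  in
  dfs entry;
  !order

let postorder succs entry = List.rev (reverse_postorder succs entry)

let () =
  let show l = String.concat ", " (List.map (fun b -> "B" ^ string_of_int b) l) in
  Printf.printf "Post-order: %s\n" (show (postorder succs 0));
  Printf.printf "RPO:        %s\n" (show (reverse_postorder succs 0))
```

For this CFG the post-order is B3, B1, B2, B0 and the RPO is B0, B2, B1, B3. The exact order depends on the order in which successors are listed, and blocks unreachable from the entry are never visited.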

RPO vs FIFO: Same CFG, Less Work

Same reaching defs analysis, same result. RPO needs fewer steps because it processes blocks in dependency order.

FIFO Order: B1, B2, B3, B4

RPO Order: B1, B3, B2, B4

RPO advantage: On acyclic CFGs, RPO computes the fixed point in one pass. With loops, RPO still keeps re-processing low because each block's forward predecessors are processed before the block itself.
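One way to use RPO inside a worklist is to store RPO positions in an ordered set and always pop the smallest one. The sketch below assumes the RPO has already been computed by a DFS; the CFG and gen sets are hypothetical, and kill sets are omitted for brevity.

```ocaml
module DS = Set.Make (Int)   (* sets of definition ids *)
module WL = Set.Make (Int)   (* worklist of RPO positions; min_elt = earliest *)

(* Hypothetical diamond with a back edge: 0->1, 0->2, 1->3, 2->3, 3->1. *)
let preds = [| []; [0; 3]; [0]; [1; 2] |]
let succs = [| [1; 2]; [3]; [3]; [1] |]
let gen   = Array.map DS.of_list [| [1]; [2]; [3]; [4] |]

(* RPO for this CFG, assumed precomputed by a DFS from block 0. *)
let rpo = [| 0; 2; 1; 3 |]

let solve_rpo () =
  let n = Array.length rpo in
  let pos = Array.make n 0 in                   (* block id -> RPO position *)
  Array.iteri (fun i b -> pos.(b) <- i) rpo;
  let out = Array.make n DS.empty in
  let steps = ref 0 in
  let wl = ref (WL.of_list (List.init n (fun i -> i))) in
  while not (WL.is_empty !wl) do
    let i = WL.min_elt !wl in                   (* earliest pending block in RPO *)
    wl := WL.remove i !wl;
    let b = rpo.(i) in
    incr steps;
    let inp =
      List.fold_left (fun a p -> DS.union a out.(p)) DS.empty preds.(b) in
    let new_out = DS.union gen.(b) inp in       (* no kills in this example *)
    if not (DS.equal new_out out.(b)) then begin
      out.(b) <- new_out;
      (* a set cannot hold duplicates, so no extra membership flag is needed *)
      List.iter (fun s -> wl := WL.add pos.(s) !wl) succs.(b)
    end
  done;
  (Array.map DS.elements out, !steps)

let () =
  let _, steps = solve_rpo () in
  Printf.printf "RPO worklist steps: %d\n" steps
```

On this small CFG the run takes 6 steps. The ordered set costs O(log n) per operation, a price that is usually repaid on larger CFGs by fewer revisits.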

Chaotic Iteration

The theoretical foundation: any "fair" ordering converges to the same fixed point, even a random one. RPO is simply an order that usually gets there in few steps.

Chaotic Iteration Theorem:

For monotone transfer functions on a lattice satisfying the ascending chain condition (ACC), any iteration strategy that is fair (every block placed on the worklist is eventually processed) converges to the same least fixed point.

Fair = don't starve any block forever.

All strategies find the same answer. The difference is only in how many steps it takes. RPO usually needs few steps; a random order typically needs more.

Handling Loops (Back Edges)

Loops cause back edges in the CFG. These are the only edges that require re-processing, and they are where widening comes in.

Back Edge Detection:
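An edge A → B is a back edge if B is an ancestor of A in the DFS tree, i.e. B is still on the DFS stack when the edge is explored. In reducible CFGs this is the same as saying that B dominates A.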
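A minimal sketch of DFS-based back edge detection uses three colors: white for unvisited, gray for "on the DFS stack", black for finished. An edge into a gray node is a back edge. The CFG is hypothetical.

```ocaml
type color = White | Gray | Black

(* Hypothetical CFG: 0->1, 0->2, 1->3, 2->3, 3->1. *)
let succs = [| [1; 2]; [3]; [3]; [1] |]

let back_edges succs entry =
  let color = Array.make (Array.length succs) White in
  let found = ref [] in
  let rec dfs a =
    color.(a) <- Gray;                          (* a is now on the DFS stack *)
    List.iter (fun b ->
      match color.(b) with
      | White -> dfs b                          (* tree edge *)
      | Gray -> found := (a, b) :: !found       (* b is an ancestor of a *)
      | Black -> ()                             (* forward or cross edge *)
    ) succs.(a);
    color.(a) <- Black                          (* a is finished *)
  in
  dfs entry;
  List.rev !found

let () =
  List.iter (fun (a, b) -> Printf.printf "back edge: B%d -> B%d\n" a b)
    (back_edges succs 0)
```

For this CFG the only back edge reported is B3 -> B1, which makes B1 the loop header.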

Impact on worklist:
• Back edges put the loop header back on the worklist
• Each loop iteration grows the analysis state
• With a finite-height lattice: terminates after ≤ height iterations per loop
• With an infinite-height lattice: needs widening at loop headers

Finite-height lattice
Powerset height = |defs|
Loop re-iterates at most |defs| times
No widening needed

Infinite-height lattice
Interval height = ∞
Loop may re-iterate forever
Apply widening at header
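The effect of widening can be sketched on the smallest possible loop: x = 0; while (true) { x = x + 1 }. The interval type, the widening operator (any bound that grows jumps to infinity) and the step budget are assumptions made for this illustration.

```ocaml
type itv = Bot | Itv of float * float   (* bounds may be +/- infinity *)

let join a b = match a, b with
  | Bot, x | x, Bot -> x
  | Itv (l1, h1), Itv (l2, h2) -> Itv (min l1 l2, max h1 h2)

(* Interval widening: a bound that moved outward is pushed to infinity. *)
let widen old next = match old, next with
  | Bot, x | x, Bot -> x
  | Itv (l1, h1), Itv (l2, h2) ->
    Itv ((if l2 < l1 then neg_infinity else l1),
         (if h2 > h1 then infinity else h1))

let add_const k = function Bot -> Bot | Itv (l, h) -> Itv (l +. k, h +. k)

(* Analyze x at the loop header. Returns the stable header state and the
   number of iterations, or None if the step budget runs out first. *)
let analyze ~use_widening ~budget =
  let entry = Itv (0., 0.) in                   (* x = 0 before the loop *)
  let rec iterate header steps =
    if steps >= budget then None
    else
      let body_out = add_const 1. header in     (* x = x + 1 *)
      let merged = join entry body_out in       (* entry edge joined with back edge *)
      let next =
        if use_widening then widen header merged else join header merged in
      if next = header then Some (header, steps)
      else iterate next (steps + 1)
  in
  iterate Bot 0

let () =
  (match analyze ~use_widening:true ~budget:1000 with
   | Some (Itv (l, h), steps) ->
     Printf.printf "with widening: [%g, %g] after %d steps\n" l h steps
   | _ -> print_endline "with widening: no result");
  match analyze ~use_widening:false ~budget:1000 with
  | None -> print_endline "without widening: budget exhausted"
  | Some _ -> print_endline "without widening: converged"
```

With widening the header stabilizes at [0, +∞) after two steps. Without it the upper bound grows by one each iteration and the budget is exhausted.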

Complexity Analysis

How much work does the worklist algorithm do? It depends on lattice height, CFG edges, and traversal order.

Complexity Formulas:

Round-Robin	O(h × n²)
Worklist (FIFO)	O(h × |E|)
Worklist (RPO)	O(h × |E|), with a smaller constant factor in practice

h = lattice height, n = blocks, |E| = edges


🎯 Challenge B: Which Order Is Best?

For each CFG shape, pick the best worklist strategy.

CFG 1: Linear Chain

B1 → B2 → B3 → B4 → B5 (no loops)

CFG 2: Diamond with Back Edge

B1→{B2,B3}→B4→B2 (loop on left branch)

CFG 3: Nested Loops

Outer loop (B1→B2→B1) with inner loop (B2→B3→B2)

CFG 4: Backward Analysis (Live Variables)

Same CFG, but propagating information backwards

Applying Worklist to Live Variables (Backward)

Worklist works for backward analyses too — just swap successors ↔ predecessors and IN ↔ OUT.


Key difference: In backward analysis, when a block's IN changes, we add its predecessors to the worklist (they need to recompute their OUT).
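A minimal sketch of the backward variant for live variables follows. The 4-block CFG and its use/def sets are hypothetical; the worklist is seeded in reverse block order so that the exit block is processed first.

```ocaml
module VS = Set.Make (String)

(* Hypothetical CFG: 0 -> 1 -> 2, 2 -> 1 (back edge), 2 -> 3.
   B0: x = 5    B1: y = x + 1    B2: x = y    B3: return x *)
let preds = [| []; [0; 2]; [1]; [2] |]
let succs = [| [1]; [2]; [1; 3]; [] |]
let use_  = Array.map VS.of_list [| []; ["x"]; ["y"]; ["x"] |]
let def   = Array.map VS.of_list [| ["x"]; ["y"]; ["x"]; [] |]

let live_variables () =
  let n = Array.length succs in
  let live_in = Array.make n VS.empty in
  let wl = Queue.create () in
  for b = n - 1 downto 0 do Queue.push b wl done;
  while not (Queue.is_empty wl) do
    let b = Queue.pop wl in
    (* OUT[b] is the union of IN over all successors *)
    let live_out =
      List.fold_left (fun a s -> VS.union a live_in.(s)) VS.empty succs.(b) in
    (* IN[b] = use[b] ∪ (OUT[b] \ def[b]) *)
    let new_in = VS.union use_.(b) (VS.diff live_out def.(b)) in
    if not (VS.equal new_in live_in.(b)) then begin
      live_in.(b) <- new_in;
      (* backward analysis: a changed IN affects the predecessors *)
      List.iter (fun p -> Queue.push p wl) preds.(b)
    end
  done;
  Array.map VS.elements live_in

let () =
  Array.iteri (fun b vars ->
    Printf.printf "IN[B%d] = {%s}\n" b (String.concat "," vars))
    (live_variables ())
```

For this CFG the result is IN[B0] = {}, IN[B1] = {x}, IN[B2] = {y}, IN[B3] = {x}. This sketch omits duplicate prevention, so a block may be queued more than once.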


Key Takeaways

1. Worklist = Targeted Iteration
Instead of blindly re-analyzing every block, only re-analyze blocks whose inputs changed. The work per step is then proportional to what actually changed, rather than O(n) block visits per round.

2. Order Matters — A Lot
Reverse Postorder processes blocks in dependency order, so information flows "downhill" in one pass. For acyclic CFGs, RPO converges in a single pass.

3. Loops Are the Hard Part
Back edges create circular dependencies. Widening at loop headers forces convergence for infinite-height domains. Without it, iteration may never terminate.

4. Same Algorithm, Many Analyses
The worklist skeleton is domain-agnostic — plug in any transfer function and lattice. Reaching defs, live vars, taint, intervals — all use the same engine.

Analogy: Think of worklist iteration like a ripple in a pond. A change at one block creates a "ripple" that propagates to neighbors. RPO ensures ripples flow naturally downstream, and widening prevents infinite rippling in loops.

Worklist Algorithms Across the Bootcamp

You'll use worklist iteration throughout the PA Bootcamp. Here's where it appears.

Module 3

Dataflow Foundations

Module 4

Abstract Interpretation

Module 5

Security Analysis

Module 6

Tools Integration

Labs

Hands-On Implementation

Worklist iteration is the engine that powers every analysis you'll build.

Challenge C: Debug the Worklist

Each implementation has a bug. Identify what's wrong.

Bug 1: Never terminates

while worklist ≠ ∅:
  b = worklist.dequeue()
  new_out = transfer(b, IN[b])
  if new_out ≠ OUT[b]:
    OUT[b] = new_out
    for s in succs(b): worklist.add(s)
  worklist.add(b)  // re-add self unconditionally

Bug 2: Misses some facts

IN[b] = ∅
for p in preds(b):
  IN[b] = OUT[p]  // overwrite
new_out = transfer(b, IN[b])

Bug 3: Wrong answer for loops

// Interval analysis, loop header
// No widening applied
IN[b] = ⊔ OUT[p] for p in preds(b)
OUT[b] = transfer(b, IN[b])

Bug 4: Backward analysis wrong

// Live variables (backward)
OUT[b] = ⊔ IN[s] for s in succs(b)
new_in = transfer(b, OUT[b])
if new_in ≠ IN[b]:
  for s in succs(b): worklist.add(s)

Quiz 1: Concept Check

Q1: When does a block get added to the worklist?

• When it's first created
• When any predecessor's OUT changes
• Every iteration regardless
• Only when the lattice height increases

Q2: Why is RPO better than FIFO for forward analysis?

• RPO uses less memory
• RPO processes blocks after their predecessors
• RPO skips unreachable blocks
• RPO avoids back edges entirely

Q3: What guarantees convergence for infinite-height lattices?

• Using RPO ordering
• Monotone transfer functions alone
• Widening at loop headers
• Processing more blocks per iteration

Quiz 2: Predict the Next 3 Steps

Given this CFG and worklist state, predict what happens next.

B1: x = 5           gen={d1}  kill={d4}
B2: y = x + 1       gen={d2}  kill={}
B3: x = y           gen={d3}  kill={d1,d4}
B4: z = x           gen={d4}  kill={}

Current State:

Worklist: [B3]

OUT[B1]={d1}, OUT[B2]={d1,d2}, OUT[B3]={d2,d3}, OUT[B4]={d2,d3,d4}

(Reaching definitions, forward, FIFO)

Your Predictions:

Step 1: Process B3 → OUT[B3] becomes

Which block(s) added to worklist?

Step 2: Process next → changed?

Quiz 3: Choose the Right Strategy

For each scenario, pick the best worklist optimization. Think about the CFG shape and analysis type.

Scenario 1

Forward constant propagation on a large function (200 blocks, 5 nested loops). Lattice: flat with ⊤/⊥ + constants.

Scenario 2

Backward live variable analysis on straight-line code (no loops, 50 blocks in sequence).

Scenario 3

Interval analysis (domain: [lo, hi]) on a program with a while(true) loop incrementing a counter.

Scenario 4

Taint analysis on a web app: 1000 functions, interprocedural, powerset lattice over source labels.