Abstract Domains

The Language of Approximation

Fundamentals Deep Dive • PA Bootcamp

What Is an Abstract Domain?

An abstract domain maps possibly infinite sets of concrete values to compact abstract representations, trading precision for computability.

Analogy: Abstract domains are like zoom levels on a map. A street-level view (concrete) shows every house; a city-level view (abstract) shows neighborhoods. You lose detail but gain the ability to reason about the whole city.

The ABSTRACT_DOMAIN Signature

Every abstract domain in the bootcamp implements this OCaml module type — 7 operations that define the domain's behavior.

module type ABSTRACT_DOMAIN = sig
  type t
  val bot : t
  val top : t
  val join : t -> t -> t
  val meet : t -> t -> t
  val leq : t -> t -> bool
  val widen : t -> t -> t
  val eval : expr -> env -> t
end

Sign Domain Deep Dive

The simplest useful domain — tracks whether a value is negative, zero, or positive.

Lattice Properties
Elements: {⊥, Neg, Zero, Pos, ⊤}
Height: 3 (⊥ → element → ⊤)
Width: 3 (Neg, Zero, Pos)
Finite? Yes — 5 elements
Needs widening? No — join = widen
Key insight: With only 5 elements, sign analysis is extremely fast but imprecise. It cannot tell the difference between x = 1 and x = 1000000 — both are just Pos.
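Precision limit: x * x is always ≥ 0, but sign analysis cannot prove it when the sign of x is unknown. The rules Pos * Pos = Pos and Neg * Neg = Pos only help when x is known to be Pos or Neg. For x = ⊤ the analysis computes ⊤ * ⊤ = ⊤, because it does not know that both operands are the same variable. This lattice also has no element meaning "≥ 0", so even Pos ⊔ Zero is ⊤.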
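The lattice properties above can be sketched in a few lines of OCaml. This is a minimal illustration, not the bootcamp's implementation: the constructor names follow the ones used in Challenge B, and the helper alpha is a hypothetical name for abstracting a single integer.

```ocaml
(* Sign lattice sketch. alpha is a hypothetical helper name. *)
type sign = Bot | Neg | Zero | Pos | Top

(* Abstract a single concrete integer. *)
let alpha (n : int) : sign =
  if n < 0 then Neg else if n = 0 then Zero else Pos

(* Partial order: Bot is below everything, Top is above everything. *)
let leq a b =
  match a, b with
  | Bot, _ | _, Top -> true
  | _ -> a = b

(* Least upper bound: two different non-bottom signs only meet at Top. *)
let join a b =
  match a, b with
  | Bot, x | x, Bot -> x
  | _ -> if a = b then a else Top

(* The lattice is finite, so join already guarantees termination. *)
let widen = join
```

Here alpha 1 and alpha 1000000 both give Pos, and join Pos Zero gives Top, which is the precision limit described above.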

Sign Domain — Transfer Functions

How arithmetic works in the sign world: each concrete operator gets an abstract counterpart that works on signs instead of numbers.

Multiplication Table
×     Neg   Zero  Pos   ⊤
Neg   Pos   Zero  Neg   ⊤
Zero  Zero  Zero  Zero  Zero
Pos   Neg   Zero  Pos   ⊤
⊤     ⊤     Zero  ⊤     ⊤
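The multiplication rules can be written as a transfer function. The sketch below assumes the same five constructors as the sign lattice and adds an addition rule for comparison; the function names mul and add are illustrative choices, not names from the bootcamp.

```ocaml
(* Sign transfer functions sketch. Names mul and add are assumptions. *)
type sign = Bot | Neg | Zero | Pos | Top

let mul a b =
  match a, b with
  | Bot, _ | _, Bot -> Bot          (* unreachable stays unreachable *)
  | Zero, _ | _, Zero -> Zero       (* zero times anything is zero *)
  | Top, _ | _, Top -> Top
  | Pos, Pos | Neg, Neg -> Pos
  | Pos, Neg | Neg, Pos -> Neg

let add a b =
  match a, b with
  | Bot, _ | _, Bot -> Bot
  | Zero, x | x, Zero -> x
  | Pos, Pos -> Pos
  | Neg, Neg -> Neg
  | _ -> Top                        (* Pos + Neg could have any sign *)
```

Note that add Neg Pos is Top: the sum of a negative and a positive number can have any sign, so the domain has to give up.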

Constant Propagation Domain

Tracks whether a variable always holds the same constant value — if so, the compiler can fold it.

Lattice Properties
Elements: {⊥} ∪ {Const(n) | n ∈ ℤ} ∪ {⊤}
Height: 3 (flat lattice)
Width: ∞ (one element per integer)
Finite? Infinite elements, but height 3 → ACC holds
Needs widening? No — height 3 means join = widen
Key insight: Flat lattice = all-or-nothing. Either both branches agree on the exact same constant, or we know nothing (⊤). No middle ground.
Use Case: Constant Folding
x = 3; y = x + 2; → analysis determines y = Const(5) → compiler replaces with y = 5
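The flat lattice and the folding example can be sketched as follows. The type and function names are assumptions made for illustration; only the Const(n) notation comes from the text above.

```ocaml
(* Constant propagation sketch. Type and function names are assumptions. *)
type const = Bot | Const of int | Top

(* Flat lattice join: agree on the same constant or give up. *)
let join a b =
  match a, b with
  | Bot, x | x, Bot -> x
  | Const m, Const n when m = n -> Const m
  | _ -> Top

(* Abstract addition: fold only when both operands are known constants. *)
let add a b =
  match a, b with
  | Bot, _ | _, Bot -> Bot
  | Const m, Const n -> Const (m + n)
  | _ -> Top

(* x = 3; y = x + 2; *)
let x = Const 3
let y = add x (Const 2)
```

With these definitions y evaluates to Const 5, while joining two branches that assign different constants yields Top.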

Interval Domain — Representing Ranges

Tracks a range [lo, hi] of possible values — more precise than sign, but infinite height.

Lattice Properties
Height: ∞ (infinite chains: [0,0]⊑[0,1]⊑[0,2]⊑...)
Needs widening? YES — essential for loops
Precision: High — tracks actual bounds
Trade-off: Intervals are more precise than signs, but the infinite height means we must use widening at loop headers or analysis won't terminate.

Interval Arithmetic

How to add, subtract, multiply, and divide intervals — the transfer functions for the interval domain.

Rules
[a,b] + [c,d] = [a+c, b+d]
[a,b] − [c,d] = [a−d, b−c]
[a,b] × [c,d] = [min(ac,ad,bc,bd), max(ac,ad,bc,bd)]
[a,b] ÷ [c,d] = [a,b] × [1/d, 1/c]   (only when 0 ∉ [c,d])
If 0 ∈ [c,d], the result is ⊤ (possible division by zero).
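The rules above can be sketched over finite integer bounds. Two simplifications are assumed here: infinite bounds are left out, and division takes the minimum and maximum of the four corner quotients rather than multiplying by a reciprocal interval, since 1/d is not an integer. The option result for div is also an illustrative choice, with None standing in for ⊤.

```ocaml
(* Interval arithmetic sketch over finite int bounds (an assumption). *)
type itv = int * int   (* (lo, hi) with lo <= hi *)

let add ((a, b) : itv) ((c, d) : itv) : itv = (a + c, b + d)
let sub ((a, b) : itv) ((c, d) : itv) : itv = (a - d, b - c)

(* Smallest and largest element of a non-empty list of corner values. *)
let bounds (xs : int list) : itv =
  (List.fold_left min max_int xs, List.fold_left max min_int xs)

let mul ((a, b) : itv) ((c, d) : itv) : itv =
  bounds [a * c; a * d; b * c; b * d]

(* None plays the role of Top when the divisor interval contains zero. *)
let div ((a, b) : itv) ((c, d) : itv) : itv option =
  if c <= 0 && 0 <= d then None
  else Some (bounds [a / c; a / d; b / c; b / d])
```

For example, mul (-2, 3) (4, 5) considers the products -8, -10, 12, and 15, so the result is (-10, 15).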

Challenge A: Domain Precision Quiz

Given the concrete set {-4, 0, 3, 7}, predict the abstraction in each domain.

Sign Domain
α({-4, 0, 3, 7}) = ?
Constant Propagation
α({-4, 0, 3, 7}) = ?
Interval Domain
α({-4, 0, 3, 7}) = ?
Trap Question: How many concrete values does [-4, 7] include?
Values NOT in the original set but inside the interval?
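One way to check your predictions is to compute α of a finite set as the join of the abstractions of its elements. The sketch below does this for all three domains; every type and function name in it is an assumption introduced for illustration.

```ocaml
(* alpha of a finite set, in three domains. All names are assumptions. *)
type sign = SBot | Neg | Zero | Pos | STop
type const = CBot | Const of int | CTop
type itv = IBot | Itv of int * int

let sign_of n = if n < 0 then Neg else if n = 0 then Zero else Pos

let sign_join a b =
  match a, b with
  | SBot, x | x, SBot -> x
  | _ -> if a = b then a else STop

let const_join a b =
  match a, b with
  | CBot, x | x, CBot -> x
  | _ -> if a = b then a else CTop

let itv_join a b =
  match a, b with
  | IBot, x | x, IBot -> x
  | Itv (l1, h1), Itv (l2, h2) -> Itv (min l1 l2, max h1 h2)

(* Start from bottom and join in one element at a time. *)
let alpha_sign xs =
  List.fold_left (fun acc n -> sign_join acc (sign_of n)) SBot xs
let alpha_const xs =
  List.fold_left (fun acc n -> const_join acc (Const n)) CBot xs
let alpha_itv xs =
  List.fold_left (fun acc n -> itv_join acc (Itv (n, n))) IBot xs

(* Number of integers an interval stands for. *)
let size = function IBot -> 0 | Itv (lo, hi) -> hi - lo + 1
```

Calling the three alpha functions on [-4; 0; 3; 7] and comparing size with the length of the list gives the answer to the trap question.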

Parity Domain (Even / Odd)

A simple but useful domain — tracks whether a value is even or odd. Great for array indexing and alignment checks.

Parity Arithmetic Rules
Addition
E + E = E
E + O = O
O + O = E
Subtraction
E − E = E
E − O = O
O − O = E
Multiply
E × E = E
E × O = E
O × O = O
Key insight: 2 * x is always Even regardless of x — parity can prove this but sign domain cannot. Each domain has unique strengths!
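The parity rules can be sketched as two transfer functions. The constructor names and the helper of_int are assumptions; the rules for Top are not listed in the tables above and are filled in here with the most precise sound choice.

```ocaml
(* Parity domain sketch. Constructor and function names are assumptions. *)
type parity = Bot | Even | Odd | Top

let of_int n = if n mod 2 = 0 then Even else Odd

let add a b =
  match a, b with
  | Bot, _ | _, Bot -> Bot
  | Top, _ | _, Top -> Top
  | Even, Even | Odd, Odd -> Even
  | Even, Odd | Odd, Even -> Odd

let mul a b =
  match a, b with
  | Bot, _ | _, Bot -> Bot
  | Even, _ | _, Even -> Even   (* one even factor is enough *)
  | Odd, Odd -> Odd
  | _ -> Top
```

The Even case in mul is what proves the key insight: mul (of_int 2) Top is Even even though nothing is known about the second operand.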

Taint Domain for Security

Tracks whether data originates from untrusted sources — essential for finding injection vulnerabilities.

Interactive: Taint Propagation
user_input =
greeting = "Hello " + user_input
sanitized =
html_output(sanitized)
Taint Rules
Tainted + anything = Tainted (taint spreads)
Clean + Clean = Clean
sanitize(Tainted) = Clean
Sink receives Tainted → VULNERABILITY!
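The taint rules can be sketched as a two-element domain. The function names combine, sanitize, and sink_is_safe are hypothetical, and the sketch assumes that user_input is an untrusted source.

```ocaml
(* Taint domain sketch. Function names are assumptions. *)
type taint = Clean | Tainted

(* Any binary operation: taint spreads unless both operands are clean. *)
let combine a b =
  match a, b with
  | Clean, Clean -> Clean
  | _ -> Tainted

(* A trusted sanitizer always produces clean data. *)
let sanitize (_ : taint) : taint = Clean

(* A sink is safe only when it receives clean data. *)
let sink_is_safe t = (t = Clean)

(* greeting = "Hello " + user_input, with user_input assumed untrusted *)
let user_input = Tainted
let greeting = combine Clean user_input
```

Passing greeting straight to a sink is flagged, while passing sanitize greeting is accepted.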

Domain Precision Comparison

The same program analyzed with three different domains — watch how precision differs.

x = 5;
y = x * 2;
z = y - 10;
w = 100 / z; // safe?
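The four-line program above can be traced by hand in each domain. The sketch below does this for sign, constant, and interval, assuming those are the three domains being compared; bottom elements are omitted and all names are illustrative.

```ocaml
(* Tracing z = (x * 2) - 10 with x = 5. All names are assumptions. *)
type sign = Neg | Zero | Pos | STop
type const = Const of int | CTop
type itv = int * int

let sign_mul a b =
  match a, b with
  | Zero, _ | _, Zero -> Zero
  | STop, _ | _, STop -> STop
  | Pos, Pos | Neg, Neg -> Pos
  | Pos, Neg | Neg, Pos -> Neg

let sign_sub a b =
  match a, b with
  | x, Zero -> x
  | Pos, Neg | Zero, Neg -> Pos
  | Neg, Pos | Zero, Pos -> Neg
  | _ -> STop                      (* e.g. Pos - Pos can have any sign *)

let const_op f a b =
  match a, b with
  | Const m, Const n -> Const (f m n)
  | _ -> CTop

let itv_sub ((a, b) : itv) ((c, d) : itv) : itv = (a - d, b - c)
let itv_mul ((a, b) : itv) ((c, d) : itv) : itv =
  let ps = [a * c; a * d; b * c; b * d] in
  (List.fold_left min max_int ps, List.fold_left max min_int ps)

let z_sign = sign_sub (sign_mul Pos Pos) Pos
let z_const = const_op ( - ) (const_op ( * ) (Const 5) (Const 2)) (Const 10)
let z_itv = itv_sub (itv_mul (5, 5) (2, 2)) (10, 10)

(* "May z be zero?" decides whether 100 / z can be proven safe. *)
let sign_may_be_zero s = (s = Zero || s = STop)
let const_may_be_zero c = (c = Const 0 || c = CTop)
let itv_may_be_zero ((lo, hi) : itv) = lo <= 0 && 0 <= hi
```

All three domains refuse to call the division safe, but for different reasons: sign only knows that z might be zero, while the constant and interval domains know that z is exactly zero.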

Reduced Product — Combining Domains

Run two domains simultaneously and let them share information. The combination catches things neither alone can.

Example: what each domain tells us about x after x = 2*y + 1 where y ∈ Pos. Sign alone gives Pos, parity alone gives Odd, and the product gives "positive and odd".
Reduced product = run both domains and use a reduction operator to tighten each domain using the other's information. The result is at least as precise as either domain alone, and often strictly more precise.
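A reduced product of sign and parity can be sketched as a pair plus a reduce function. The reduction rules below are one possible choice (zero is even, so Zero paired with Odd is impossible); all names are assumptions.

```ocaml
(* Reduced product Sign x Parity sketch. All names are assumptions. *)
type sign = SBot | Neg | Zero | Pos | STop
type parity = PBot | Even | Odd | PTop

let sign_mul a b =
  match a, b with
  | SBot, _ | _, SBot -> SBot
  | Zero, _ | _, Zero -> Zero
  | STop, _ | _, STop -> STop
  | Pos, Pos | Neg, Neg -> Pos
  | Pos, Neg | Neg, Pos -> Neg

let sign_add a b =
  match a, b with
  | SBot, _ | _, SBot -> SBot
  | Zero, x | x, Zero -> x
  | Pos, Pos -> Pos
  | Neg, Neg -> Neg
  | _ -> STop

let par_mul a b =
  match a, b with
  | PBot, _ | _, PBot -> PBot
  | Even, _ | _, Even -> Even
  | Odd, Odd -> Odd
  | _ -> PTop

let par_add a b =
  match a, b with
  | PBot, _ | _, PBot -> PBot
  | PTop, _ | _, PTop -> PTop
  | Even, Even | Odd, Odd -> Even
  | Even, Odd | Odd, Even -> Odd

(* Reduction: let each component tighten the other. *)
let reduce (s, p) =
  match s, p with
  | SBot, _ | _, PBot -> (SBot, PBot)
  | Zero, Odd -> (SBot, PBot)      (* zero is never odd: unreachable *)
  | Zero, _ -> (Zero, Even)        (* zero is always even *)
  | _ -> (s, p)

let const n =
  ((if n < 0 then Neg else if n = 0 then Zero else Pos),
   (if n mod 2 = 0 then Even else Odd))
let mul (s1, p1) (s2, p2) = reduce (sign_mul s1 s2, par_mul p1 p2)
let add (s1, p1) (s2, p2) = reduce (sign_add s1 s2, par_add p1 p2)

(* x = 2*y + 1 where y is positive with unknown parity *)
let y = (Pos, PTop)
let x = add (mul (const 2) y) (const 1)
```

Here x evaluates to (Pos, Odd), a fact neither component could state on its own.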

Widening & Narrowing

Widening forces convergence by overshooting. Narrowing recovers precision by tightening back.

Program: x = 0; while(*) x = x + 1;
Without widening: [0,0] → [0,1] → [0,2] → ... → never terminates!
Two phases:
1. Ascending + widening: overshoot to [0, +∞)
2. Descending + narrowing: tighten back toward true range
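The two phases can be sketched with the standard interval widening and narrowing operators. In this sketch an infinite bound is modeled as None, the bottom element is omitted, and solve is a hypothetical helper that iterates the loop head of the program above until it stabilizes.

```ocaml
(* Interval widening and narrowing sketch. None means an infinite bound. *)
type itv = { lo : int option; hi : int option }

let join a b =
  let lo = match a.lo, b.lo with
    | Some x, Some y -> Some (min x y) | _ -> None in
  let hi = match a.hi, b.hi with
    | Some x, Some y -> Some (max x y) | _ -> None in
  { lo; hi }

(* Widening: keep a stable bound, jump to infinity on an unstable one. *)
let widen a b =
  let lo = match a.lo, b.lo with
    | Some x, Some y when y >= x -> Some x | _ -> None in
  let hi = match a.hi, b.hi with
    | Some x, Some y when y <= x -> Some x | _ -> None in
  { lo; hi }

(* Narrowing: only replace infinite bounds with the refined ones. *)
let narrow a b =
  { lo = (match a.lo with None -> b.lo | finite -> finite);
    hi = (match a.hi with None -> b.hi | finite -> finite) }

(* Effect of x = x + 1 on an interval. *)
let incr i = { lo = Option.map succ i.lo; hi = Option.map succ i.hi }

(* Loop head of: x = 0; while(*) x = x + 1; *)
let init = { lo = Some 0; hi = Some 0 }
let rec solve x =
  let next = widen x (join init (incr x)) in
  if next = x then x else solve next
```

For this program solve init stabilizes at [0, +∞) after two iterations, and narrowing cannot improve it because the loop has no exit condition. With a hypothetical guard such as x < 10, a descending step would yield a finite bound that narrow then adopts.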

Designing a New Domain — Step by Step

Follow the 5-step recipe to build a custom abstract domain for any property you want to track.

Example: Nullness Domain
Goal: track whether a pointer can be null, non-null, or either.
Step 1: Define the elements
Step 2: Define the ordering (⊑)
Step 3: Define join and meet
Step 4: Define α (abstraction) and γ (concretization)
Step 5: Write the transfer functions
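The five steps can be sketched for the nullness domain as follows. This is one possible design, not the bootcamp's reference solution: the constructor names, the use of an option to model a concrete pointer, and the transfer functions are all assumptions. γ is described in comments only, since it maps to sets of concrete pointers.

```ocaml
(* Nullness domain sketch. All names are assumptions. *)

(* Step 1: elements *)
type nullness = Bot | Null | NonNull | Top

(* Step 2: ordering *)
let leq a b =
  match a, b with
  | Bot, _ | _, Top -> true
  | _ -> a = b

(* Step 3: join and meet *)
let join a b =
  match a, b with
  | Bot, x | x, Bot -> x
  | _ -> if a = b then a else Top

let meet a b =
  match a, b with
  | Top, x | x, Top -> x
  | _ -> if a = b then a else Bot

(* Step 4: alpha of one concrete pointer, modeled here as an option.
   gamma: Null = {null}, NonNull = all valid addresses, Top = both. *)
let alpha (p : 'a option) : nullness =
  match p with None -> Null | Some _ -> NonNull

(* Step 5: transfer functions for a few hypothetical statements *)
let assign_null = Null
let assign_new = NonNull            (* allocation assumed never null *)
let assume_non_null v = meet v NonNull   (* true branch of p != null *)
let deref_is_safe v = (v = NonNull || v = Bot)
```

The assume_non_null function is what lets a null check refine Top to NonNull inside the guarded branch.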

Challenge B: Fix the Domain Bug

Each domain implementation has a bug. Identify what's wrong.

Bug 1: Non-monotone join
let join a b = match a, b with
  | Bot, x | x, Bot -> x
  | Pos, Neg -> Zero (* compromise *)
  | _ -> Top
Bug 2: Unsound interval transfer
let div [a,b] [c,d] =
  [a/d, b/c] (* just divide bounds *)
Bug 3: Wrong widening
let widen [a,b] [c,d] =
  [min(a,c), max(b,d)]
Bug 4: Taint domain unsound
let eval_binop a b =
  match a, b with
  | Tainted, Clean -> Clean (* sanitized *)
  | Clean, Tainted -> Tainted
  | _ -> a

Why Relational? The Limits of Per-Variable Domains

All domains so far track one variable at a time. What happens when the property you need involves two variables?

i = 0
while i < n:
a[i] = 0 # safe?
i = i + 1
We need to prove i < n at line 3 to guarantee no out-of-bounds access. Consider what intervals can (and can't) tell us.
The Problem: Intervals say i ∈ [0,∞) and n ∈ [0,∞). Both ranges overlap — intervals cannot prove i < n because they track each variable independently. The relationship is invisible.
Key Insight: To prove i < n, we need a domain that tracks relationships between variables — a relational domain. That's what octagons and polyhedra do.
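The difference can be made concrete with a deliberately tiny sketch. It compares an interval check against a single relational fact, an upper bound on i - n. This is far less than a real octagon or polyhedra implementation, and every name in it is an assumption.

```ocaml
(* Non-relational vs. relational check sketch. All names are assumptions. *)

(* An interval with an optional upper bound; None means +infinity. *)
type itv = { lo : int; hi : int option }

(* Intervals prove i < n only if every i lies below every n. *)
let itv_proves_lt i n =
  match i.hi with
  | Some h -> h < n.lo
  | None -> false

(* One relational fact: an upper bound c on (i - n); None means unknown. *)
type diff = int option

(* Entering the loop body under the guard i < n gives i - n <= -1. *)
let assume_lt (d : diff) : diff =
  match d with
  | Some c -> Some (min c (-1))
  | None -> Some (-1)

(* i = i + 1 shifts the bound on (i - n) by one. *)
let incr_i (d : diff) : diff = Option.map succ d

let diff_proves_lt (d : diff) =
  match d with
  | Some c -> c <= -1
  | None -> false
```

With i and n both in [0,∞) the interval check fails, while the difference bound established by the loop guard proves i < n at the array access and correctly stops proving it after the increment.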

The Octagon Domain — Tracking ±x ± y ≤ c

Octagons track constraints of the form ±x ± y ≤ c for every pair of variables.

Analogy: Intervals give each variable its own ruler. Octagons add a second ruler between every pair of variables, measuring their sum and their difference.

Polyhedra & The Precision Ladder

Polyhedra track arbitrary linear constraints: a₁x₁ + a₂x₂ + ... + aₙxₙ ≤ c. Compare all three domains on one example.

Program: y = 2*x + 1 where x ∈ [0,5].
True region: line segment from (0,1) to (5,11).
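What intervals lose on this example can be checked directly. The sketch below computes the interval for y from the interval for x and then tests points against the resulting box and against the true line; the helper names are assumptions.

```ocaml
(* Interval approximation of y = 2*x + 1 with x in [0,5]. Names are assumptions. *)
let x_range = (0, 5)

(* Multiply an interval by a constant, swapping bounds for negative k. *)
let scale k (lo, hi) = if k >= 0 then (k * lo, k * hi) else (k * hi, k * lo)

(* Add a constant to both bounds. *)
let shift c (lo, hi) = (lo + c, hi + c)

let y_range = shift 1 (scale 2 x_range)

(* The box is all intervals can express: x and y are bounded separately. *)
let in_box (x, y) =
  fst x_range <= x && x <= snd x_range
  && fst y_range <= y && y <= snd y_range

(* The true region is the line segment itself. *)
let on_line (x, y) = (y = 2 * x + 1)
```

The point (0, 11) lies inside the box but not on the line, so an interval analysis would treat it as possible even though the program can never produce it.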

Domain      Constraints         Space         Cost
Intervals   Per-variable only   O(n) space    Fast
Octagons    ±x ± y ≤ c          O(n²) space   Medium
Polyhedra   a₁x+a₂y ≤ c         O(2ⁿ) space   Expensive
The Precision Ladder: Intervals ⊂ Octagons ⊂ Polyhedra. Each step up captures more relationships but costs more. Pick the cheapest that answers your question.

The Domain Selection Guide

Choosing the right domain is a precision/cost trade-off. Follow this decision tree.

Rule of thumb: Start with the simplest domain that could answer your question. Only add complexity if you need more precision.

Key Takeaways

1. Approximation Is Inevitable
Perfect analysis is undecidable (Rice's theorem). Abstract domains make the controlled compromise — trading precision we don't need for computability we must have.
2. Domain = Precision/Cost Trade-off
Sign (5 elements, instant) → Constant (flat, fast) → Interval (infinite height, needs widening) → Octagon (quadratic) → Polyhedra (exponential). Pick the lightest domain that answers your question.
3. Composition Multiplies Power
The reduced product of two cheap domains can be more precise than either domain alone, sometimes enough to avoid reaching for a more expensive one. Sign × Parity catches things neither alone can. Always consider domain combinations.
4. Soundness Is Non-Negotiable
Every domain must over-approximate: if the concrete answer is "yes," the abstract answer must include "yes." False positives are OK; false negatives are bugs in the analyzer.
Analogy: Abstract domains are like measuring instruments. A ruler (intervals) measures length. A protractor (signs) measures direction. A surveyor's total station (polyhedra) measures both — but costs 100× more. Choose the right tool for the job.

Abstract Domains Across the Bootcamp

Every module uses abstract domains. Here's the map of where each domain appears.

Module 3
Dataflow Foundations
Module 4
Abstract Interpretation
Module 5
Security Analysis
Module 6
Tools Integration
Labs 4 & 6
Hands-On Implementation
The domain is the lens through which your analyzer sees the program.

Challenge C: Choose the Right Domain

For each scenario, pick the most appropriate abstract domain. Consider precision needs and cost.

Scenario 1
A compiler wants to replace x * 2 with x << 1 when it can prove x is always the same constant.
Scenario 2
A security tool must detect if request.params can reach db.query() without passing through sanitize().
Scenario 3
Prove that array index i is always between 0 and arr.length - 1 inside a loop.
Scenario 4 (Trap!)
Prove that i < n holds at a specific point, where both i and n are variables modified in a loop.

Quiz 1: Domain Properties

Q1: What is the height of the constant propagation lattice?
Q2: What is join(Const(3), Const(7)) in the constant domain?
Q3: Which domain MUST use widening?

Quiz 2: Trace Through Domains

Given this program, predict the abstract state at each point in the sign domain.

a = -3;
b = 7;
c = a + b;
d = a * a;
e = c * d;
c = a + b → sign of c?
d = a * a → sign of d?
e = c * d → sign of e?

Quiz 3: Domain Design Decisions

For each situation, choose the correct design decision for your abstract domain.

Q1: You're designing a domain to track if a file handle is open or closed. How many elements?
Q2: Your interval analysis widens [0,5] with new value [0,8]. What should widen return?
Q3: What makes an analysis "sound"?
Q4: When should you initialize dataflow facts to ⊤ vs ⊥?