🔬

Abstract Interpretation

Module 4 — Program Analysis Bootcamp

±
Sign Domain
42
Constant Domain
[a,b]
Interval Domain

Instructor: Weihao  |  Office Hours: By appointment, HH227

The Leap from Module 3

Module 3 tracked sets of names:

Reaching Defs: {d1, d3, d5}
Live Variables: {x, y, temp}

Module 4 tracks abstract values:

Sign: x → Pos, y → Neg
Constant: x → Const(42)
Interval: x → [0, 100]

Learning Objectives

  • Implement abstract domains (sign, constant, interval)
  • Formalize Galois connections for soundness
  • Apply widening to guarantee termination
  • Build an abstract interpreter for div-by-zero
  • Compare domains: precision vs cost
Same framework, richer information:
Analysis = Lattice + Transfer Functions + Fixpoint
We keep the solver, but change what the lattice tracks.

Can This Crash?

Step through with sign analysis to detect a division-by-zero before running:

let analyze x y =
let a = x * x in
let b = a + 1 in
let c = y - y in
let result = b / c in
result
Module 3 says: "d4 reaches line 5" — true, but doesn't answer the safety question.
Module 4 says: "c = Zero, so b/c is division by zero!" — bug found at compile time.

Concrete vs Abstract Semantics

We trade precision for decidability. Click each concrete value to see its abstraction:

Click a concrete value on the left to see how it maps to the abstract world.

The deal:
  • Concrete: exact but undecidable (halting problem)
  • Abstract: approximate but always terminates
Sound = if the analysis says "safe", the program truly is safe. We may get false alarms, but we never miss real bugs.

Galois Connections

The formal bridge between concrete and abstract. Click a set to see α (abstraction) and γ (concretization):

Click a concrete set or abstract value to see the α/γ mapping.

Formal Definition

α(c) ≤ a ⇔ c ≤ γ(a)
α: concrete set → best abstract value
γ: abstract value → concrete set
Analogy: α is like rounding up (you may lose precision). γ is like asking "what numbers could this represent?" The pair guarantees nothing slips through.

Soundness via Over-Approximation

Abstract interpretation is sound because the abstract result always contains the concrete result:

If concrete computation produces v,
then v ∈ γ(abstract_result).
No false negatives:
We never miss a real bug.
False positives possible:
We may warn about safe code.
Analogy: A metal detector at the airport. It might beep for your belt buckle (false positive), but it will never miss a real weapon (no false negatives). That's soundness.

The ABSTRACT_DOMAIN Signature

Module 4 extends Module 3's lattice with widening and ordering:

Module 3: LATTICE

module type LATTICE = sig
type t
val bottom : t
val top : t
val join : t → t → t
val meet : t → t → t
val equal : t → t → bool
end

Module 4: ABSTRACT_DOMAIN

module type ABSTRACT_DOMAIN = sig
type t
val bottom : t
val top : t
val join : t → t → t
val meet : t → t → t
val equal : t → t → bool
val leq : t → t → bool
val widen : t → t → t
val to_string : t → string
end
leq tests the partial order: does a ≤ b? Needed for fixpoint checking.
widen accelerates convergence for infinite-height lattices (intervals). Without it, loops may never terminate.
The abstract interpreter is a functor: same analysis code, parameterized by the domain. Swap SignDomain for IntervalDomain and get a different analysis for free.

Sign Domain: Interactive Calculator

The sign lattice has 5 elements. Pick two signs and an operation to see the result:

Key insight: Neg × Neg = Pos (negative times negative is positive). But Pos + Neg = Top (could be anything — we don't know magnitudes).

Challenge A: Abstract the Values

Given concrete values, pick the most precise abstraction in each domain.

Concrete → Abstract

Q1. Concrete set: {-3, -1, -7}

Q2. Concrete set: {5}

Q3. Concrete set: {-2, 0, 4}

Results

Answer all questions and click "Check All"

Key Rule: The abstraction α(S) must be the smallest (most precise) abstract value that contains all concrete values in S.

Sign Domain: Transfer Functions

How do abstract operations map signs through assignments and arithmetic?

a = 5;
b = -3;
c = a + b; // (+) + (−) = ⊤
d = a * b; // (+) × (−) = (−)
e = b * b; // (−) × (−) = (+)
f = c * d; // ⊤ × (−) = ⊤
Precision loss: (+) + (−) = ⊤ because the result could be positive, negative, or zero. Sign analysis can't determine which without concrete values.

Constant Propagation Domain

Track exact values when possible — lose precision only at merges with conflicting constants.

x = 10;
y = 20;
if (cond) {
x = 10; // same value!
} else {
x = 30; // different!
}
z = x + y; // x=⊤, y=20
Key insight: At merge points, if both branches assign the same constant → keep it. If they differ → ⊤. This is the join operation.

Interval Domain & Operations

Track bounds [lo, hi] — the most practical domain for real-world analyzers.

Interval Calculator


[ , ]


[ , ]
Click "Compute" to see the result
Interval rules:
[a,b] + [c,d] = [a+c, b+d]
[a,b] − [c,d] = [a−d, b−c]
[a,b] × [c,d] = [min(ac,ad,bc,bd), max(ac,ad,bc,bd)]
[a,b] ⊔ [c,d] = [min(a,c), max(b,d)]

Widening: Forcing Termination

Without widening, loops create infinitely ascending chains. Widening jumps to ∞ to guarantee convergence.

x = 0;
while (x < 100) {
x = x + 1;
}
Click "Step" to iterate the fixpoint. Toggle to compare with/without widening.

Application: Division-by-Zero Detection

The whole point — use abstract domains to catch real bugs at compile time.

Check: Does the interval contain 0?


[ , ]
Enter an interval for the divisor and click "Analyze"

Quick Examples:

Domain Comparison & Hierarchy

Each domain makes a different precision vs cost tradeoff. Click a domain to explore.

Click a domain on the left to see its details.

For x = -3, each domain says:

Sign:(just the sign)
Constant:-3(exact value)
Interval:[-3,-3](tight bounds)
Octagon:-3≤x≤-3, ±x±y≤c(relational)
Polyhedra:ax+by+cz≤d(full linear)

Challenge B: Apply Widening

Given a loop and interval analysis, predict what happens with and without widening.

y = 1;
while (y < 50) {
y = y * 2;
}

Q1. Without widening, what is y's interval after iteration 3?

Q2. With widening at iteration 2, what does y become?

Q3. Is [1,+∞] a sound over-approximation for y after the loop?

Answer all questions and click "Check All"

Abstract Interpretation in Practice

Real tools that use abstract interpretation to find real bugs in real software.

Click a tool on the left to learn about it.

Common theme: All these tools trade precision for soundness — they may report false positives but never miss a real bug (within their scope).

Key Takeaways

The essential ideas from Module 4 — Abstract Interpretation.

1. Abstraction trades precision for decidability
We can't analyze all possible concrete values (undecidable), so we group them into abstract values. The analysis always terminates and covers all cases.
2. Galois connections formalize the abstraction
α (abstraction) maps concrete → abstract. γ (concretization) maps abstract → concrete. Soundness: concrete ⊆ γ(α(concrete)).
3. Different domains = different tradeoffs
Sign (cheap, imprecise) → Constant → Interval → Octagon → Polyhedra (expensive, precise). Pick the domain that matches your analysis goal.
4. Widening ensures termination
Loops can create infinite ascending chains in the abstract domain. Widening jumps to a safe over-approximation (often ∞) to force convergence. Sound but less precise.
5. Sound ≠ Precise
A sound analysis never misses a real bug — but may report false positives. Like a metal detector at the airport: it catches all weapons but also your belt buckle.
From M3 to M4:
M3 (Static Analysis) used sets of definitions/variables. M4 replaces those sets with abstract values from a domain — same framework (lattice + transfer + fixpoint), richer vocabulary.

Looking Ahead: Module 5

From detecting bugs to detecting security vulnerabilities.

What's Coming in Security Analysis

1
Taint Analysis
Track untrusted data from sources (user input) to sinks (SQL queries, system calls)
2
Information Flow
Ensure secrets don't leak through public channels — noninterference property
3
OWASP & Real Vulns
SQL injection, XSS, command injection — analyzed as taint problems
The connection: M3 gave us the framework (dataflow). M4 gave us abstract domains. M5 combines both: taint analysis IS a dataflow problem with abstract domain {tainted, untainted}. Everything builds on what you've learned!

Module Progression

M1: Foundations (ASTs, CFGs) ────
M2: AST Construction ──────────
M3: Static Analysis Framework ──
M4: Abstract Interpretation ← YOU ARE HERE
M5: Security Analysis ──────── next
M6: Tools & Integration ──────

What We Learned in Module 4

A complete summary of abstract interpretation concepts.

Core Theory

✓ Concrete vs Abstract worlds
✓ Galois connections (α, γ)
✓ Soundness via over-approximation
✓ ABSTRACT_DOMAIN signature
✓ Transfer functions on abstract values
✓ Widening for termination

Domains Covered

✓ Sign domain (+, −, 0, ⊤, ⊥)
✓ Constant propagation (c, ⊤, ⊥)
✓ Interval domain ([lo, hi])
✓ Octagon domain (±x±y ≤ c)
✓ Polyhedra domain (linear)
✓ Precision vs cost hierarchy

Practical Skills

✓ Division-by-zero detection
✓ Sign arithmetic tables
✓ Interval arithmetic rules
✓ Applying widening to loops
✓ Choosing the right domain
✓ Real-world tools (Astrée, Infer)
Abstract Interpretation = Sound Bug Detection at Scale

Challenge C: Pick the Right Domain

For each scenario, choose the most appropriate abstract domain.

Q1. You need to verify array index i is always in bounds [0, n-1].

Q2. You need to prove x ≤ y + 5 is always true (a relational property).

Q3. You're writing a quick checker: is the result of a * b ever negative?

Q4. You want to constant-fold x = 3 + 4 at compile time.

Answer all questions and click "Check All"

Quiz 1: Concept Check

Test your understanding of abstract interpretation fundamentals.

Q1. What does α (alpha) do in a Galois connection?

Q2. What does widening guarantee?

Q3. In sign domain, (+) + (−) = ?

Quiz 2: Trace the Analysis

Given this code, trace the interval analysis at each line.

x = 10;
y = -5;
z = x + y; // [?, ?]
w = x * y; // [?, ?]
r = z / w; // safe?

Fill in the intervals:

z = [, ] w = [, ] div-by-zero?

Fill in intervals and click "Check"

Quiz 3: Choose the Analysis

For each bug-finding goal, pick the analysis technique and domain.

Scenario 1: You're building a tool to detect integer overflow in a loop counter that increments from 0.

Scenario 2: You want to replace x = 7 * 3 with x = 21 at compile time.

Scenario 3: A flight control system must guarantee zero runtime errors. Cost is not a concern.

Answer all scenarios and click "Check All Answers"