🔬

Program Analysis for Beginners

Module 1: Foundations of Program Analysis

Instructor: Weihao  |  Office Hours: By appointment, HH227


Learning Objectives

By the end of this module, you will be able to:

Key Idea: Program analysis is the backbone of many modern development tools, from your IDE's autocomplete to CI/CD security gates. Understanding it shows you how those tools work under the hood.

What is Program Analysis?

Program Analysis is the process of automatically analyzing the behavior of computer programs.

  • Goal: Understand what a program does without (necessarily) executing it
  • Purpose: Find bugs, optimize performance, verify correctness
  • Scope: From simple syntax checking to complex security analysis
Analogy: Think of a building inspector examining blueprints (static) vs. stress-testing the actual structure (dynamic). Both find problems, but at different stages and costs.

Why Program Analysis? Real-World Failures

A timeline of historic software disasters that could have been prevented.


The cost of NOT analyzing: These incidents caused billions of dollars in damages and, in some cases, loss of life. Automated analysis can catch classes of defects that human reviewers routinely miss.

Real-World Impact

Program analysis powers tools you use every day — often without realizing it.

IDEs

IntelliSense, error highlighting, refactoring suggestions

Compilers

Dead code elimination, constant folding, optimization

Security

Vulnerability scanners, malware detection, taint tracking

DevOps

CI/CD quality gates, SonarQube, Semgrep, linters

Key Idea: Every time your IDE underlines a bug before you run the code — that's program analysis at work.

Two Main Approaches

Static Analysis

Examines code without execution

  • Analyzes source code, bytecode, or binary
  • Catches errors before runtime
  • No test inputs or execution environment needed
  • Can reason about all possible paths, typically by over-approximating them

Example: Checking for null pointer dereferences by tracing variable assignments through all branches.

Static Analysis: Null Pointer Detection

Here is how a static analyzer traces null-safety through code branches.

processUser.java
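    public void processUser(User user) {
        if (user.getName() != null) {
            System.out.println(user.getName());
        }
        // After the if-block, nothing guarantees getName() is non-null:
        int len = user.getName().length();  // analyzer flags a possible NullPointerException
    }
The analyzer follows both branches of the if. On the path where the condition is false, getName() may return null, and nothing rules that out before length() is called, so it reports a possible null dereference on that line.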
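To make the tracing concrete, here is a minimal OCaml sketch of a branch-aware nullness check over a hypothetical toy statement language (`AssignNull`, `AssignNew`, `IfNonNull`, and `Deref` are illustrative names, not part of any real tool). It assumes the result of `user.getName()` is modeled as a variable `name` whose nullness is unknown. Inside the `if` the analyzer knows `name` is non-null; after the `if` it merges the two paths, and because the false path says nothing about `name`, it falls back to "possibly null".

```ocaml
(* Abstract value for one variable: definitely non-null, or possibly null *)
type nullness = NonNull | MaybeNull

(* A hypothetical toy statement language *)
type stmt =
  | AssignNull of string              (* x = null *)
  | AssignNew of string               (* x = new T() *)
  | IfNonNull of string * stmt list   (* if (x != null) { body } *)
  | Deref of string                   (* x.length() *)

module Env = Map.Make (String)

(* Unknown variables (e.g. a method's return value) are treated as MaybeNull *)
let lookup env x =
  match Env.find_opt x env with Some v -> v | None -> MaybeNull

(* Joining two paths: non-null only if non-null on both *)
let join a b = if a = NonNull && b = NonNull then NonNull else MaybeNull

(* Merge the environments of two paths, key by key *)
let merge e1 e2 =
  Env.merge
    (fun _ a b ->
      match a, b with
      | Some a, Some b -> Some (join a b)
      | _ -> Some MaybeNull)
    e1 e2

let rec analyze env warnings = function
  | [] -> (env, warnings)
  | s :: rest ->
    let env, warnings =
      match s with
      | AssignNull x -> (Env.add x MaybeNull env, warnings)
      | AssignNew x -> (Env.add x NonNull env, warnings)
      | Deref x ->
        if lookup env x = NonNull then (env, warnings)
        else (env, ("possible null dereference of " ^ x) :: warnings)
      | IfNonNull (x, body) ->
        (* True branch: x is known non-null inside the body *)
        let env_then, warnings = analyze (Env.add x NonNull env) warnings body in
        (* False branch skips the body; merge the two paths *)
        (merge env_then env, warnings)
    in
    analyze env warnings rest

let check program =
  let _, warnings = analyze Env.empty [] program in
  List.rev warnings
```

Running `check [IfNonNull ("name", [Deref "name"]); Deref "name"]` reports one warning, for the dereference after the if-block, mirroring the Java example. Real analyzers work on full control-flow graphs and handle loops, fields, and method calls; this sketch covers only straight-line code and a single `if` form.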

Dynamic Analysis: Watching Execution

Follow fibonacci(5) and watch the number of calls explode; a profiler catches this at runtime.

fibonacci.ml
    let rec fibonacci n =
      if n <= 1 then n
      else
        fibonacci (n - 1) + fibonacci (n - 2)
Why dynamic analysis catches this: static analysis sees the recursion, but a profiler measures the exponential blowup, from 15 calls for fib(5) to about 40.7 billion for fib(50).
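A profiler typically instruments function entry. This OCaml sketch imitates that with a hypothetical global counter (`calls`) incremented on every call, which is enough to observe the exponential growth without any profiling library.

```ocaml
(* Stand-in for a profiler's instrumentation: count every call *)
let calls = ref 0

let rec fibonacci n =
  incr calls;
  if n <= 1 then n
  else fibonacci (n - 1) + fibonacci (n - 2)

(* Reset the counter, run once, and return (result, number of calls) *)
let profile n =
  calls := 0;
  let result = fibonacci n in
  (result, !calls)

let () =
  List.iter
    (fun n ->
      let result, count = profile n in
      Printf.printf "fibonacci %d = %d (%d calls)\n" n result count)
    [ 5; 10; 20; 25 ]
```

`profile 5` returns `(5, 15)`. In general the call count is calls(n) = 2·F(n+1) − 1, where F is the Fibonacci sequence, so it grows by roughly a factor of 1.6 for each increase of n by one.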

Challenge: Static or Dynamic?

For each scenario, decide whether it describes static or dynamic analysis.

1. A tool scans your source code and warns that a variable might be used before it's assigned a value.

2. While running your test suite, a tool detects that your program allocates memory but never frees it.

3. A CI/CD gate rejects your pull request because it detects a possible SQL injection in a query builder function — without running any tests.

4. A profiler measures that a specific function takes 3.2 seconds with a 10,000-element input list.

Static vs Dynamic: Head-to-Head

Compare how each approach handles the same program properties.

Analogy: Static analysis is like reviewing a map for wrong turns. Dynamic analysis is like driving the route and hitting actual potholes. The map covers every road in principle; the drive shows what actually happens on the route you took.

Three Core Objectives

Every program analysis technique serves one (or more) of these goals.


Each objective targets different kinds of program defects and uses different analysis techniques.

Soundness & Completeness

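Two fundamental properties of any analysis. A sound analysis reports every real bug of the kind it checks for (no false negatives), though it may raise false alarms. A complete analysis reports only real bugs (no false positives), though it may miss some.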
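A minimal OCaml sketch of the two properties, assuming a hypothetical expression type with integer division: two checkers for division by zero. The sound one (no missed bugs) warns unless every divisor is a literal nonzero constant; the complete one (no false alarms) warns only when a divisor is the literal `0`.

```ocaml
(* Hypothetical toy expressions with integer division *)
type expr =
  | Int of int
  | Var of string
  | Div of expr * expr

(* Sound: warn unless every divisor is a provably nonzero literal.
   Never misses a division by zero, but may raise false alarms. *)
let rec sound_warn = function
  | Int _ | Var _ -> false
  | Div (a, b) ->
    sound_warn a || sound_warn b
    || (match b with Int n -> n = 0 | _ -> true)

(* Complete: warn only when a divisor is the literal 0.
   Every warning points at a literal division by zero,
   but zeros computed at runtime are missed. *)
let rec complete_warn = function
  | Int _ | Var _ -> false
  | Div (a, b) ->
    complete_warn a || complete_warn b
    || (match b with Int 0 -> true | _ -> false)
```

`Div (Var "x", Var "y")` is flagged only by the sound checker, a false alarm whenever `y` is nonzero at runtime. `Div (Int 1, Div (Int 1, Int 2))` really does divide by zero, because `1 / 2` is `0` in integer division; the sound checker flags it while the complete checker stays silent.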

The Fundamental Trade-off

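Adjusting the analysis threshold trades false positives against false negatives. Toward the sound end, fewer real bugs slip through but false alarms increase; toward the complete end, false alarms drop but more real bugs are missed.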
Key Insight: There's no "best" position — it depends on context. Safety-critical systems lean sound (better safe than sorry). Developer tools lean complete (avoid alert fatigue).
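One way to picture the threshold, sketched in OCaml under stated assumptions: suppose an analyzer attaches a hypothetical confidence `score` to each candidate finding, and we know which candidates are real bugs (`real_bug`). Lowering the threshold reports more candidates and moves toward the sound end; raising it moves toward the complete end. The sample data is invented for illustration.

```ocaml
(* A hypothetical candidate finding: analyzer confidence and ground truth *)
type finding = { score : float; real_bug : bool }

(* Report every finding scored at or above the threshold, then count
   false positives (reported, not real) and false negatives (real, not reported) *)
let evaluate threshold findings =
  List.fold_left
    (fun (fp, fn) f ->
      let reported = f.score >= threshold in
      match reported, f.real_bug with
      | true, false -> (fp + 1, fn)
      | false, true -> (fp, fn + 1)
      | _ -> (fp, fn))
    (0, 0) findings

(* Invented sample: two real bugs among five candidates *)
let sample =
  [ { score = 0.9; real_bug = true };
    { score = 0.7; real_bug = false };
    { score = 0.5; real_bug = true };
    { score = 0.3; real_bug = false };
    { score = 0.1; real_bug = false } ]
```

With this sample, `evaluate 0.0 sample` gives `(3, 0)`: no real bug is missed, at the cost of three false alarms. `evaluate 0.6 sample` gives `(1, 1)`, and `evaluate 0.95 sample` gives `(0, 2)`: no false alarms, but both real bugs are missed.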

Challenge: Sound, Complete, or Neither?

For each analysis tool description, classify it.

1. A type checker rejects programs that might have type errors, even if some rejected programs would actually run fine.

2. A test suite with 85% code coverage. Every failing test reveals a genuine bug, but some untested paths have hidden issues.

3. A heuristic linter that catches common patterns like "= instead of ==" but sometimes misses real bugs and occasionally flags correct code.

Analysis Scope Levels

Each scope level analyzes a different portion of the program and has its own trade-offs.


Broader scope = more accurate but more expensive.

Intra vs Interprocedural: Side by Side

Here is how each scope level handles the same taint-tracking problem.

Intraprocedural (function-local)

query.ml — local view
    let query user_input =
      let safe = sanitize user_input in
      Db.execute (Printf.sprintf
        "SELECT * WHERE name='%s'" safe)
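Looking only inside query, the analyzer sees user_input pass through sanitize into Db.execute, but it cannot see what sanitize does. It must either assume safe is still tainted (a possible false positive) or trust the function's name (a possible false negative).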

Interprocedural (cross-function)

query.ml — full view
    let sanitize input =
      (* Escape single quotes by doubling them: O'Brien becomes O''Brien *)
      String.concat "''" (String.split_on_char '\'' input)

    let query user_input =
      let safe = sanitize user_input in
      Db.execute (Printf.sprintf
        "SELECT * WHERE name='%s'" safe)
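With the body of sanitize in view, the analyzer can see that it doubles every single quote, record that effect as a summary of sanitize, and decide whether quote-escaping is enough for this query.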
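A minimal OCaml sketch of the difference, assuming a hypothetical toy expression type in which `Escape` stands for the quote-doubling body of `sanitize` and is trusted as a sanitizer. With `~interproc:false` the analyzer cannot look inside calls and assumes taint flows through them; with `~interproc:true` it analyzes the callee's body with the parameter bound to the argument's taint.

```ocaml
(* Hypothetical toy expressions for taint tracking *)
type expr =
  | Input                    (* untrusted user input: a taint source *)
  | Const of string          (* string literal: untainted *)
  | Var of string
  | Escape of expr           (* built-in quote escaping, trusted as a sanitizer *)
  | Concat of expr * expr
  | Call of string * expr    (* call to a user-defined one-argument function *)

(* funcs maps a function name to (parameter, body); env maps variables to taint *)
let rec tainted ~interproc funcs env e =
  let go = tainted ~interproc funcs env in
  match e with
  | Input -> true
  | Const _ -> false
  | Var x ->
    (* Unknown variables are conservatively treated as tainted *)
    (match List.assoc_opt x env with Some t -> t | None -> true)
  | Escape _ -> false
  | Concat (a, b) -> go a || go b
  | Call (f, arg) ->
    let arg_tainted = go arg in
    if not interproc then
      (* Callee body not visible: assume taint flows through the call *)
      arg_tainted
    else (
      match List.assoc_opt f funcs with
      | Some (param, body) ->
        tainted ~interproc funcs [ (param, arg_tainted) ] body
      | None -> arg_tainted)

let funcs = [ ("sanitize", ("input", Escape (Var "input"))) ]

(* The string that query passes to Db.execute *)
let sink_arg =
  Concat (Const "SELECT * WHERE name='",
          Concat (Call ("sanitize", Input), Const "'"))
```

On `sink_arg`, the intraprocedural setting reports taint reaching `Db.execute`, a false positive here, while `~interproc:true` looks into `sanitize` and clears it. If `sanitize` were instead defined as `("sanitize", ("input", Var "input"))`, the interprocedural run would correctly report the flow. The sketch does not handle recursive functions, which would make `tainted` loop; real interprocedural analyzers typically compute function summaries with fixpoint iteration to cope with that.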

Modern Analysis Ecosystem

Different analysis tools apply at each stage of the development pipeline.


Analysis tools integrate at every phase of the Software Development Life Cycle.

Key Insight: Earlier detection = cheaper fixes. A widely cited (though debated) industry estimate holds that a bug found in design can cost up to 100x less to fix than one found in production.

Program Analysis vs Testing vs Debugging

Three complementary approaches and how they relate.


These three approaches cover different aspects of software quality. They complement each other — no single approach is sufficient alone.

Analogy: Analysis = Doctor reading your X-ray. Testing = Running on a treadmill while monitored. Debugging = Diagnosing why you collapsed during the run.

Challenge: Match Tools to Pipeline Stage

For each tool, select the SDLC stage where it primarily applies.

1. SonarQube — automated code quality scanner

2. TypeScript — static type checker

3. Sentry — runtime error monitoring

4. STRIDE — threat modeling framework

Key Takeaways

1. Program analysis automates code understanding and bug detection — scaling beyond what humans can review manually.
2. Static analysis examines code structure without running it; dynamic analysis observes actual execution. Use both.
3. Perfect analysis is impossible. By Rice's Theorem, every non-trivial semantic property of programs is undecidable, so no analysis that always terminates can be both perfectly sound and perfectly complete for such a property.
4. Trade-offs are real: Sound tools miss no real bugs (but may cry wolf). Complete tools raise no false alarms (but may miss bugs). Choose based on context.
5. Scope matters: Intraprocedural is fast but limited. Interprocedural catches cross-function bugs. Whole-program is most precise but expensive.
6. Analysis at every stage, from IDE linting (coding) to production monitoring (deploy). Earlier detection = cheaper fixes.
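For reference, here is one standard textbook formulation of Rice's Theorem, where \(\varphi_e\) denotes the partial computable function computed by program \(e\) and \(\mathcal{P}\) is a set of such functions (a "semantic property"):

```latex
\text{If } \mathcal{P} \neq \emptyset \text{ and } \mathcal{P} \neq \{\text{all partial computable functions}\},
\text{ then } \{\, e \mid \varphi_e \in \mathcal{P} \,\} \text{ is undecidable.}
```

Consequently, any analysis that always terminates and answers "does program e have property P?" for such a P must be wrong on some program: it either flags a program that is fine or passes one that is not.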

Next Session Preview

Module 2: Code Representation & ASTs

  • How programs are represented internally as Abstract Syntax Trees
  • Building and traversing ASTs in OCaml
  • Hands-on: Writing your first code transformer
Prep: Review basic tree data structures (nodes, children, traversal). If you know recursion, you're ready.

Why ASTs matter

Most of the static tools we discussed today (linters, type checkers, compilers, security scanners) start by parsing source code into an AST and analyzing it. Module 2 shows you how to build one yourself.

Source code: x + 1
↓ parse
AST: BinOp(+)
├── Var("x")
└── Int(1)
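As a preview, here is one possible OCaml representation of that AST; it is an assumption, not necessarily the type Module 2 will use. The constructor names follow the diagram. `vars` inspects the tree without running it (a tiny static analysis), while `eval` runs it under concrete values (the dynamic counterpart).

```ocaml
(* The AST from the diagram: BinOp(+) with children Var "x" and Int 1 *)
type expr =
  | Int of int
  | Var of string
  | BinOp of string * expr * expr

let ast = BinOp ("+", Var "x", Int 1)

(* Static: list the variables an expression reads, without running it *)
let rec vars = function
  | Int _ -> []
  | Var x -> [ x ]
  | BinOp (_, l, r) -> vars l @ vars r

(* Dynamic: evaluate the expression under a concrete environment *)
let rec eval env = function
  | Int n -> n
  | Var x -> List.assoc x env
  | BinOp ("+", l, r) -> eval env l + eval env r
  | BinOp ("*", l, r) -> eval env l * eval env r
  | BinOp (op, _, _) -> failwith ("unsupported operator: " ^ op)
```

`vars ast` returns `["x"]`, and `eval [ ("x", 41) ] ast` returns `42`.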

Resources: Course GitHub repository

Quiz 1: Core Concepts

Test your understanding. Select one answer per question.

Q1. Which is a key advantage of static analysis over dynamic analysis?

Q2. A sound analysis guarantees that:

Q3. Interprocedural analysis differs from intraprocedural by:

Quiz 2: Classify the Analysis

Read each code snippet and scenario. Identify the analysis type and what it catches.

// Tool output:
WARNING: Variable 'count' used
before assignment on line 12.
(No code was executed.)

Type?

// Tool output:
ALERT: User input flows to
SQL query without sanitization.
Source: req.body → Sink: db.query()

Type?

// Tool output (during test run):
LEAK: 2,048 bytes allocated at
malloc() line 45, never freed.
Detected by: Valgrind memcheck

Type?

// Tool output (profiler):
HOT: sort_list() called 10,000x
Total: 4.2s (78% of runtime)
Suggestion: O(n²) → O(n log n)

Type?

Quiz 3: Trade-off Reasoning

Given each scenario, decide which analysis property matters most and predict the outcome.

Scenario 1: You're building a pacemaker controller. A missed bug could kill a patient. Which property do you prioritize?

Scenario 2: Your team gets 500+ false alarm alerts per day. Developers have stopped reading them. What should change?

Scenario 3: A security scanner reports 0 vulnerabilities. Can you conclude the code is secure?