Security Analysis

Module 5 — Program Analysis Bootcamp

From numeric properties to security properties

SQL Injection (CWE-89)

XSS (CWE-79)

Command Injection (CWE-78)

Path Traversal (CWE-22)

Open Redirect (CWE-601)

Learning Objectives

Implement a taint lattice satisfying ABSTRACT_DOMAIN
Define security configurations (sources, sinks, sanitizers)
Build forward taint propagation using abstract transfer functions
Track implicit information flows via program-counter taint
Detect OWASP vulnerability patterns (SQLi, XSS, command injection, path traversal, open redirect)
Evaluate precision and limitations of taint analysis

The Leap from Module 4

M4: What value?

Sign: {+, −, 0, ⊤, ⊥}
Interval: [lo, hi]
Detects: div-by-zero


M5: Where from?

Taint: {Tainted, Untainted, ⊤, ⊥}
Tracks: data provenance
Detects: injection attacks

Same infrastructure, different question! The ABSTRACT_DOMAIN signature, MakeEnv functor, eval_expr, and transfer_stmt all carry over unchanged.

Motivating Example: SQL Injection


-- Server code:
input = get_param("name")  -- SOURCE
query = "SELECT * FROM users
         WHERE name = '" + input + "'"
exec_query(query)          -- SINK


Like an unlocked front door: The application blindly trusts user input and pastes it into a sensitive operation. Taint analysis detects this by tracking where data comes from.
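To see concretely why this is dangerous, the sketch below reproduces the server code's string concatenation in OCaml, the language used for the analyzer snippets on later slides. `build_query` is a hypothetical helper, not part of the course code. Like the pseudocode, it does no escaping at all.

```ocaml
(* Hypothetical helper mirroring the server code: user input is pasted
   straight into the SQL text with no escaping. *)
let build_query (input : string) : string =
  "SELECT * FROM users WHERE name = '" ^ input ^ "'"

let () =
  (* A benign name produces the intended query. *)
  print_endline (build_query "alice");
  (* A crafted name closes the quote early and adds an always-true condition. *)
  print_endline (build_query "' OR '1'='1")
```

The second call prints `SELECT * FROM users WHERE name = '' OR '1'='1'`. The input closes the quote early and appends a condition that is true for every row.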

Taint Analysis Pipeline

Three phases: Source introduces taint, Propagation spreads it, Sink checks for it.

Sources, Sinks & Sanitizers

Sources	`get_param`, `read_cookie`, `read_input`
Sinks	`exec_query`, `exec_cmd`, `send_response`
Sanitizers	`escape_sql`, `html_encode`, `shell_escape`

Sources create taint, sinks consume (and check) it, sanitizers remove it. If tainted data reaches a sink without sanitization → vulnerability!

The Taint Lattice

A flat 4-element lattice. It has the same flat shape as the Sign domain (which has three middle elements, +, −, 0, instead of two) but a different meaning.


Full Join Table

join	Bot	Unt	Tai	Top
Bot	Bot	Unt	Tai	Top
Unt	Unt	Unt	Top	Top
Tai	Tai	Top	Tai	Top
Top	Top	Top	Top	Top

Key: join Tainted Untainted = Top — if data might be either, treat as potentially tainted (conservative/sound).
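Here is a minimal OCaml sketch of the lattice. It assumes a simplified `ABSTRACT_DOMAIN` signature with `bottom`, `top`, `join`, `meet`, `leq`, and `to_string`, and the course's actual signature may have different or additional members. `join` reproduces the table above. `leq` is derived from it, since a ⊑ b exactly when join a b = b.

```ocaml
(* Hypothetical signature; the course's ABSTRACT_DOMAIN may differ. *)
module type ABSTRACT_DOMAIN = sig
  type t
  val bottom : t
  val top : t
  val join : t -> t -> t
  val meet : t -> t -> t
  val leq : t -> t -> bool
  val to_string : t -> string
end

type taint = Bot | Untainted | Tainted | Top

module Taint : ABSTRACT_DOMAIN with type t = taint = struct
  type t = taint

  let bottom = Bot
  let top = Top

  (* Least upper bound in the flat lattice Bot < {Untainted, Tainted} < Top *)
  let join a b =
    match a, b with
    | Bot, x | x, Bot -> x
    | Top, _ | _, Top -> Top
    | Untainted, Untainted -> Untainted
    | Tainted, Tainted -> Tainted
    | Untainted, Tainted | Tainted, Untainted -> Top

  (* Greatest lower bound: the two middle elements only meet at Bot *)
  let meet a b =
    match a, b with
    | Top, x | x, Top -> x
    | Bot, _ | _, Bot -> Bot
    | Untainted, Untainted -> Untainted
    | Tainted, Tainted -> Tainted
    | Untainted, Tainted | Tainted, Untainted -> Bot

  (* a is below b exactly when joining a into b changes nothing *)
  let leq a b = join a b = b

  let to_string = function
    | Bot -> "Bot"
    | Untainted -> "Untainted"
    | Tainted -> "Tainted"
    | Top -> "Top"
end
```

The lattice has finite height, and its longest chain is Bot ⊑ Tainted ⊑ Top. Iterating `join` at loop heads can therefore raise a variable's value at most twice before it stabilizes, which is why no widening operator is needed.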

Security Configuration

Define what's dangerous — sources, sinks, and sanitizers are configurable, not hard-coded.

type source = {
  source_name : string;        (* "get_param" *)
  source_description : string;
}
type sink = {
  sink_name : string;          (* "exec_query" *)
  sink_param_index : int;      (* which arg to check *)
  sink_vuln_type : string;     (* "sql-injection" *)
}
type sanitizer = {
  sanitizer_name : string;     (* "escape_sql" *)
  sanitizer_cleans : string list;
}
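The slide shows the three record types but not how they are bundled or looked up. The sketch below assumes a hypothetical `security_config` record and lookup helpers `is_source`, `find_sink`, and `find_sanitizer`. The `check_call` slide later uses a `find_sink`, and its real definition may differ. Only `"sql-injection"` comes from the slide. The other vulnerability labels and the descriptions are illustrative.

```ocaml
type source = { source_name : string; source_description : string }
type sink = { sink_name : string; sink_param_index : int; sink_vuln_type : string }
type sanitizer = { sanitizer_name : string; sanitizer_cleans : string list }

(* Hypothetical bundle of the three lists *)
type security_config = {
  sources : source list;
  sinks : sink list;
  sanitizers : sanitizer list;
}

let default_config = {
  sources = [
    { source_name = "get_param"; source_description = "request parameter" };
    { source_name = "read_cookie"; source_description = "cookie value" };
    { source_name = "read_input"; source_description = "raw user input" };
  ];
  sinks = [
    { sink_name = "exec_query"; sink_param_index = 0; sink_vuln_type = "sql-injection" };
    { sink_name = "exec_cmd"; sink_param_index = 0; sink_vuln_type = "command-injection" };
    { sink_name = "send_response"; sink_param_index = 0; sink_vuln_type = "xss" };
  ];
  sanitizers = [
    { sanitizer_name = "escape_sql"; sanitizer_cleans = [ "sql-injection" ] };
    { sanitizer_name = "html_encode"; sanitizer_cleans = [ "xss" ] };
    { sanitizer_name = "shell_escape"; sanitizer_cleans = [ "command-injection" ] };
  ];
}

(* Lookups by function name, matched against Call(name, args) nodes *)
let is_source config name =
  List.exists (fun s -> s.source_name = name) config.sources

let find_sink config name =
  List.find_opt (fun s -> s.sink_name = name) config.sinks

let find_sanitizer config name =
  List.find_opt (fun s -> s.sanitizer_name = name) config.sanitizers
```

Swapping in a different `security_config` value retargets the analysis to another vulnerability class without touching the analyzer itself.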

How it works at runtime

When the analyzer encounters Call(name, args):

Source? → Return Tainted

Sanitizer? → Return Untainted

Sink? → Check arg taint → report if tainted

Unknown? → Return Top (conservative)

No AST changes needed! The config is matched against Call(name, args) nodes. Same analyzer, different configs → detect different vulnerability classes.
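The four-way dispatch above fits in a few lines. This sketch assumes a pared-down config that holds only function names. That config is hypothetical; the real one uses the records from the previous slide. The slide does not say what a sink call itself returns, so here it is treated like an unknown call (Top). That choice is also an assumption.

```ocaml
type taint = Bot | Untainted | Tainted | Top

(* Hypothetical, pared-down config: only the function names per role. *)
type config = {
  sources : string list;
  sinks : string list;
  sanitizers : string list;
}

let config = {
  sources = [ "get_param"; "read_cookie"; "read_input" ];
  sinks = [ "exec_query"; "exec_cmd"; "send_response" ];
  sanitizers = [ "escape_sql"; "html_encode"; "shell_escape" ];
}

type role = Source | Sanitizer | Sink | Unknown

(* Match the name of a Call(name, _) node against the config. *)
let classify cfg name =
  if List.mem name cfg.sources then Source
  else if List.mem name cfg.sanitizers then Sanitizer
  else if List.mem name cfg.sinks then Sink
  else Unknown

(* Taint of the value the call returns. Sinks are checked separately
   (see check_call); their own result is treated like an unknown call
   here, which is an assumption. *)
let call_result_taint cfg name =
  match classify cfg name with
  | Source -> Tainted
  | Sanitizer -> Untainted
  | Sink | Unknown -> Top
```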

Like a virus scanner's signature database: The scan engine stays the same — you just update the definitions (config) to detect new threats.

Forward Taint Propagation Rules


Key insight: Taint propagates through binary operations — if either operand is tainted, the result is tainted. "safe" + tainted = tainted.
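The sketch below applies these propagation rules to a hypothetical expression AST with `IntLit`, `StrLit`, `Var`, and `BinOp` cases; the course's AST will differ. Note that the binary-operation rule is not the lattice join. The join table gives join Tainted Untainted = Top, but a concatenation with a tainted operand always carries attacker data, so `combine` returns Tainted. Treating unbound variables as Top is an assumption made to stay conservative.

```ocaml
type taint = Bot | Untainted | Tainted | Top

(* Hypothetical expression AST; the course's real AST has more cases. *)
type expr =
  | IntLit of int
  | StrLit of string
  | Var of string
  | BinOp of string * expr * expr  (* operator, left, right *)

module Env = Map.Make (String)

(* Taint-dominant rule for binary operations: a Tainted operand makes the
   result Tainted; otherwise Top wins; otherwise Bot; otherwise Untainted. *)
let combine a b =
  match a, b with
  | Tainted, _ | _, Tainted -> Tainted
  | Top, _ | _, Top -> Top
  | Bot, _ | _, Bot -> Bot
  | Untainted, Untainted -> Untainted

let rec eval_expr (env : taint Env.t) (e : expr) : taint =
  match e with
  | IntLit _ | StrLit _ -> Untainted  (* constants carry no user data *)
  | Var x ->
      (* Unbound variables are treated conservatively as Top (assumption). *)
      Option.value (Env.find_opt x env) ~default:Top
  | BinOp (_, l, r) -> combine (eval_expr env l) (eval_expr env r)
```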

Taint Propagation: Worked Example

Watch taint flow through a 6-line program, step by step.

input  = get_param("q")       -- source
prefix = "SELECT * WHERE "    -- literal
query  = prefix + input       -- propagation
safe   = escape_sql(input)    -- sanitizer
safe_q = prefix + safe        -- clean + clean
exec_query(query)             -- SINK CHECK
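The six lines can be replayed mechanically. In the sketch below, a hypothetical mini-AST encodes the program and `analyze` makes one forward pass. Each assignment updates the environment using the same taint-dominant rule for `+`, and each call to a sink checks its first argument. The config lists and constructor names such as `Concat` and `ExprStmt` are assumptions for illustration.

```ocaml
type taint = Bot | Untainted | Tainted | Top

(* Hypothetical mini-AST covering just the six lines above. *)
type expr =
  | StrLit of string
  | Var of string
  | Concat of expr * expr
  | Call of string * expr list

type stmt = Assign of string * expr | ExprStmt of expr

module Env = Map.Make (String)

(* Config restricted to the names used in the example. *)
let sources = [ "get_param" ]
let sanitizers = [ "escape_sql" ]
let sinks = [ "exec_query" ]

let combine a b =
  match a, b with
  | Tainted, _ | _, Tainted -> Tainted
  | Top, _ | _, Top -> Top
  | Bot, _ | _, Bot -> Bot
  | Untainted, Untainted -> Untainted

let is_potentially_tainted = function Tainted | Top -> true | Bot | Untainted -> false

let to_string = function
  | Bot -> "Bot" | Untainted -> "Untainted" | Tainted -> "Tainted" | Top -> "Top"

let rec eval env = function
  | StrLit _ -> Untainted
  | Var x -> Option.value (Env.find_opt x env) ~default:Top
  | Concat (a, b) -> combine (eval env a) (eval env b)
  | Call (f, _) when List.mem f sources -> Tainted
  | Call (f, _) when List.mem f sanitizers -> Untainted
  | Call _ -> Top  (* unknown function: conservative *)

(* One forward pass over straight-line code: update the env on
   assignments and check the first argument at sink calls. *)
let analyze prog =
  List.fold_left
    (fun (env, warnings) s ->
      match s with
      | Assign (x, e) -> (Env.add x (eval env e) env, warnings)
      | ExprStmt (Call (f, arg :: _))
        when List.mem f sinks && is_potentially_tainted (eval env arg) ->
          (env, warnings @ [ f ])
      | ExprStmt _ -> (env, warnings))
    (Env.empty, []) prog

let program =
  [ Assign ("input", Call ("get_param", [ StrLit "q" ]));
    Assign ("prefix", StrLit "SELECT * WHERE ");
    Assign ("query", Concat (Var "prefix", Var "input"));
    Assign ("safe", Call ("escape_sql", [ Var "input" ]));
    Assign ("safe_q", Concat (Var "prefix", Var "safe"));
    ExprStmt (Call ("exec_query", [ Var "query" ])) ]

let () =
  let env, warnings = analyze program in
  Env.iter (fun x t -> Printf.printf "%-7s %s\n" x (to_string t)) env;
  List.iter (fun f -> Printf.printf "warning: tainted argument reaches %s\n" f) warnings
```

Running it reports `query` as Tainted, `safe` and `safe_q` as Untainted, and a single warning at `exec_query`, matching the comments in the program. Passing `safe_q` to `exec_query` instead would produce no warning.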

Challenge A: Trace the Taint

For each line, predict the taint status of the assigned variable.

a = get_param("id")
b = 100
c = a + b
d = html_encode(c)
e = d + a
send_response(e)

Line 1: a =

Line 2: b =

Line 3: c =

Line 4: d =

Line 5: e =

Line 6: vuln?


Vulnerability Detection at Sinks

At each Call(name, args) that is a sink — check if the relevant argument is tainted.

let check_call env config func_name call_name args =
  match find_sink config call_name with
  | None   -> []   (* not a sink *)
  | Some sink ->
    let arg = List.nth args sink.sink_param_index in
    let taint = eval_expr env arg in
    if is_potentially_tainted taint then
      [{ vuln_type = sink.sink_vuln_type;
         location  = func_name;
         sink_name = call_name; ... }]
    else []

is_potentially_tainted returns true for both Tainted and Top — because Top means "might be tainted" and a sound analysis must warn.
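The following is a self-contained variant of `check_call` for experimenting with the sink check. It makes three changes, all assumptions. Arguments arrive as already-computed taint values, standing in for `eval_expr env arg`. The report field is renamed to `called_sink` to avoid an OCaml label clash with `sink.sink_name`. `List.nth_opt` replaces `List.nth`, so a call with too few arguments yields no report instead of raising `Failure`. Whether silently skipping such a call is acceptable is a design decision, and a stricter analyzer might warn instead.

```ocaml
type taint = Bot | Untainted | Tainted | Top

(* Top means "might be tainted", so a sound check must warn on it too. *)
let is_potentially_tainted = function
  | Tainted | Top -> true
  | Bot | Untainted -> false

type sink = { sink_name : string; sink_param_index : int; sink_vuln_type : string }

(* Hypothetical report record; called_sink plays the slide's sink_name role. *)
type vuln = { vuln_type : string; location : string; called_sink : string }

let check_call (sinks : sink list) (func_name : string) (call_name : string)
    (arg_taints : taint list) : vuln list =
  match List.find_opt (fun s -> s.sink_name = call_name) sinks with
  | None -> []  (* not a sink *)
  | Some sink -> (
      match List.nth_opt arg_taints sink.sink_param_index with
      | Some t when is_potentially_tainted t ->
          [ { vuln_type = sink.sink_vuln_type; location = func_name; called_sink = call_name } ]
      | Some _ -> []  (* checked argument is Bot or Untainted *)
      | None -> []  (* fewer arguments than the config expects: nothing to check *))
```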


Sanitizers & Effectiveness

Sanitizers are vulnerability-type-specific — the wrong sanitizer doesn't help!

Match sanitizer to vulnerability

exec_query(input)

send_response(input)

exec_cmd(input)

redirect(input)


Effectiveness Matrix

Sanitizer	SQLi	XSS	CmdInj	PathTrv	Redir
escape_sql	✓	✗	✗	✗	✗
html_encode	✗	✓	✗	✗	✗
shell_escape	✗	✗	✓	✗	✗
validate_path	✗	✗	✗	✓	✗
validate_url	✗	✗	✗	✗	✓
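With the flat lattice alone, every sanitizer returns Untainted, so `escape_sql` would look effective before `send_response`. One way to enforce the matrix is to remember which vulnerability classes a value has been cleaned for, in the spirit of `sanitizer_cleans` and `sink_vuln_type` in the config records, and compare that against the sink's class. This is sketched below as an assumption, not as the course's implementation. The class labels other than `"sql-injection"` are illustrative.

```ocaml
(* Hypothetical refinement of the flat lattice: besides "untrusted",
   remember which vulnerability classes a value has been sanitized for. *)
type value_state =
  | Trusted                   (* never touched by a source *)
  | Untrusted                 (* came from a source, not sanitized *)
  | Sanitized of string list  (* from a source, cleaned for these classes *)

(* The effectiveness matrix as data. *)
let sanitizer_cleans =
  [ ("escape_sql", [ "sql-injection" ]);
    ("html_encode", [ "xss" ]);
    ("shell_escape", [ "command-injection" ]);
    ("validate_path", [ "path-traversal" ]);
    ("validate_url", [ "open-redirect" ]) ]

let sink_class =
  [ ("exec_query", "sql-injection");
    ("send_response", "xss");
    ("exec_cmd", "command-injection");
    ("redirect", "open-redirect") ]

let apply_sanitizer name v =
  let cleans = Option.value (List.assoc_opt name sanitizer_cleans) ~default:[] in
  match v with
  | Trusted -> Trusted
  | Untrusted -> Sanitized cleans
  | Sanitized already -> Sanitized (already @ cleans)

(* A sink is reached safely only if the value is trusted or was cleaned
   for the class this sink is vulnerable to. *)
let is_vulnerable sink v =
  match List.assoc_opt sink sink_class, v with
  | None, _ -> false  (* not a sink *)
  | Some _, Trusted -> false
  | Some _, Untrusted -> true
  | Some cls, Sanitized cleaned -> not (List.mem cls cleaned)
```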

Explicit vs Implicit Information Flows

Two ways secret data can leak — through data or through control.

Explicit Flow (Data Dependency)

secret = get_param("password") -- Tainted
x = secret                      -- Tainted
y = x + 1                       -- Tainted

Taint follows the data — wherever the value goes, taint goes. Standard taint propagation catches this.

Implicit Flow (Control Dependency)

secret = get_param("pin")  -- Tainted
if secret == 1234:
    x = 1                  -- reveals secret!
else:
    x = 0                  -- reveals secret!

After this code, x reveals whether secret == 1234. The value of secret influences x through the branch, not through assignment. Standard taint propagation misses this!

Analogy: Explicit flow is like passing a note directly. Implicit flow is like signaling yes/no by turning a light on or off — the information travels through the choice, not the data.

PC-Taint: Handling Implicit Flows

Track whether we're inside a branch controlled by tainted data.

secret = get_param("pin")  -- pc: Unt
if secret == 1234:         -- cond: Tai
    x = 1                  -- pc: Tai!
else:
    x = 0                  -- pc: Tai!
y = x                      -- pc: Unt (y is still Tainted via x)
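Below is a sketch of PC-taint for a hypothetical mini-language with just enough constructs for the PIN example. The analyzer carries a `pc` value. Entering an `if` mixes the condition's taint into `pc`, every assignment mixes `pc` into the assigned value, and the two branch environments are joined at the merge point. Setting `track_pc` to `false` gives plain explicit-flow propagation for comparison. All names here are assumptions.

```ocaml
type taint = Bot | Untainted | Tainted | Top

(* Hypothetical mini-language: just enough for the PIN example above. *)
type expr = Int of int | Var of string | Eq of expr * expr | Source of string
type stmt = Assign of string * expr | If of expr * stmt list * stmt list

module Env = Map.Make (String)

(* Least upper bound in the flat lattice (matches the join table). *)
let join a b =
  match a, b with
  | Bot, x | x, Bot -> x
  | x, y when x = y -> x
  | _ -> Top

(* Taint-dominant combination used for operators and for mixing in the pc. *)
let combine a b =
  match a, b with
  | Tainted, _ | _, Tainted -> Tainted
  | Top, _ | _, Top -> Top
  | Bot, _ | _, Bot -> Bot
  | Untainted, Untainted -> Untainted

let rec eval env = function
  | Int _ -> Untainted
  | Source _ -> Tainted
  | Var x -> Option.value (Env.find_opt x env) ~default:Top
  | Eq (a, b) -> combine (eval env a) (eval env b)

(* Pointwise join at the merge point; a variable bound on only one side
   keeps that side's value (the other side counts as Bot). *)
let join_env e1 e2 = Env.union (fun _ a b -> Some (join a b)) e1 e2

(* pc = taint of the enclosing branch conditions. Every assignment mixes
   in the pc; with track_pc = false this is plain explicit-flow taint. *)
let rec exec ~track_pc ~pc env = function
  | Assign (x, e) -> Env.add x (combine pc (eval env e)) env
  | If (cond, then_, else_) ->
      let pc' = if track_pc then combine pc (eval env cond) else pc in
      let run = List.fold_left (exec ~track_pc ~pc:pc') env in
      join_env (run then_) (run else_)

let analyze ~track_pc prog =
  List.fold_left (exec ~track_pc ~pc:Untainted) Env.empty prog

let pin_example =
  [ Assign ("secret", Source "pin");
    If (Eq (Var "secret", Int 1234), [ Assign ("x", Int 1) ], [ Assign ("x", Int 0) ]);
    Assign ("y", Var "x") ]
```

Without PC-taint, `x` and `y` come out Untainted, which is exactly the missed leak from the previous slide. With PC-taint, both are Tainted. As on the slide, `pc` returns to its outer value after the `if`, and `y` becomes Tainted only through the data flow from `x`.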


Putting It All Together

The complete security analysis pipeline — from AST to vulnerability report.

Same algorithm as M4! The analysis pipeline is identical — only the abstract domain changed from numeric (signs/intervals) to security (taint).

Challenge B: Find the Vulnerability

Analyze this code — identify the source, sink, vulnerability type, and fix.

user = get_param("username")
role = "guest"
greeting = "Hello, " + user
page = "<div>" + greeting + "</div>"
log_msg = "User: " + user
write_log(log_msg)
send_response(page)

Source function:

Dangerous sink (line #):

Vulnerability type:

Correct sanitizer:


Limitations & False Positives

Taint analysis aims to be sound (no missed bugs) but can be imprecise (false alarms). Soundness only holds for the flows the analysis actually models.

Over-Approximation Sources

1. Join at merge points

if cond:
  x = get_param("q")  -- Tainted
else:
  x = "safe"           -- Untainted
-- After merge: join(T, U) = Top

Even though x might be safe, we treat it as potentially tainted.

2. Unknown functions → Top

Any function not in our config returns Top, so even the results of perfectly safe helper functions get flagged when they reach a sink.

3. No string content tracking

Can't tell that "SELECT " + escape_sql(x) is safe at the string level. Must rely on sanitizer config.

Other Limitations

4. No alias analysis

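If x and y point to the same data, tainting x should taint y. Our analysis treats variables independently, so taint that arrives through such an alias is missed. This is a false negative, not just a false alarm.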

5. Intraprocedural only

Our analyzer works within a single function. Real tools like Infer and CodeQL do interprocedural analysis across function boundaries.

The tradeoff: Like a sensitive smoke detector, it is tuned to go off for every fire, so it also triggers for burnt toast. False positives are annoying but safe; missing a real fire is dangerous.

Real-World Security Analysis Tools

Production tools that apply taint analysis ideas at scale include CodeQL, Semgrep, and Infer.

Key Takeaways

The essential ideas from Module 5 — Security Analysis.

1. Taint analysis reuses M4's framework
Same ABSTRACT_DOMAIN, MakeEnv, eval_expr, transfer_stmt. The domain changed from numeric to security — that's it.

2. The taint lattice is flat (4 elements)
Bot ⊑ {Tainted, Untainted} ⊑ Top. Flat, like the sign domain (which has one more middle element). Finite height, so no widening needed.

3. Three components: Source → Propagation → Sink
Sources introduce taint, propagation spreads it through operations, sinks check for it. Sanitizers break the chain.

4. Sanitizers are type-specific
escape_sql fixes SQLi but not XSS. html_encode fixes XSS but not SQLi. Wrong sanitizer = still vulnerable.

5. Implicit flows need PC-taint
Standard taint misses control-flow leaks. PC-taint tracks when we're inside a tainted branch, tainting all assignments within.

The Module Progression:
M3 (framework) + M4 (abstract domains) + M5 (security domain) = a real vulnerability detector. Next: M6 brings it all together with tools and integration.

What We Learned in Module 5

A complete summary of security analysis concepts.

Core Theory

✓ Taint lattice (4-element flat)

✓ Join/meet/leq operations

✓ Sources, sinks, sanitizers

✓ Forward taint propagation

✓ Transfer functions (same as M4)

✓ Explicit vs implicit flows

✓ PC-taint for control deps

Vulnerabilities

✓ SQL Injection (CWE-89)

✓ XSS (CWE-79)

✓ Command Injection (CWE-78)

✓ Path Traversal (CWE-22)

✓ Open Redirect (CWE-601)

✓ Sanitizer effectiveness matrix

✓ Vulnerability detection at sinks

Practical Skills

✓ Trace taint through code

✓ Identify source→sink paths

✓ Match sanitizers to vulns

✓ Detect implicit flow leaks

✓ Security config design

✓ Real tools (Semgrep, CodeQL)

✓ Limitations & false positives

Taint Analysis = Abstract Interpretation Applied to Security

Challenge C: Security Scenario Analysis

For each real-world scenario, identify the vulnerability and choose the correct defense.

Q1. A web app takes a search query from the URL and displays it on the results page: "Results for: " + query

Q2. A file download endpoint uses the filename from the request: open_file("/uploads/" + filename)

Q3. A login page redirects users to a "return URL" after login: redirect(get_param("next"))

Q4. An admin panel lets operators run diagnostics: exec_cmd("ping " + host)


Quiz 1: Concept Check

Test your understanding of taint analysis fundamentals.

Q1. What does join Tainted Untainted equal?

A. Tainted
B. Untainted
C. Top
D. Bot

Q2. Why does the taint lattice NOT need widening?

A. It's too simple
B. Finite height: no infinite chains
C. Widening would be unsound
D. Taint can't increase

Q3. What does is_potentially_tainted return true for?

A. Only Tainted
B. Tainted and Top
C. Everything except Bot
D. Everything

Quiz 2: Trace the Taint

Trace taint through branching code with implicit flows.

token = read_cookie("auth")
role  = "guest"
if token == "admin_key":
    role = "admin"
msg = "Welcome, " + role
send_response(msg)

With pc_taint tracking:

role at line 5:

msg at line 5:

Vulnerability?


Quiz 3: Real-World Vulnerability Analysis

For each scenario, identify the full taint path and whether the code is safe.

Scenario 1:

name = get_param("name")
safe = html_encode(name)
page = "<h1>" + safe + "</h1>"
send_response(page)

Scenario 2:

id = get_param("id")
safe = html_encode(id)
query = "SELECT * WHERE id=" + safe
exec_query(query)

Scenario 3:

data = read_input()
result = mystery_func(data)
exec_query(result)
