Security Analysis

Module 5 — Program Analysis Bootcamp

From numeric properties to security properties

SQL Injection (CWE-89)
XSS (CWE-79)
Command Injection (CWE-78)
Path Traversal (CWE-22)
Open Redirect (CWE-601)

Learning Objectives

  1. Implement a taint lattice satisfying ABSTRACT_DOMAIN
  2. Define security configurations (sources, sinks, sanitizers)
  3. Build forward taint propagation using abstract transfer functions
  4. Track implicit information flows via program-counter taint
  5. Detect OWASP vulnerability patterns (SQLi, XSS, cmd injection)
  6. Evaluate precision and limitations of taint analysis

The Leap from Module 4

M4: What value?
Sign: {+, −, 0, ⊤, ⊥}
Interval: [lo, hi]
Detects: div-by-zero
M5: Where from?
Taint: {Tainted, Untainted, ⊤, ⊥}
Tracks: data provenance
Detects: injection attacks
Same infrastructure, different question! The ABSTRACT_DOMAIN signature, MakeEnv functor, eval_expr, and transfer_stmt all carry over unchanged.

Motivating Example: SQL Injection

-- Server code:
input = get_param("name") -- SOURCE
query = "SELECT * FROM users WHERE name = '" + input + "'"
exec_query(query) -- SINK
Like an unlocked front door: The application blindly trusts user input and pastes it into a sensitive operation. Taint analysis detects this by tracking where data comes from.
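
To make the attack concrete, the sketch below mirrors the server code's string concatenation in OCaml. It is only an illustration: build_query is a hypothetical helper, and the payload is a standard textbook example rather than something taken from this module.

```ocaml
(* Hypothetical helper mirroring the server code above:
   it pastes the raw input straight into the SQL text. *)
let build_query input =
  "SELECT * FROM users WHERE name = '" ^ input ^ "'"

(* An ordinary name produces the query the developer intended. *)
let benign = build_query "alice"

(* A crafted input closes the quote early and appends its own condition. *)
let malicious = build_query "' OR '1'='1"

let () =
  print_endline benign;
  print_endline malicious
```

With the second input the WHERE clause reads name = '' OR '1'='1', which holds for every row, so the query no longer filters by name at all.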

Taint Analysis Pipeline

Three phases: Source introduces taint, Propagation spreads it, Sink checks for it.

Sources, Sinks & Sanitizers

Sources: get_param, read_cookie, read_input
Sinks: exec_query, exec_cmd, send_response
Sanitizers: escape_sql, html_encode, shell_escape
Sources create taint, sinks consume (and check) it, sanitizers remove it. If tainted data reaches a sink without sanitization → vulnerability!

The Taint Lattice

A flat 4-element lattice: the same flat shape as the Sign domain (two incomparable middle elements instead of three), with a different meaning.

Full Join Table

join   Bot   Unt   Tai   Top
Bot    Bot   Unt   Tai   Top
Unt    Unt   Unt   Top   Top
Tai    Tai   Top   Tai   Top
Top    Top   Top   Top   Top
Key: join Tainted Untainted = Top — if data might be either, treat as potentially tainted (conservative/sound).
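
The join table above can be written down directly as an OCaml module body. This is a minimal sketch: the exact members of ABSTRACT_DOMAIN are not shown in this module, so the names join, meet, leq, and to_string are assumptions about what that signature asks for.

```ocaml
(* Flat four-element taint lattice: Bot at the bottom, Top at the top,
   Tainted and Untainted incomparable in the middle. *)
type taint = Bot | Untainted | Tainted | Top

(* Least upper bound, matching the join table row by row. *)
let join a b =
  match a, b with
  | Bot, x | x, Bot -> x
  | Top, _ | _, Top -> Top
  | Tainted, Tainted -> Tainted
  | Untainted, Untainted -> Untainted
  | Tainted, Untainted | Untainted, Tainted -> Top

(* Greatest lower bound, the dual of join. *)
let meet a b =
  match a, b with
  | Top, x | x, Top -> x
  | Bot, _ | _, Bot -> Bot
  | Tainted, Tainted -> Tainted
  | Untainted, Untainted -> Untainted
  | Tainted, Untainted | Untainted, Tainted -> Bot

(* a is below b exactly when joining them gives b back. *)
let leq a b = join a b = b

let to_string = function
  | Bot -> "Bot"
  | Untainted -> "Untainted"
  | Tainted -> "Tainted"
  | Top -> "Top"

let () = print_endline (to_string (join Tainted Untainted))
```

Defining leq through join keeps the order and the join table consistent by construction, which is convenient for a lattice this small.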

Security Configuration

Define what's dangerous — sources, sinks, and sanitizers are configurable, not hard-coded.

type source = {
  source_name : string;          (* "get_param" *)
  source_description : string;
}
type sink = {
  sink_name : string;            (* "exec_query" *)
  sink_param_index : int;        (* which arg to check *)
  sink_vuln_type : string;       (* "sql-injection" *)
}
type sanitizer = {
  sanitizer_name : string;       (* "escape_sql" *)
  sanitizer_cleans : string list;
}
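
As an illustration of how these records might be filled in, here is a small example configuration with lookup helpers. The three record types are repeated from above so the block stands alone; the config record, the value web_config, the description strings, and the "xss" label are assumptions made for this sketch. find_sink is named after the call used later in check_call, but its real signature is not shown in this module.

```ocaml
type source = { source_name : string; source_description : string }
type sink = { sink_name : string; sink_param_index : int; sink_vuln_type : string }
type sanitizer = { sanitizer_name : string; sanitizer_cleans : string list }

(* Hypothetical container bundling the three lists. *)
type config = {
  sources : source list;
  sinks : sink list;
  sanitizers : sanitizer list;
}

(* Example configuration for a small web application. *)
let web_config = {
  sources = [
    { source_name = "get_param"; source_description = "HTTP request parameter" };
    { source_name = "read_cookie"; source_description = "HTTP cookie value" };
  ];
  sinks = [
    { sink_name = "exec_query"; sink_param_index = 0; sink_vuln_type = "sql-injection" };
    { sink_name = "send_response"; sink_param_index = 0; sink_vuln_type = "xss" };
  ];
  sanitizers = [
    { sanitizer_name = "escape_sql"; sanitizer_cleans = ["sql-injection"] };
    { sanitizer_name = "html_encode"; sanitizer_cleans = ["xss"] };
  ];
}

(* Lookups by function name; each returns None when the name is not configured. *)
let find_source cfg name =
  List.find_opt (fun s -> s.source_name = name) cfg.sources

let find_sink cfg name =
  List.find_opt (fun s -> s.sink_name = name) cfg.sinks

let find_sanitizer cfg name =
  List.find_opt (fun s -> s.sanitizer_name = name) cfg.sanitizers
```

Adding a new vulnerability class then means appending records to these lists rather than touching analyzer code.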

How it works during analysis

When the analyzer encounters Call(name, args):

Source? → Return Tainted
Sanitizer? → Return Untainted
Sink? → Check arg taint → report if tainted
Unknown? → Return Top (conservative)
No AST changes needed! The config is matched against Call(name, args) nodes. Same analyzer, different configs → detect different vulnerability classes.
Like a virus scanner's signature database: The scan engine stays the same — you just update the definitions (config) to detect new threats.
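
The four-way dispatch can be sketched as a classification step followed by a result rule. Everything here is a simplified assumption: the config is reduced to three name lists, call_kind and classify are hypothetical names, and returning Top for a sink call is a guess, since the text does not say what taint a sink's return value carries.

```ocaml
type taint = Bot | Untainted | Tainted | Top

type call_kind = Source | Sanitizer | Sink | Unknown

(* Hypothetical simplified config: one list of function names per role. *)
type config = {
  source_names : string list;
  sanitizer_names : string list;
  sink_names : string list;
}

(* Decide which role a called function plays under a given config. *)
let classify cfg name =
  if List.mem name cfg.source_names then Source
  else if List.mem name cfg.sanitizer_names then Sanitizer
  else if List.mem name cfg.sink_names then Sink
  else Unknown

(* Taint of the call's return value. Sink arguments are checked separately;
   treating a sink's own result as Top is an assumption of this sketch. *)
let call_result cfg name =
  match classify cfg name with
  | Source -> Tainted
  | Sanitizer -> Untainted
  | Sink | Unknown -> Top

(* Same code, two configs: each one recognizes a different vulnerability class. *)
let sql_config = {
  source_names = ["get_param"];
  sanitizer_names = ["escape_sql"];
  sink_names = ["exec_query"];
}

let shell_config = {
  source_names = ["get_param"];
  sanitizer_names = ["shell_escape"];
  sink_names = ["exec_cmd"];
}
```

Under sql_config the name exec_cmd is just an unknown function, while under shell_config it is a sink. Only the data changes, not the dispatch.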

Forward Taint Propagation Rules

Key insight: Taint propagates through binary operations — if either operand is tainted, the result is tainted. "safe" + tainted = tainted.
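
A minimal sketch of expression-level propagation follows. The expr constructors other than Call, the propagate function, and the hard-wired treatment of get_param and escape_sql are assumptions made for illustration. Note that propagate is deliberately not the lattice join: it follows the rule stated above that a single Tainted operand makes the result Tainted, whereas join Tainted Untainted is Top.

```ocaml
type taint = Bot | Untainted | Tainted | Top

type expr =
  | Lit of string
  | Var of string
  | BinOp of expr * expr          (* operator omitted: only taint matters here *)
  | Call of string * expr list

module Env = Map.Make (String)

(* Assumed rule for binary operations: Bot is absorbing (unreachable code),
   then Tainted wins, then Top, and only two clean operands stay clean. *)
let propagate a b =
  match a, b with
  | Bot, _ | _, Bot -> Bot
  | Tainted, _ | _, Tainted -> Tainted
  | Top, _ | _, Top -> Top
  | Untainted, Untainted -> Untainted

let rec eval_expr env = function
  | Lit _ -> Untainted                         (* literals are never attacker data *)
  | Var x ->
    (match Env.find_opt x env with
     | Some t -> t
     | None -> Top)                            (* unbound variable: assume the worst *)
  | BinOp (l, r) -> propagate (eval_expr env l) (eval_expr env r)
  | Call ("get_param", _) -> Tainted           (* stand-in for a configured source *)
  | Call ("escape_sql", _) -> Untainted        (* stand-in for a configured sanitizer *)
  | Call _ -> Top                              (* unknown function *)

(* "safe" + tainted = tainted *)
let example =
  eval_expr Env.empty (BinOp (Lit "safe", Call ("get_param", [Lit "q"])))
```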

Taint Propagation: Worked Example

Watch taint flow through a 6-line program, step by step.

input = get_param("q") -- source
prefix = "SELECT * WHERE " -- literal
query = prefix + input -- propagation
safe = escape_sql(input) -- sanitizer
safe_q = prefix + safe -- clean + clean
exec_query(query) -- SINK CHECK
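
The same six lines can be pushed through a toy analyzer to check the trace by machine. This is a sketch under assumptions: the stmt type, the operator-free BinOp, the propagate rule (a Tainted operand wins), and the hard-wired source and sanitizer names are all illustrative rather than the module's actual definitions.

```ocaml
type taint = Bot | Untainted | Tainted | Top

type expr =
  | Lit of string
  | Var of string
  | BinOp of expr * expr
  | Call of string * expr list

type stmt =
  | Assign of string * expr
  | Expr of expr                  (* a call evaluated only for its effect *)

module Env = Map.Make (String)

let propagate a b =
  match a, b with
  | Bot, _ | _, Bot -> Bot
  | Tainted, _ | _, Tainted -> Tainted
  | Top, _ | _, Top -> Top
  | Untainted, Untainted -> Untainted

let rec eval_expr env = function
  | Lit _ -> Untainted
  | Var x -> (match Env.find_opt x env with Some t -> t | None -> Top)
  | BinOp (l, r) -> propagate (eval_expr env l) (eval_expr env r)
  | Call ("get_param", _) -> Tainted
  | Call ("escape_sql", _) -> Untainted
  | Call _ -> Top

(* An assignment records the taint of its right-hand side; other statements
   leave the environment unchanged. *)
let transfer_stmt env = function
  | Assign (x, e) -> Env.add x (eval_expr env e) env
  | Expr _ -> env

let program = [
  Assign ("input", Call ("get_param", [Lit "q"]));
  Assign ("prefix", Lit "SELECT * WHERE ");
  Assign ("query", BinOp (Var "prefix", Var "input"));
  Assign ("safe", Call ("escape_sql", [Var "input"]));
  Assign ("safe_q", BinOp (Var "prefix", Var "safe"));
  Expr (Call ("exec_query", [Var "query"]));
]

let final_env = List.fold_left transfer_stmt Env.empty program

let show = function
  | Bot -> "Bot" | Untainted -> "Untainted" | Tainted -> "Tainted" | Top -> "Top"

let () = Env.iter (fun x t -> Printf.printf "%s: %s\n" x (show t)) final_env
```

Under these assumptions query ends up Tainted and safe_q ends up Untainted, so the call exec_query(query) on the last line is the one a sink check would flag.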

Challenge A: Trace the Taint

For each line, predict the taint status of the assigned variable.

1: a = get_param("id")
2: b = 100
3: c = a + b
4: d = html_encode(c)
5: e = d + a
6: send_response(e)
Line 1: a =
Line 2: b =
Line 3: c =
Line 4: d =
Line 5: e =
Line 6: vuln?

Vulnerability Detection at Sinks

At each Call(name, args) that is a sink — check if the relevant argument is tainted.

let check_call env config func_name call_name args =
  match find_sink config call_name with
  | None -> []  (* not a sink *)
  | Some sink ->
    let arg = List.nth args sink.sink_param_index in
    let taint = eval_expr env arg in
    if is_potentially_tainted taint then
      [{ vuln_type = sink.sink_vuln_type;
         location = func_name;
         sink_name = call_name; ... }]
    else []
is_potentially_tainted returns true for both Tainted and Top — because Top means "might be tainted" and a sound analysis must warn.
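
For experimentation, here is a self-contained variant of the sink check that compiles on its own. It rests on several assumptions: the vuln record keeps only the three fields visible above (the elided ones are left out), the config is reduced to a plain sink list, expressions are limited to literals and variables, and List.nth_opt replaces List.nth so that a call with too few arguments yields no report instead of raising.

```ocaml
type taint = Bot | Untainted | Tainted | Top

(* Only the fields shown in the snippet above; the real record has more. *)
type vuln = { vuln_type : string; location : string; sink_name : string }

type sink = { sink_name : string; sink_param_index : int; sink_vuln_type : string }

type expr = Lit of string | Var of string

module Env = Map.Make (String)

let eval_expr env = function
  | Lit _ -> Untainted
  | Var x -> (match Env.find_opt x env with Some t -> t | None -> Top)

(* Top counts as a warning because it means "might be tainted". *)
let is_potentially_tainted = function
  | Tainted | Top -> true
  | Untainted | Bot -> false

(* Simplification: the config is just the list of sinks. *)
let find_sink (config : sink list) name =
  List.find_opt (fun (s : sink) -> s.sink_name = name) config

let check_call env config func_name call_name args : vuln list =
  match find_sink config call_name with
  | None -> []                                  (* not a sink *)
  | Some sink ->
    (match List.nth_opt args sink.sink_param_index with
     | None -> []                               (* argument missing: nothing to check *)
     | Some arg ->
       if is_potentially_tainted (eval_expr env arg) then
         [ ({ vuln_type = sink.sink_vuln_type;
              location = func_name;
              sink_name = call_name } : vuln) ]
       else [])

(* Usage: one tainted and one clean variable reaching the same sink. *)
let env = Env.add "query" Tainted (Env.add "safe_q" Untainted Env.empty)

let sinks : sink list =
  [ { sink_name = "exec_query"; sink_param_index = 0; sink_vuln_type = "sql-injection" } ]

let flagged = check_call env sinks "handler" "exec_query" [Var "query"]
let clean = check_call env sinks "handler" "exec_query" [Var "safe_q"]
```

The function name "handler" passed as the location is a placeholder for whatever function is being analyzed.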

Sanitizers & Effectiveness

Sanitizers are vulnerability-type-specific — the wrong sanitizer doesn't help!

Match sanitizer to vulnerability

exec_query(input)
send_response(input)
exec_cmd(input)
redirect(input)

Effectiveness Matrix

                SQLi   XSS   CmdInj   PathTrv   Redir
escape_sql       ✓      ✗      ✗        ✗        ✗
html_encode      ✗      ✓      ✗        ✗        ✗
shell_escape     ✗      ✗      ✓        ✗        ✗
validate_path    ✗      ✗      ✗        ✓        ✗
validate_url     ✗      ✗      ✗        ✗        ✓
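
One way to encode the matrix is through the sanitizer_cleans field of the sanitizer record. The sketch below assumes each sanitizer neutralizes exactly one vulnerability type, as the pairing of sanitizers and vulnerabilities in this module suggests. The label strings other than "sql-injection" and the helper is_effective are hypothetical.

```ocaml
type sanitizer = { sanitizer_name : string; sanitizer_cleans : string list }

(* One entry per row of the matrix; labels besides "sql-injection" are assumed. *)
let sanitizers = [
  { sanitizer_name = "escape_sql"; sanitizer_cleans = ["sql-injection"] };
  { sanitizer_name = "html_encode"; sanitizer_cleans = ["xss"] };
  { sanitizer_name = "shell_escape"; sanitizer_cleans = ["command-injection"] };
  { sanitizer_name = "validate_path"; sanitizer_cleans = ["path-traversal"] };
  { sanitizer_name = "validate_url"; sanitizer_cleans = ["open-redirect"] };
]

(* True when the named sanitizer is configured to neutralize the given
   vulnerability type; unknown sanitizers are never effective. *)
let is_effective name vuln_type =
  match List.find_opt (fun s -> s.sanitizer_name = name) sanitizers with
  | Some s -> List.mem vuln_type s.sanitizer_cleans
  | None -> false

let () =
  Printf.printf "escape_sql vs sql-injection: %b\n"
    (is_effective "escape_sql" "sql-injection");
  Printf.printf "html_encode vs sql-injection: %b\n"
    (is_effective "html_encode" "sql-injection")
```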

Explicit vs Implicit Information Flows

Two ways secret data can leak — through data or through control.

Explicit Flow (Data Dependency)

secret = get_param("password") -- Tainted
x = secret -- Tainted
y = x + 1 -- Tainted
Taint follows the data — wherever the value goes, taint goes. Standard taint propagation catches this.

Implicit Flow (Control Dependency)

secret = get_param("pin") -- Tainted
if secret == 1234:
    x = 1 -- reveals secret!
else:
    x = 0 -- reveals secret!
After this code, x reveals whether secret == 1234. The value of secret influences x through the branch, not through assignment. Standard taint propagation misses this!
Analogy: Explicit flow is like passing a note directly. Implicit flow is like signaling yes/no by turning a light on or off — the information travels through the choice, not the data.

PC-Taint: Handling Implicit Flows

Track whether we're inside a branch controlled by tainted data.

secret = get_param("pin") -- pc: Unt
if secret == 1234: -- cond: Tai
    x = 1 -- pc: Tai!
else:
    x = 0 -- pc: Tai!
y = x -- pc: Unt
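
A possible shape for a PC-taint transfer function is sketched below. The AST, join_env, and the choice to combine the pc with the assigned value through propagate (so that a Tainted pc makes the assignment Tainted) are assumptions of this sketch. The module describes the idea but does not show its implementation.

```ocaml
type taint = Bot | Untainted | Tainted | Top

type expr =
  | Lit of string
  | Var of string
  | Eq of expr * expr
  | Call of string * expr list

type stmt =
  | Assign of string * expr
  | If of expr * stmt list * stmt list

module Env = Map.Make (String)

let join a b =
  match a, b with
  | Bot, x | x, Bot -> x
  | Top, _ | _, Top -> Top
  | Tainted, Tainted -> Tainted
  | Untainted, Untainted -> Untainted
  | Tainted, Untainted | Untainted, Tainted -> Top

let propagate a b =
  match a, b with
  | Bot, _ | _, Bot -> Bot
  | Tainted, _ | _, Tainted -> Tainted
  | Top, _ | _, Top -> Top
  | Untainted, Untainted -> Untainted

let rec eval_expr env = function
  | Lit _ -> Untainted
  | Var x -> (match Env.find_opt x env with Some t -> t | None -> Top)
  | Eq (l, r) -> propagate (eval_expr env l) (eval_expr env r)
  | Call ("get_param", _) -> Tainted
  | Call _ -> Top

(* Pointwise join; a variable bound on only one side keeps that binding. *)
let join_env a b = Env.union (fun _ x y -> Some (join x y)) a b

(* pc is the taint of the conditions controlling the current statement. *)
let rec transfer_stmt pc env = function
  | Assign (x, e) -> Env.add x (propagate pc (eval_expr env e)) env
  | If (cond, then_branch, else_branch) ->
    let pc' = propagate pc (eval_expr env cond) in
    let run branch = List.fold_left (transfer_stmt pc') env branch in
    join_env (run then_branch) (run else_branch)

let program = [
  Assign ("secret", Call ("get_param", [Lit "pin"]));
  If (Eq (Var "secret", Lit "1234"),
      [Assign ("x", Lit "1")],
      [Assign ("x", Lit "0")]);
  Assign ("y", Var "x");
]

(* The analysis starts with an Untainted pc. *)
let final_env = List.fold_left (transfer_stmt Untainted) Env.empty program
```

Both branches assign x under a Tainted pc, so x is Tainted after the merge. The last line runs under an Untainted pc again, yet y is still Tainted because it copies x.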

OWASP Vulnerability Gallery

Each entry lists the attack pattern, how it is detected, and the fix.

XSS (CWE-79)
Fix:
  safe = html_encode(input)
  html = "..." + safe + "..."
  send_response(html)

Command Injection (CWE-78), severity: Critical
Source: get_param("file")   Sink: exec_cmd(cmd)   Sanitizer: shell_escape
Vulnerable code:
  filename = get_param("file")
  cmd = "cat " + filename
  exec_cmd(cmd)
Attack payload:
  filename = "; rm -rf /"
Fix:
  safe = shell_escape(filename)
  cmd = "cat " + safe
  exec_cmd(cmd)

Path Traversal (CWE-22), severity: High
Source: get_param("page")   Sink: open_file(path)   Sanitizer: validate_path
Vulnerable code:
  path = get_param("page")
  open_file(path)
Attack payload:
  path = "../../etc/passwd"
Fix:
  safe = validate_path(path)
  open_file(safe)

Open Redirect (CWE-601), severity: Medium
Source: get_param("next")   Sink: redirect(url)   Sanitizer: validate_url
Vulnerable code:
  url = get_param("next")
  redirect(url)
Attack payload:
  url = "https://evil.com/phishing"
Fix:
  safe = validate_url(url)
  redirect(safe)

Putting It All Together

The complete security analysis pipeline — from AST to vulnerability report.

Same algorithm as M4! The analysis pipeline is identical — only the abstract domain changed from numeric (signs/intervals) to security (taint).
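
An end-to-end sketch of that pipeline on straight-line code is shown below. It is a simplification under stated assumptions: there is no control flow and hence no fixpoint iteration, the config is hard-wired as three small lists, every sink checks its first argument, and the report is reduced to pairs of vulnerability type and sink name. The names analyze, check_expr, and propagate are hypothetical.

```ocaml
type taint = Bot | Untainted | Tainted | Top

type expr =
  | Lit of string
  | Var of string
  | BinOp of expr * expr
  | Call of string * expr list

type stmt = Assign of string * expr | Expr of expr

module Env = Map.Make (String)

(* Hard-wired stand-in for the security configuration. *)
let sources = ["get_param"]
let sanitizers = ["escape_sql"]
let sinks = [("exec_query", "sql-injection")]   (* sink name, vulnerability type *)

let propagate a b =
  match a, b with
  | Bot, _ | _, Bot -> Bot
  | Tainted, _ | _, Tainted -> Tainted
  | Top, _ | _, Top -> Top
  | Untainted, Untainted -> Untainted

let rec eval_expr env = function
  | Lit _ -> Untainted
  | Var x -> (match Env.find_opt x env with Some t -> t | None -> Top)
  | BinOp (l, r) -> propagate (eval_expr env l) (eval_expr env r)
  | Call (f, _) when List.mem f sources -> Tainted
  | Call (f, _) when List.mem f sanitizers -> Untainted
  | Call _ -> Top

let is_potentially_tainted = function
  | Tainted | Top -> true
  | Untainted | Bot -> false

(* Report a sink call whose first argument may be tainted. *)
let check_expr env = function
  | Call (f, arg :: _) ->
    (match List.assoc_opt f sinks with
     | Some vuln when is_potentially_tainted (eval_expr env arg) -> [ (vuln, f) ]
     | _ -> [])
  | _ -> []

(* Walk the statements once, threading the environment and the findings. *)
let analyze program =
  let step (env, found) stmt =
    match stmt with
    | Assign (x, e) -> (Env.add x (eval_expr env e) env, found @ check_expr env e)
    | Expr e -> (env, found @ check_expr env e)
  in
  snd (List.fold_left step (Env.empty, []) program)

let vulnerable = [
  Assign ("input", Call ("get_param", [Lit "name"]));
  Assign ("query", BinOp (Lit "SELECT * FROM users WHERE name = '",
                          BinOp (Var "input", Lit "'")));
  Expr (Call ("exec_query", [Var "query"]));
]

let fixed = [
  Assign ("input", Call ("get_param", [Lit "name"]));
  Assign ("safe", Call ("escape_sql", [Var "input"]));
  Assign ("query", BinOp (Lit "SELECT * FROM users WHERE name = '",
                          BinOp (Var "safe", Lit "'")));
  Expr (Call ("exec_query", [Var "query"]));
]
```

Running analyze on vulnerable yields one sql-injection finding at exec_query, while fixed yields none, because the sanitizer call breaks the chain before the query is built.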

Challenge B: Find the Vulnerability

Analyze this code — identify the source, sink, vulnerability type, and fix.

1: user = get_param("username")
2: role = "guest"
3: greeting = "Hello, " + user
4: page = "<div>" + greeting + "</div>"
5: log_msg = "User: " + user
6: write_log(log_msg)
7: send_response(page)

Limitations & False Positives

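Taint analysis is designed to be sound with respect to its model: within a single function and for the configured sources, sinks, and sanitizers, it does not miss a flow. The price is imprecision (false alarms), and flows outside the model can still be missed, as the limitations below show.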

Over-Approximation Sources

1. Join at merge points
if cond:
    x = get_param("q") -- Tainted
else:
    x = "safe" -- Untainted
-- After merge: join(T, U) = Top
Even though x might be safe, we treat it as potentially tainted.
2. Unknown functions → Top
Any function not in our config returns Top. Even perfectly safe helper functions get flagged.
3. No string content tracking
Can't tell that "SELECT " + escape_sql(x) is safe at the string level. Must rely on sanitizer config.

Other Limitations

4. No alias analysis
If x and y point to the same data, tainting x should taint y. Our analysis treats variables independently.
5. Intraprocedural only
Our analyzer works within a single function. Real tools like Infer and CodeQL do interprocedural analysis across function boundaries.
The tradeoff: Like a smoke detector, it is tuned to catch fires at the cost of also triggering for burnt toast. Extra false positives are annoying but safe. Missing a real fire is dangerous.

Real-World Security Analysis Tools

Production tools that apply taint analysis ideas at scale include Semgrep, CodeQL, and Infer.

Key Takeaways

The essential ideas from Module 5 — Security Analysis.

1. Taint analysis reuses M4's framework
Same ABSTRACT_DOMAIN, MakeEnv, eval_expr, transfer_stmt. The domain changed from numeric to security — that's it.
2. The taint lattice is flat (4 elements)
Bot ⊑ {Tainted, Untainted} ⊑ Top. Same shape as sign domain. Finite, so no widening needed.
3. Three components: Source → Propagation → Sink
Sources introduce taint, propagation spreads it through operations, sinks check for it. Sanitizers break the chain.
4. Sanitizers are type-specific
escape_sql fixes SQLi but not XSS. html_encode fixes XSS but not SQLi. Wrong sanitizer = still vulnerable.
5. Implicit flows need PC-taint
Standard taint misses control-flow leaks. PC-taint tracks when we're inside a tainted branch, tainting all assignments within.
The Module Progression:
M3 (framework) + M4 (abstract domains) + M5 (security domain) = a real vulnerability detector. Next: M6 brings it all together with tools and integration.

What We Learned in Module 5

A complete summary of security analysis concepts.

Core Theory

✓ Taint lattice (4-element flat)
✓ Join/meet/leq operations
✓ Sources, sinks, sanitizers
✓ Forward taint propagation
✓ Transfer functions (same as M4)
✓ Explicit vs implicit flows
✓ PC-taint for control deps

Vulnerabilities

✓ SQL Injection (CWE-89)
✓ XSS (CWE-79)
✓ Command Injection (CWE-78)
✓ Path Traversal (CWE-22)
✓ Open Redirect (CWE-601)
✓ Sanitizer effectiveness matrix
✓ Vulnerability detection at sinks

Practical Skills

✓ Trace taint through code
✓ Identify source→sink paths
✓ Match sanitizers to vulns
✓ Detect implicit flow leaks
✓ Security config design
✓ Real tools (Semgrep, CodeQL)
✓ Limitations & false positives
Taint Analysis = Abstract Interpretation Applied to Security

Challenge C: Security Scenario Analysis

For each real-world scenario, identify the vulnerability and choose the correct defense.

Q1. A web app takes a search query from the URL and displays it on the results page: "Results for: " + query

Q2. A file download endpoint uses the filename from the request: open_file("/uploads/" + filename)

Q3. A login page redirects users to a "return URL" after login: redirect(get_param("next"))

Q4. An admin panel lets operators run diagnostics: exec_cmd("ping " + host)

Quiz 1: Concept Check

Test your understanding of taint analysis fundamentals.

Q1. What does join Tainted Untainted equal?

Q2. Why does the taint lattice NOT need widening?

Q3. What does is_potentially_tainted return true for?

Quiz 2: Trace the Taint

Trace taint through branching code with implicit flows.

token = read_cookie("auth")
role = "guest"
if token == "admin_key":
    role = "admin"
msg = "Welcome, " + role
send_response(msg)

With pc_taint tracking:

role at line 5:
msg at line 5:
Vulnerability?

Quiz 3: Real-World Vulnerability Analysis

For each scenario, identify the full taint path and whether the code is safe.

Scenario 1:

name = get_param("name")
safe = html_encode(name)
page = "<h1>" + safe + "</h1>"
send_response(page)

Scenario 2:

id = get_param("id")
safe = html_encode(id)
query = "SELECT * WHERE id=" + safe
exec_query(query)

Scenario 3:

data = read_input()
result = mystery_func(data)
exec_query(result)
