Hash Tables

O(1) Average Lookup

CS205 Data Structures · Use arrow keys to navigate

The Problem: Fast Key-Value Access

We want O(1) for get, put, and remove. Arrays give O(1) by index — but what if keys are strings?

Structuregetput
Unsorted ArrayO(n)O(1)
Sorted ArrayO(log n)O(n)
Linked ListO(n)O(1)
BST (balanced)O(log n)O(log n)
Hash TableO(1)*O(1)*

* average case

Key Idea

Convert any key into an array index using a hash function, then store the value at that index. Direct access — no searching!

The Hash Function Pipeline

Two steps: hash code → compression → array index

Coat Check Analogy

You hand your coat (key-value pair) to the attendant. They give you a ticket number (the hash). When you return, they go directly to the right hook — no searching through all coats.

Two-Step Pipeline

Step 1 — Hash Code: Turn the key into a (large) integer.
Step 2 — Compression: Squeeze that integer into range [0, N-1] using mod N.

Hash Function Requirements

Three Requirements

  • Deterministic — same key always → same hash. h("alice") = 42 now and forever.
  • Uniform Distribution — spread keys evenly across indices. Avoid clustering.
  • Fast to Compute — a slow hash defeats the O(1) purpose.

Bad Hash = Linked List

A hash function mapping everything to index 0 turns the table into a linked list — O(n) for everything!

Compression Methods

Division Method (Modulo)

index = hashCode % N

Simple, fast. Works best when N is prime.

MAD Method (Multiply-Add-Divide)

index = ((a·h(k) + b) mod p) mod N

Where p is prime > N, a > 0. Better distribution — "scrambles" the hash code before compressing.

Hash Codes for Different Types

Strings: Polynomial Hash

h(s) = s[0]·x^(n-1) + s[1]·x^(n-2)
+ ... + s[n-1]·x^0
// Java uses x = 31
h("abc") = 'a'·31² + 'b'·31 + 'c'
= 97·961 + 98·31 + 99
= 96354
// Horner's method (efficient):
((97 * 31 + 98) * 31 + 99)

Why Polynomial Hashing?

  • Uses position of characters, not just content
  • "abc" and "cba" get different hashes
  • Java's String.hashCode() uses x = 31

Objects: Combine Field Hashes

class Student {
String name; int id;
int hashCode() {
int h = 17;
h = 31*h + name.hashCode();
h = 31*h + id;
return h;
}
}

equals() ↔ hashCode() Contract

If a.equals(b) is true, then a.hashCode() == b.hashCode() must be true. The reverse is not required (collisions are OK).

Compression Functions in Detail

Interactive — try different hash codes and table sizes

Simple Modulo: h(k) mod N

Why N Should Be Prime

If N=10 and keys are multiples of 5: {5,10,15,20,25,...} all map to indices {0,5} — only 2 of 10 buckets used! A prime N minimizes such patterns.

MAD Method

((a·h(k) + b) mod p) mod N

The multiply-add step "scrambles" hash codes before compression, producing better distribution.

Collisions Are Inevitable

A collision occurs when two different keys map to the same index.

Pigeonhole Principle

If you have more keys than array slots, at least two keys must share a slot. Even with fewer keys, collisions are very likely (Birthday Paradox: 23 people → 50% chance two share a birthday).

Two Main Solutions

1. Separate Chaining

Store a linked list at each bucket

2. Open Addressing

Find another open slot in the array

Collision Handling 1: Separate Chaining

Each bucket holds a linked list of all entries that hash to that index.

Operations

  • put(k,v): hash k → index i, prepend to list at bucket[i]
  • get(k): hash k → index i, search list at bucket[i]
  • remove(k): hash k → index i, remove from list at bucket[i]

Key Idea

The array itself never "fills up." Each bucket can hold unlimited entries. The tradeoff: long chains degrade to O(n) search within that chain.

Load Factor

α = n/N can exceed 1.0 with chaining. Rehash when α > 0.75.

Separate Chaining: Step-by-Step

N = 7, h(k) = k mod 7. Insert: 10, 22, 31, 4, 15

function put(k):
idx = h(k) = k mod 7
bucket[idx].prepend(k)

Collision Handling 2: Linear Probing

All entries live directly in the array. If the target slot is taken, try the next slot.

probe(k, i) = (h(k) + i) mod N
i = 0, 1, 2, 3, ...

Primary Clustering

Occupied slots form long contiguous runs. New keys hashing anywhere in the cluster must probe to its end, making it even longer.

Key Idea

Simple and cache-friendly, but clustering degrades performance as the table fills. Keep α ≤ 0.5 for best results.

Linear Probing: Step-by-Step

N = 7, h(k) = k mod 7. Insert: 10, 22, 31, 4, 15

idx = h(k) = k mod 7
while table[idx] occupied:
idx = (idx + 1) mod 7
table[idx] = k

Open Addressing: Quadratic Probing

Probe at increasing squared offsets to break up clusters.

N = 11 (prime)
probe(k, i) = (h(k) + i²) mod N
i = 0, 1, 2, 3, ...
Offsets: +0, +1, +4, +9, +16, +25…

Key Idea

Quadratic probing jumps farther each time, breaking primary clusters. But keys with the same hash still follow the same probe sequence (secondary clustering).

Warning

May not visit all slots. Guaranteed to work when N is prime and table is less than half full.

Open Addressing: Double Hashing

A second hash function determines the step size — each key gets a unique probe sequence.

N=11, q=7
probe(k,i) = (h1(k) + i·h2(k)) mod N
h1(k) = k mod N
h2(k) = q - (k mod q)
where q is prime < N

Probing Comparison

MethodPrimarySecondary
Linear✗ Yes✗ Yes
Quadratic✓ No✗ Yes
Double Hash✓ No✓ No

Key Idea

Keys 20 & 31 both hash to index 9, but step sizes differ (1 vs 4) — completely different probe paths!

Deletion in Open Addressing: Tombstones

You cannot simply empty a slot — it breaks the probe chain for other keys!

Tombstone (DEL) Marker

A DELETED marker means: "a key was here; keep probing."

  • Search: treat DEL as occupied → keep probing
  • Insert: treat DEL as empty → reuse the slot

The Problem

Naively emptying a slot creates a "gap" that stops probe chains short, making existing keys unfindable.

Try It!

Insert 10, 31 (both hash to 3). Delete 10. Search for 31 — the tombstone keeps the chain alive!

Load Factor & Performance

The load factor α = n/N controls how full the table is — and how fast it runs.

0.50

Definition

α = n / N

n = entries stored, N = table size

Expected Probes (Linear Probing)

αSuccessfulUnsuccessful
0.251.171.39
0.501.502.50
0.752.508.50
0.905.5050.50

Rehash Thresholds

  • Chaining: rehash when α > 0.75
  • Open addressing: rehash when α > 0.5

Rehashing

When α exceeds the threshold, grow the table and reinsert everything.

Key Idea

Rehashing costs O(n) for that one operation, but happens rarely. Amortized cost per insert remains O(1) — same idea as dynamic array doubling.

Warning

You must recompute all indices because N changed. Old indices are no longer valid. You cannot just copy entries to the same slots!

Time Complexity Analysis

Average Case (good hash, low α)

OpChainingOpen Addr.
putO(1)O(1)
getO(1)O(1)
removeO(1)O(1)

Worst Case (all collide)

OpChainingOpen Addr.
putO(n)O(n)
getO(n)O(n)
removeO(n)O(n)

Analogy

A good hash function assigns students to exam rooms evenly. A bad one puts everyone in Room 1 and leaves Rooms 2–10 empty.

Challenge: Trace Hash Table Operations

Linear probing, N = 7, h(k) = k mod 7. Trace these operations and fill in the final table:

1. insert(14)   // 14 mod 7 = ?
2. insert(21)   // 21 mod 7 = ?
3. insert(7)    // 7 mod 7 = ?
4. insert(3)    // collision?
5. insert(10)   // collision?
6. delete(21)
7. search(10)   // found?

Your Answer — Final Table State:

Java's HashMap Internals

Key Details

  • Initial capacity: 16 (always power of 2)
  • Load factor: 0.75 by default
  • Rehash: double when entries > capacity × 0.75
  • Chain < 8: linked list — O(n) within chain
  • Chain ≥ 8: Red-Black Tree — O(log n)
  • Untreeify: back to list when chain < 6
  • Index: (n-1) & hash (bitwise AND = fast mod)

Key Idea

Java 8+ treeifies long chains, guaranteeing O(log n) worst case per bucket — protection against hash-flooding attacks.

Challenge: Fix the Bug

This linear probing search has a bug. Can you spot it?

boolean search(int key) {
int idx = key % N;
while (table[idx] != null) {
if (table[idx] == key)
return true;
idx = (idx + 1) % N;
}
return false;
}

Real-World Applications

More Uses

  • Compilers: symbol tables (name → type, scope)
  • Networking: routing tables, DNS caches
  • Deduplication: detect duplicate files by content hash
  • Blockchain: transaction verification via hash chains

Challenge: Pick the Right Strategy

For each scenario, choose the best collision handling approach.

1. Memory-constrained embedded system

Limited RAM, cache-friendly access is crucial.

2. Web server handling frequent deletions

Session cache where entries expire/get removed often.

3. Database with unpredictable key patterns

Keys may cluster; must avoid both primary & secondary clustering.

4. Hash table that may have α > 1.0

More entries than buckets is acceptable.

Summary & Cheat Sheet

5 Things to Remember

  1. Hash function: deterministic, uniform, fast
  2. Collisions: inevitable (pigeonhole) — must handle them
  3. Load factor α = n/N — rehash before too high
  4. Tombstones: needed for deletion in open addressing
  5. Average O(1) for all ops — best for unsorted data

Final Analogy

A hash table is a filing cabinet. The hash function labels which drawer to open. When two files share a drawer (collision), stack them (chaining) or find the next empty drawer (open addressing). When it's too full, buy a bigger cabinet and refile everything (rehash).

Quiz: Hash Table Fundamentals

Q1: Time Complexity

What is the average time for get() in a well-designed hash table?

Q2: Load Factor

A hash table has 12 entries and 16 buckets. What is α?

Q3: Tombstones

Why can't you simply empty a slot when deleting in open addressing?

Quiz: Predict the Probe Sequence

Quadratic probing, N = 11. Table has keys at slots: [1]=22, [3]=36, [4]=47, [5]=58.

Where does insert(69) land? (h(69) = 69 mod 11 = 3)

Probe formula: (h(k) + i²) mod 11
i=0: (3 + 0) mod 11 = 3 → ?
i=1: (3 + 1) mod 11 = 4 → ?
i=2: (3 + 4) mod 11 = 7 → ?
i=3: (3 + 9) mod 11 = 1 → ?

Quiz: Predict the Output

HashMap<String,Integer> map = new HashMap<>();
map.put("alice", 90);
map.put("bob", 85);
map.put("alice", 95); // what happens?
System.out.println(map.get("alice"));
System.out.println(map.size());
System.out.println(map.containsValue(90));
map.remove("bob");
System.out.println(map.get("bob"));

Your Predictions:

Line 5 output: Line 6 output: Line 7 output: Line 9 output: