DSA Lab | Heaps

What Is a Heap?

A heap is a complete binary tree (every level is fully filled except possibly the last, which fills left to right — no gaps) stored in an array. "Complete" is a strict shape requirement — it's what lets us skip pointers entirely and use array index math instead.

The defining property: every parent node satisfies a specific ordering rule with its children.

Min-Heap: Every parent ≤ both children. The root is the smallest element in the entire tree.
Max-Heap: Every parent ≥ both children. The root is the largest element.

That's it. Siblings have no ordering between them. Left child isn't necessarily smaller than right child. The only guarantee flows vertically — parent to child.

Think of it like a corporate hierarchy where only one rule exists: every manager earns at least as much as their direct reports. The CEO (root) earns the most. But two people at the same level might earn wildly different amounts — the heap doesn't care about horizontal comparisons.

A Brief History

The binary heap was introduced by J.W.J. Williams in 1964 as part of the heapsort algorithm. Robert Floyd improved the build-heap operation to O(n) later that same year. The idea was motivated by a simple question: how do you repeatedly extract the minimum element from a collection as fast as possible?

Sorted arrays give O(1) extraction but O(n) insertion (shifting elements). Unsorted arrays give O(1) insertion but O(n) extraction (scanning for the min). The heap gives O(log n) for both — a balanced compromise that's ideal for priority-based processing.

Since then, more exotic heap variants have appeared: Fibonacci heaps (1984, by Fredman and Tarjan) give O(1) amortized insert and decrease-key, making Dijkstra's algorithm theoretically faster. Binomial heaps support efficient merge. But in practice, the binary heap's simplicity and cache-friendliness (it's just an array!) make it the standard choice.

Why Should You Care?

Heaps give you O(1) access to the min or max element. Inserting or removing that element costs O(log n). That combination makes them the backbone of:

Priority queues — task schedulers, CPU job scheduling, Dijkstra's shortest path, A* pathfinding
Heap sort — O(n log n) guaranteed, in-place, no extra memory. Worse cache behavior than quicksort in practice, but has a firm worst-case guarantee that quicksort lacks.
Median maintenance — two heaps tracking the running median of a data stream
Top-K problems — finding the K largest/smallest elements without sorting everything
Merge K sorted lists — classic interview problem, heaps make it clean
Event-driven simulation — process the nearest future event first
Huffman coding — the compression algorithm used in ZIP, JPEG, and MP3 files builds a tree by repeatedly extracting the two smallest frequencies from a min-heap

Heap vs. Binary Search Tree

A BST keeps all elements ordered — you can find any element in O(log n), iterate in sorted order, and answer range queries. A heap only guarantees fast access to one element: the min or max. But heaps are simpler, use arrays (better cache performance — the CPU prefetches sequential memory), and the constant factors are smaller. When you only need "give me the extreme value," heaps win.

Put differently: BSTs are for searching, heaps are for scheduling.

How It Works

The Array Trick

Here's the clever part. Because a heap is a complete binary tree (no gaps in the shape), you don't need node objects with left/right pointers. You just pack nodes into an array level by level, left to right. For any node at index i (0-indexed):

Left child: 2i + 1
Right child: 2i + 2
Parent: ⌊(i - 1) / 2⌋

No wasted space. No pointer overhead (which would be 8-16 bytes per node on a 64-bit system). Just simple arithmetic. The root is at index 0. Its left child is at index 1. Its right child is at index 2. Their children are at indices 3, 4, 5, 6. And so on.

This layout also gives heaps great cache locality — parents and children are close together in memory, so the CPU's cache prefetcher loads them efficiently. This is a significant practical advantage over pointer-based tree structures.

Bubble Up (Insertion)

When you insert a value:

Drop it at the end of the array (the next available leaf position — bottom-right of the tree).
Compare it with its parent using the index formula: parent = (i - 1) // 2.
If the heap property is violated — the new node is smaller than its parent in a min-heap — swap them.
Repeat from the new position until the node settles (either it reaches the root or its parent is smaller).

Worst case: the new element bubbles all the way to the root. That's the tree's height: O(log n). Best case (the element is already in the right place): O(1). On average, about half the elements will bubble up only 1-2 levels.

Sift Down (Extraction)

Extracting the root (the min or max) is where things get interesting. You can't just remove the root and leave a hole — that would break the complete tree shape. Instead:

Save the root value (that's your answer).
Move the last element in the array to the root position. This keeps the tree complete.
Sift it down: compare with both children, swap with the smaller child (min-heap) or larger child (max-heap). Why the smaller/larger? Because after the swap, that child becomes the parent, and it needs to satisfy the heap property with its new sibling (the child you didn't swap with).
Repeat until the heap property is restored or you reach a leaf.

Heapify (Build from Array)

Given an unsorted array, you can turn it into a heap in O(n) time — not O(n log n) as you might expect from calling insert n times.

The trick: start from the last non-leaf node (index n//2 - 1) and sift down each node. Leaves (the bottom half of the array) are already valid heaps because they're single-node trees. So you only need to fix the top half, and most of those nodes are near the bottom and only sift a short distance.

The math: about n/2 nodes sift 0 levels, n/4 nodes sift 1 level, n/8 sift 2 levels, etc. The sum is n × (1/4 + 2/8 + 3/16 + ...) which converges to O(n). This is one of the more elegant results in algorithm analysis.

Heap Sort

Heap sort works in two phases:

Build a max-heap from the array: O(n).
Repeatedly extract the max (swap root with the last unsorted element, shrink the heap by 1, sift down): O(n log n) total for n extractions.

Total: O(n log n). In-place (O(1) extra space). Worst-case guaranteed. The downside: it's about 2-3x slower than quicksort in practice because of poor cache behavior (sift-down jumps around the array) and branch mispredictions.

Interactive Visualization

Watch how the heap maintains its shape and ordering. Insert values, extract the minimum, or build a heap from scratch. The tree view and array view stay synchronized — same data, two perspectives.

Binary Min-Heap

Operations & Complexity

Operation	Time (Avg)	Time (Worst)	Notes
Find Min/Max	O(1)	O(1)	Just read the root (index 0)
Insert	O(log n)	O(log n)	Bubble up to root at worst
Extract Min/Max	O(log n)	O(log n)	Sift down from root
Build Heap (Heapify)	O(n)	O(n)	Bottom-up sift-down is linear
Search	O(n)	O(n)	No ordering beyond parent-child
Delete Arbitrary	O(n)	O(n)	O(n) to find + O(log n) to fix
Decrease/Increase Key	O(log n)	O(log n)	Bubble up or sift down from known position
Heap Sort	O(n log n)	O(n log n)	Build heap + n extractions

Space complexity: O(n) for the array. No extra pointers needed — the parent-child relationships are encoded in the indices.

Implementation

Python (Min-Heap from Scratch)

Python

class MinHeap:
    def __init__(self):
        self.data = []

    def _parent(self, i): return (i - 1) // 2
    def _left(self, i):   return 2 * i + 1
    def _right(self, i):  return 2 * i + 2

    def insert(self, val):
        """Add value and restore heap property by bubbling up."""
        self.data.append(val)
        self._bubble_up(len(self.data) - 1)

    def extract_min(self):
        """Remove and return the minimum (root).
        
        Swap root with last element, remove last, sift down.
        This keeps the tree complete after removal.
        """
        if not self.data:
            raise IndexError("empty heap")
        self.data[0], self.data[-1] = self.data[-1], self.data[0]
        minimum = self.data.pop()
        if self.data:
            self._sift_down(0)
        return minimum

    def peek(self):
        """Look at the minimum without removing it. O(1)."""
        if not self.data:
            raise IndexError("empty heap")
        return self.data[0]

    def _bubble_up(self, i):
        """Swap with parent while the heap property is violated."""
        while i > 0 and self.data[i] < self.data[self._parent(i)]:
            p = self._parent(i)
            self.data[i], self.data[p] = self.data[p], self.data[i]
            i = p

    def _sift_down(self, i):
        """Swap with the smaller child until heap property holds."""
        n = len(self.data)
        while True:
            smallest = i
            l, r = self._left(i), self._right(i)
            # Find smallest among node and its children
            if l < n and self.data[l] < self.data[smallest]:
                smallest = l
            if r < n and self.data[r] < self.data[smallest]:
                smallest = r
            if smallest == i:
                break  # Heap property restored
            self.data[i], self.data[smallest] = self.data[smallest], self.data[i]
            i = smallest

    def __len__(self):
        return len(self.data)

    @staticmethod
    def heapify(arr):
        """Build a min-heap in-place in O(n) time.
        
        Start from last non-leaf and sift down each node.
        Leaves are already valid heaps (single nodes).
        """
        heap = MinHeap()
        heap.data = arr
        for i in range(len(arr) // 2 - 1, -1, -1):
            heap._sift_down(i)
        return heap

🐍 In Practice

Python's heapq module does all of this for you. It operates on a regular list — heapq.heappush(), heapq.heappop(), heapq.heapify(). It's a min-heap only. For a max-heap, negate values: push -x, pop and negate back. In interviews, mention you know the module exists, then implement from scratch if asked.

JavaScript (Min-Heap)

JavaScript

class MinHeap {
  constructor() { this.data = []; }

  insert(val) {
    this.data.push(val);
    this._bubbleUp(this.data.length - 1);
  }

  extractMin() {
    if (!this.data.length) throw new Error('empty heap');
    const min = this.data[0];
    const last = this.data.pop();
    if (this.data.length) {
      this.data[0] = last;
      this._siftDown(0);
    }
    return min;
  }

  peek() { return this.data[0]; }

  _bubbleUp(i) {
    while (i > 0) {
      const parent = Math.floor((i - 1) / 2);
      if (this.data[i] >= this.data[parent]) break;
      [this.data[i], this.data[parent]] = [this.data[parent], this.data[i]];
      i = parent;
    }
  }

  _siftDown(i) {
    const n = this.data.length;
    while (true) {
      let smallest = i;
      const l = 2 * i + 1, r = 2 * i + 2;
      if (l < n && this.data[l] < this.data[smallest]) smallest = l;
      if (r < n && this.data[r] < this.data[smallest]) smallest = r;
      if (smallest === i) break;
      [this.data[i], this.data[smallest]] = [this.data[smallest], this.data[i]];
      i = smallest;
    }
  }

  get size() { return this.data.length; }

  static heapify(arr) {
    const heap = new MinHeap();
    heap.data = arr;
    for (let i = Math.floor(arr.length / 2) - 1; i >= 0; i--) {
      heap._siftDown(i);
    }
    return heap;
  }
}

Variation: Heap Sort (In-Place)

Heap sort turns any array into a sorted array in O(n log n) time using O(1) extra space. It works by building a max-heap, then repeatedly swapping the root (maximum) to the end and shrinking the heap.

Python

def heap_sort(arr):
    """Sort an array in-place using heap sort.
    
    Phase 1: Build a max-heap from the array (O(n))
    Phase 2: Repeatedly extract the max to the end (O(n log n))
    
    Total: O(n log n) worst case. O(1) extra space.
    Not stable (equal elements might swap order).
    """
    n = len(arr)

    def sift_down(start, end):
        """Sift element at 'start' down within heap of size 'end'."""
        i = start
        while True:
            largest = i
            l, r = 2 * i + 1, 2 * i + 2
            if l < end and arr[l] > arr[largest]:
                largest = l
            if r < end and arr[r] > arr[largest]:
                largest = r
            if largest == i:
                break
            arr[i], arr[largest] = arr[largest], arr[i]
            i = largest

    # Phase 1: Build max-heap (O(n))
    for i in range(n // 2 - 1, -1, -1):
        sift_down(i, n)

    # Phase 2: Extract max repeatedly
    # Swap root (max) with last unsorted element,
    # shrink heap by 1, fix the new root
    for end in range(n - 1, 0, -1):
        arr[0], arr[end] = arr[end], arr[0]  # max goes to its final position
        sift_down(0, end)                      # fix the smaller heap


# Demo
nums = [38, 27, 43, 3, 9, 82, 10]
heap_sort(nums)
print(nums)  # [3, 9, 10, 27, 38, 43, 82]

Variation: Max-Heap with Custom Comparator

Sometimes you need a max-heap, or a heap that orders by a custom key (like sorting tasks by deadline). Here's a flexible approach:

Python

import heapq

# Approach 1: Negate values for max-heap behavior
max_heap = []
for val in [3, 1, 4, 1, 5, 9]:
    heapq.heappush(max_heap, -val)

print(-heapq.heappop(max_heap))  # 9 (largest)
print(-heapq.heappop(max_heap))  # 5

# Approach 2: Tuple comparison for custom ordering
# Tuples compare element-by-element, so (priority, data) works
tasks = []
heapq.heappush(tasks, (2, "fix bug"))
heapq.heappush(tasks, (0, "server down"))  # highest priority
heapq.heappush(tasks, (1, "deploy feature"))

while tasks:
    priority, task = heapq.heappop(tasks)
    print(f"  [{priority}] {task}")
# [0] server down
# [1] deploy feature
# [2] fix bug

# Approach 3: Wrapper class for complex objects
class MaxHeapItem:
    """Wrapper that reverses comparison for max-heap behavior."""
    def __init__(self, val):
        self.val = val
    def __lt__(self, other):
        return self.val > other.val  # reversed!
    def __repr__(self):
        return f"MaxHeapItem({self.val})"

max_pq = []
for x in [10, 30, 20]:
    heapq.heappush(max_pq, MaxHeapItem(x))
print(heapq.heappop(max_pq).val)  # 30

Common Mistakes

⚠️ Mistakes That Trip Up Beginners

Treating a heap as a sorted array. A heap is NOT sorted. The only guarantee is that the root is the min (or max). The second-smallest element could be at index 1 or index 2 — you don't know without checking both. To get sorted order, you need to extract all elements one by one (that's heap sort).
Forgetting that Python's heapq is min-only. There's no built-in max-heap in Python. Negate values (-x) for max-heap behavior. Or use a wrapper class that reverses the comparison operator. Forgetting this leads to getting the smallest element when you wanted the largest.
Using a heap when a simpler structure works. If you only need the overall min/max once (not repeatedly), just scan the array in O(n). Heaps shine when you're doing repeated insert + extract-min operations. For a single query, they're overkill.
Getting the index arithmetic wrong. For 0-indexed arrays: children of i are at 2i+1 and 2i+2. Parent of i is at (i-1)//2. Some textbooks use 1-indexed arrays (children at 2i and 2i+1, parent at i//2). Mixing conventions is a recipe for off-by-one bugs.
Not handling equal elements correctly in sift-down. When both children have the same value, you should still swap with the correct child based on the heap type. Some implementations accidentally swap with the wrong one, breaking the heap property. Always compare against the smallest (min-heap) or largest (max-heap) child.

Real-World Applications

1. Running Median (Streaming Data)

Streaming services, financial platforms, and sensor networks need to compute the median (the middle value of a dataset) continuously as new data arrives. Sorting after every insertion would be O(n log n) per element. Two heaps give you O(log n) per element and O(1) median access.

Python

import heapq

class RunningMedian:
    """Track the median of a data stream in O(log n) per insert.
    
    Two heaps split the data in half:
    - max_heap: holds the SMALLER half (we want the largest of the small)
    - min_heap: holds the LARGER half (we want the smallest of the large)
    
    The median is either:
    - The top of max_heap (if it has more elements), or
    - The average of both tops (if sizes are equal)
    """
    def __init__(self):
        self.small = []  # max-heap (negate values)
        self.large = []  # min-heap

    def add(self, num):
        heapq.heappush(self.small, -num)

        if self.small and self.large and (-self.small[0] > self.large[0]):
            val = -heapq.heappop(self.small)
            heapq.heappush(self.large, val)

        if len(self.small) > len(self.large) + 1:
            val = -heapq.heappop(self.small)
            heapq.heappush(self.large, val)
        elif len(self.large) > len(self.small):
            val = heapq.heappop(self.large)
            heapq.heappush(self.small, -val)

    def median(self):
        if len(self.small) > len(self.large):
            return -self.small[0]
        return (-self.small[0] + self.large[0]) / 2


tracker = RunningMedian()
prices = [100, 102, 98, 105, 97, 110, 95, 108]
for price in prices:
    tracker.add(price)
    print(f"  Price: ${price:>3}  |  Median: ${tracker.median():.1f}")

This two-heap median pattern is a classic interview problem (LC #295) and a real production technique. Apache Flink and Apache Spark both offer streaming median/percentile calculations built on this principle.

2. Task Scheduler / Priority Queue

Operating systems, job schedulers, and message brokers all use heaps to decide what to process next. Every time a CPU core becomes free, it pops the highest-priority job from the heap. When new jobs arrive, they're pushed in with their priority. This is why heaps are sometimes called "priority queues" — they're the engine underneath.

Python

import heapq
import time

class TaskScheduler:
    """Simple priority-based task scheduler using a min-heap.
    
    Lower priority number = higher urgency (0 = critical, 5 = low).
    When priorities tie, tasks are processed in FIFO order
    (the counter breaks ties so heapq never compares Task objects).
    """
    def __init__(self):
        self.queue = []       # Min-heap of (priority, counter, task_name)
        self.counter = 0      # Tie-breaker for equal priorities

    def add_task(self, task_name, priority=3):
        heapq.heappush(self.queue, (priority, self.counter, task_name))
        self.counter += 1

    def next_task(self):
        if not self.queue:
            return None
        priority, _, task_name = heapq.heappop(self.queue)
        return task_name

    def __len__(self):
        return len(self.queue)


scheduler = TaskScheduler()
scheduler.add_task("send weekly report", priority=4)    # Low priority
scheduler.add_task("database backup", priority=1)        # High priority
scheduler.add_task("process payment", priority=0)        # Critical
scheduler.add_task("resize image thumbnails", priority=3) # Normal
scheduler.add_task("send welcome email", priority=2)     # Medium

while scheduler:
    print(f"  Processing: {scheduler.next_task()}")
# Processing: process payment
# Processing: database backup
# Processing: send welcome email
# Processing: resize image thumbnails
# Processing: send weekly report

3. Dijkstra's Shortest Path

The most famous graph algorithm uses a min-heap to always expand the closest unvisited node. Without a heap, Dijkstra's runs in O(V²) because you'd scan all vertices to find the minimum. With a binary heap, it runs in O((V + E) log V). With a Fibonacci heap, O(E + V log V) — though the constant factors make it slower in practice for most graph sizes.

Python

import heapq

def dijkstra(graph, start):
    """Find shortest paths from start to all reachable nodes.
    
    graph: adjacency list {node: [(neighbor, weight), ...]}
    Returns: dict of {node: shortest_distance}
    
    The heap always gives us the unvisited node with the smallest
    known distance — exactly what Dijkstra's algorithm needs.
    """
    dist = {start: 0}
    heap = [(0, start)]    # (distance, node)
    visited = set()

    while heap:
        d, u = heapq.heappop(heap)

        if u in visited:
            continue         # Already found a shorter path to u
        visited.add(u)

        for neighbor, weight in graph.get(u, []):
            new_dist = d + weight
            if new_dist < dist.get(neighbor, float('inf')):
                dist[neighbor] = new_dist
                heapq.heappush(heap, (new_dist, neighbor))

    return dist


# Example: road network
roads = {
    'A': [('B', 4), ('C', 2)],
    'B': [('D', 3), ('C', 1)],
    'C': [('B', 1), ('D', 5)],
    'D': []
}
print(dijkstra(roads, 'A'))
# {'A': 0, 'C': 2, 'B': 3, 'D': 6}

4. Merge K Sorted Streams

Databases like Cassandra and LevelDB store data in sorted files (SSTables). During compaction, they need to merge many sorted files into one. A min-heap of size K — holding the current smallest element from each file — produces the merged output in O(n log k) time, where n is total elements and k is the number of files. Python's heapq.merge() does exactly this.

Python

import heapq

def merge_k_sorted(lists):
    """Merge k sorted lists into one sorted output.
    
    Push the first element from each list into a min-heap.
    Pop the smallest, then push the next element from that list.
    The heap never has more than k elements at once.
    
    Time: O(n log k) where n = total elements, k = number of lists
    Space: O(k) for the heap
    """
    heap = []
    for i, lst in enumerate(lists):
        if lst:
            # (value, list_index, element_index)
            heapq.heappush(heap, (lst[0], i, 0))

    result = []
    while heap:
        val, list_idx, elem_idx = heapq.heappop(heap)
        result.append(val)

        # Push the next element from the same list
        if elem_idx + 1 < len(lists[list_idx]):
            next_val = lists[list_idx][elem_idx + 1]
            heapq.heappush(heap, (next_val, list_idx, elem_idx + 1))

    return result


sorted_files = [
    [1, 4, 7, 10],      # SSTable 1
    [2, 5, 8, 11],      # SSTable 2
    [3, 6, 9, 12],      # SSTable 3
]
print(merge_k_sorted(sorted_files))
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

# Python's built-in does the same thing:
print(list(heapq.merge(*sorted_files)))
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

Interview Patterns

💡 Pattern: Top K Elements

When a problem says "find the K largest" or "K most frequent," reach for a heap. Use a min-heap of size K — push elements in, and whenever the heap exceeds size K, pop the smallest. At the end, the heap contains exactly the K largest. This runs in O(n log k), which beats sorting at O(n log n) when k ≪ n.

The key insight: a min-heap of size K naturally evicts the smallest elements, keeping only the biggest K. Counter-intuitive? Think about it: you want the K biggest, so you throw away the small ones.

💡 Pattern: Two-Heap Median

Maintain a max-heap for the lower half and a min-heap for the upper half. Balance them so their sizes differ by at most 1. The median is always at one (or both) of the roots. This gives O(log n) insert and O(1) median access.

💡 Pattern: Merge K Sorted Things

Whether it's K sorted arrays, K sorted linked lists, or K sorted streams — push the first element of each into a min-heap. Pop the smallest, then push the next element from that same source. The heap always has at most K elements, so each operation is O(log k). Total: O(n log k) where n is the total number of elements.

💡 When to Suspect a Heap

Look for these signals in the problem statement: "kth largest/smallest," "stream of data," "merge sorted," "schedule/priority," "continuously find min/max," or "greedy with repeated min/max selection." If you see any of those, a heap is probably involved.

💡 Pattern: Greedy + Heap

Many greedy algorithms need to repeatedly pick the best available option. A heap gives you that in O(log n). Examples: Dijkstra's shortest path (pick the closest unvisited node), Huffman coding (merge the two smallest frequencies), task scheduling with cooldowns (pick the most frequent remaining task).

The pattern: make a greedy choice (extract from heap), update the state, push new options back. Repeat until done.

💡 Pattern: Lazy Deletion

Standard heaps don't support efficient deletion of arbitrary elements (O(n) to find the element). The workaround: mark the element as deleted but leave it in the heap. When you extract and find a "deleted" element, skip it and extract again. This keeps operations at O(log n) amortized. Used in: Dijkstra with duplicate entries, event schedulers with cancellations.

Practice Problems

Sorted by difficulty. Start at the top and work down.

Problem	Difficulty	Key Technique
LC 1046 — Last Stone Weight	Easy	Max-heap simulation
LC 703 — Kth Largest Element in a Stream	Easy	Min-heap of size K
LC 215 — Kth Largest Element in an Array	Medium	Heap or quickselect
LC 347 — Top K Frequent Elements	Medium	Frequency map + min-heap
LC 621 — Task Scheduler	Medium	Max-heap + cooldown queue
LC 973 — K Closest Points to Origin	Medium	Max-heap of size K
LC 295 — Find Median from Data Stream	Hard	Two-heap median pattern
LC 23 — Merge K Sorted Lists	Hard	Min-heap of K heads

Heaps

On This Page

What Is a Heap?

A Brief History

Why Should You Care?

Heap vs. Binary Search Tree

How It Works

The Array Trick

Bubble Up (Insertion)

Sift Down (Extraction)

Heapify (Build from Array)

Heap Sort

Interactive Visualization

Operations & Complexity

Implementation

Python (Min-Heap from Scratch)

JavaScript (Min-Heap)

Variation: Heap Sort (In-Place)

Variation: Max-Heap with Custom Comparator

Common Mistakes

Real-World Applications

1. Running Median (Streaming Data)

2. Task Scheduler / Priority Queue

3. Dijkstra's Shortest Path

4. Merge K Sorted Streams

Interview Patterns

Practice Problems