DSA Lab | Union-Find (Disjoint Set)

What Is Union-Find?

Imagine you're at a party where people form friend groups. Two people are in the same group if they're friends, or friends-of-friends, or friends-of-friends-of-friends — connected through any chain of friendships. Now imagine someone keeps introducing people: "Alice, meet Bob." Their groups merge. Later, someone asks: "Are Charlie and Dave in the same group?" You need to answer fast.

Union-Find (also called Disjoint Set Union or DSU) is a data structure built for exactly this scenario. It maintains a collection of disjoint sets — non-overlapping groups where each element belongs to exactly one set. It supports two operations:

Union(a, b) — merge the sets containing elements a and b into a single set
Find(a) — figure out which set element a belongs to (returns the representative, also called the root, of that set — a designated element that identifies the group)

That's it. Two operations. But they're enough to solve a surprising number of graph connectivity problems (questions about which nodes can reach which other nodes through some chain of connections).

A Brief History

The structure was first described by Bernard Galler and Michael Fischer in 1964. The two key optimizations — union by rank and path compression — were analyzed by Robert Tarjan in 1975, who proved the now-famous inverse Ackermann amortized bound. It took decades for mathematicians to prove this was actually optimal — Fredman and Saks showed in 1989 that no data structure can do better for this problem.

The beauty of Union-Find is that it solves an important problem (dynamic connectivity) with an absurdly efficient solution. The theoretical bound is so close to constant time that the distinction is meaningless in practice.

Where Is It Used?

Kruskal's algorithm — building minimum spanning trees (the cheapest set of edges that connects all vertices in a weighted graph). For each edge in sorted order, Union-Find checks if adding it would create a cycle (a closed loop in the graph). If the two endpoints are already connected (same set), skip it; otherwise, union them and add the edge.
Network connectivity — are two computers on the same network? As network links come online one at a time, Union-Find tracks which machines can communicate.
Image processing — finding connected components (contiguous regions of same-colored pixels). Label each pixel, then union adjacent pixels that share the same color.
Social networks — determining if two users are in the same friend cluster. Facebook's early infrastructure used Union-Find-like structures for this.
Equivalence classes — grouping items that are "the same" by some relation. If A≡B and B≡C, then A≡C. Union-Find handles the transitivity automatically.
Percolation theory — does water flow from the top of a grid to the bottom? Model the grid with Union-Find and check if any top-row cell is connected to any bottom-row cell.
Compilers — type inference engines use Union-Find to unify type variables. When the compiler figures out that type T1 must equal type T2, it unions them.

How It Works

Each set is represented as a tree (not a binary tree — just a general tree where each node has one parent). Every element points to a parent, and the root (the element that points to itself) is the set's representative. When you call Find(x), you walk up the parent pointers until you reach the root.

Two elements are in the same set if and only if they have the same root. To merge two sets, just make one root point to the other.

The naive version is simple but slow — trees can degenerate into long chains (imagine unioning 1→2, then 2→3, then 3→4... you get a linked list). Two optimizations fix this:

Union by Rank

When merging two sets, attach the shorter tree under the root of the taller tree. This keeps trees shallow. The rank is an upper bound on the tree's height — it starts at 0 and only increases when you merge two trees of the same rank.

Without this optimization, you could end up with a tree that's basically a linked list — O(n) per Find. With union by rank alone, tree height stays O(log n).

Why "rank" instead of "size"? You can actually use either. Union by size (always attach the smaller set under the larger one) gives the same O(log n) bound. Some people find size more intuitive since it tracks something concrete (number of elements). Rank and size are interchangeable for correctness; the performance difference is negligible.

Path Compression

When you call Find(x), you walk from x up to the root. Path compression says: while you're at it, make every node along that path point directly to the root. Next time anyone calls Find on those nodes, it's a single hop instead of a chain.

This is a lazy optimization — you don't proactively flatten the tree, you just fix paths as you traverse them. Over time, the tree gets flatter and flatter with each query.

Combine both optimizations, and each operation runs in O(α(n)) amortized (averaged over many operations) time — where α is the inverse Ackermann function. This function grows so absurdly slowly that for any practical input size (up to 10⁸⁰ — more atoms than exist in the observable universe), α(n) ≤ 4. So it's effectively constant time. You'll never encounter a real-world input where α(n) = 5.

Step-by-Step Example

Start with 5 elements: {0}, {1}, {2}, {3}, {4}. Each is its own set.

Union(0, 1): 0 becomes parent of 1 (or vice versa). Now: {0,1}, {2}, {3}, {4}.
Union(2, 3): 2 becomes parent of 3. Now: {0,1}, {2,3}, {4}.
Union(1, 3): Find(1)=0, Find(3)=2. Union roots: 0 becomes parent of 2. Now: {0,1,2,3}, {4}.
Find(3): Walk 3→2→0. Path compression: 3 now points directly to 0. Next Find(3) is O(1).
Connected(1, 4): Find(1)=0, Find(4)=4. Different roots → not connected.

Interactive Visualization

Click nodes to select them, then use the buttons below. Watch how path compression flattens trees and union by rank keeps them balanced.

Union-Find Explorer

Operations & Complexity

Operation	Naive	Union by Rank	Rank + Path Compression
MakeSet	O(1)	O(1)	O(1)
Find	O(n)	O(log n)	O(α(n)) ≈ O(1)
Union	O(n)	O(log n)	O(α(n)) ≈ O(1)
Connected	O(n)	O(log n)	O(α(n)) ≈ O(1)

Space: O(n) — one parent pointer and one rank (or size) value per element. That's it. No extra overhead.

α(n) is the inverse Ackermann function. The Ackermann function grows faster than any primitive recursive function — it makes exponential functions look slow. Its inverse, naturally, grows slower than any function you've ever seen. For n = 10⁸⁰, α(n) = 4. For all practical purposes, it's a constant.

Implementation

Python — Union by Rank + Path Compression

Python

class UnionFind:
    """Disjoint Set Union with union by rank + path compression.
    Supports: find, union, connected, component count, component size."""

    def __init__(self, n):
        # Each element starts as its own set (self-loop: parent[i] = i)
        self.parent = list(range(n))
        # Rank tracks upper bound on tree height (not exact height after compression)
        self.rank = [0] * n
        # Track size of each component (useful for many problems)
        self.size = [1] * n
        # Total number of distinct components
        self.components = n

    def find(self, x):
        """Find the root (representative) of x's set.
        Uses path compression: every node on the path to root
        gets directly linked to root. Next find is O(1) for those nodes."""
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # Recursive path compression
        return self.parent[x]

    def union(self, x, y):
        """Merge the sets containing x and y.
        Uses union by rank: shorter tree goes under taller tree.
        Returns True if a merge actually happened, False if already same set."""
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False  # Already in the same set — nothing to do

        # Union by rank: attach shorter tree under taller tree
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx  # Ensure rx is the taller (or equal) root
        self.parent[ry] = rx   # ry's tree goes under rx
        self.size[rx] += self.size[ry]  # Update component size

        # Only increase rank when merging two trees of equal rank
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1

        self.components -= 1
        return True

    def connected(self, x, y):
        """Check if x and y are in the same set. O(α(n))."""
        return self.find(x) == self.find(y)

    def get_size(self, x):
        """Get the size of the component containing x."""
        return self.size[self.find(x)]


# Usage
uf = UnionFind(10)
uf.union(1, 2)
uf.union(3, 4)
uf.union(2, 4)                    # Merges {1,2} and {3,4} → {1,2,3,4}
print(uf.connected(1, 3))         # True — connected through 2 and 4
print(uf.connected(1, 5))         # False — 5 is still alone
print(uf.components)              # 7 — started with 10, 3 unions happened
print(uf.get_size(1))             # 4 — component {1,2,3,4} has 4 elements

Python — Union by Size Variant (Alternative)

Python

class UnionFindBySize:
    """Alternative: union by SIZE instead of rank.
    Some people find this more intuitive since size has a concrete meaning.
    The performance guarantees are the same: O(α(n)) amortized."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n   # Size doubles as the union heuristic
        self.components = n

    def find(self, x):
        # Iterative path compression — avoids recursion depth issues
        # on very large inputs (Python has a default recursion limit of 1000)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Now flatten: make every node on the path point to root
        while self.parent[x] != root:
            next_parent = self.parent[x]
            self.parent[x] = root
            x = next_parent
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False

        # Always attach SMALLER tree under LARGER tree
        if self.size[rx] < self.size[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        self.size[rx] += self.size[ry]
        self.components -= 1
        return True

    def connected(self, x, y):
        return self.find(x) == self.find(y)

JavaScript

class UnionFind {
  constructor(n) {
    this.parent = Array.from({ length: n }, (_, i) => i);
    this.rank = new Array(n).fill(0);
    this.size = new Array(n).fill(1);
    this.components = n;
  }

  find(x) {
    // Iterative path compression — no stack overflow risk
    let root = x;
    while (this.parent[root] !== root) root = this.parent[root];
    // Flatten path
    while (this.parent[x] !== root) {
      const next = this.parent[x];
      this.parent[x] = root;
      x = next;
    }
    return root;
  }

  union(x, y) {
    let rx = this.find(x), ry = this.find(y);
    if (rx === ry) return false;

    // Union by rank
    if (this.rank[rx] < this.rank[ry]) [rx, ry] = [ry, rx];
    this.parent[ry] = rx;
    this.size[rx] += this.size[ry];
    if (this.rank[rx] === this.rank[ry]) this.rank[rx]++;

    this.components--;
    return true;
  }

  connected(x, y) {
    return this.find(x) === this.find(y);
  }

  getSize(x) {
    return this.size[this.find(x)];
  }
}

Common Mistakes

⚠️ Beginner Pitfalls with Union-Find

Forgetting path compression: Without it, find() degrades to O(n) in the worst case. Always include it — there's no reason not to. The recursive version is one extra line; the iterative version is a small while loop.
Unioning elements instead of roots: parent[x] = y is wrong. You must find the roots first: parent[find(x)] = find(y). Skipping the find step breaks the tree structure completely.
Not using union by rank/size: Without it, you can get O(n) height trees. Both rank and size prevent this. Pick one. If you need component sizes for the problem, use union by size since you're tracking it anyway.
Off-by-one in component count: Union returns whether a merge actually happened. Only decrement the component count when union returns true (the elements were in different sets). If they were already connected, the count stays the same.
Using Union-Find for directed graphs: Union-Find tracks undirected connectivity. It can't distinguish between "A can reach B" and "B can reach A" in a directed graph. For directed connectivity, you need a different approach (like DFS from each node, or Tarjan's SCC algorithm).

Real-World Example: Kruskal's Minimum Spanning Tree

A minimum spanning tree (MST) is the cheapest set of edges that connects all vertices in a weighted graph — no cycles, minimum total weight. Kruskal's algorithm builds the MST greedily: sort all edges by weight, then add edges one by one, skipping any that would create a cycle. Union-Find makes the cycle check trivial.

Python

def kruskal_mst(num_vertices, edges):
    """Build a minimum spanning tree using Kruskal's algorithm.
    
    Args:
        num_vertices: number of vertices (0 to n-1)
        edges: list of (weight, u, v) tuples
    
    Returns:
        (total_weight, mst_edges) — the MST cost and its edges
    
    Time: O(E log E) for sorting + O(E·α(V)) for union-find operations
    The sort dominates, making it O(E log E) overall.
    """
    # Sort edges by weight — greedy: always try cheapest edge first
    edges.sort()
    
    uf = UnionFind(num_vertices)
    mst_edges = []
    total_weight = 0
    
    for weight, u, v in edges:
        # Would adding this edge create a cycle?
        if not uf.connected(u, v):
            # No cycle — add it to the MST
            uf.union(u, v)
            mst_edges.append((u, v, weight))
            total_weight += weight
        # If u and v are already connected, skip (would create cycle)
        
        # MST has exactly V-1 edges — stop early if done
        if len(mst_edges) == num_vertices - 1:
            break
    
    # Check if all vertices are connected
    if len(mst_edges) != num_vertices - 1:
        return None  # Graph is disconnected — no spanning tree exists
    
    return total_weight, mst_edges


# Example: network of cities with connection costs
#
#   0 ---4--- 1
#   |  \      |
#   8   2     6
#   |    \    |
#   3 ---1-- 2
#
edges = [
    (4, 0, 1),  # 0-1 costs 4
    (8, 0, 3),  # 0-3 costs 8
    (2, 0, 2),  # 0-2 costs 2
    (6, 1, 2),  # 1-2 costs 6
    (1, 2, 3),  # 2-3 costs 1
]

result = kruskal_mst(4, edges)
if result:
    cost, mst = result
    print(f"MST total cost: {cost}")  # 7
    for u, v, w in mst:
        print(f"  Edge {u}-{v} (weight {w})")
    # Output:
    #   Edge 2-3 (weight 1)
    #   Edge 0-2 (weight 2)
    #   Edge 0-1 (weight 4)

Kruskal's is the go-to MST algorithm when you receive the graph as an edge list (which is how most problems present it). The Union-Find operations are so cheap that the sorting step dominates the runtime. For very dense graphs, Prim's algorithm (which uses a priority queue and adjacency list) may be faster.

Interview Patterns

💡 "Can I use Union-Find here?" If the problem involves grouping elements, tracking connected components, or detecting cycles in an undirected graph — Union-Find is probably the right tool. The classic signal: you need to repeatedly ask "are these two things in the same group?" while also merging groups. If you see the words "connected," "component," "group," or "cluster" in the problem, think Union-Find.

Pattern 1: Connected Components

Given edges or equivalence pairs, count how many distinct groups exist. Process each edge with union(), then check the components counter. Classic example: "Number of Provinces" (LC 547) — you're given a matrix of friendships and need to count friend clusters.

Pattern 2: Cycle Detection (Undirected)

For each edge (u, v) in an undirected graph: if find(u) == find(v) before the union, adding this edge would create a cycle — both endpoints are already connected. That's how Kruskal's MST algorithm avoids cycles, and it's the core idea behind "Redundant Connection" (LC 684).

Pattern 3: Dynamic Connectivity

When edges arrive one at a time and you need to track when components merge. "At what point does node A become connected to node B?" Process edges in order, union as you go, and check connectivity after each union.

💡 Size Tracking Variant Sometimes you need the size of each component, not just connectivity. Add a size[] array: when merging, add the smaller set's size to the larger one. This lets you answer "how many elements are in this group?" in O(1). Problems like "Largest Component Size by Common Factor" (LC 952) need this.

Pattern 4: Grid Problems

Problems on 2D grids (like "Number of Islands") can use Union-Find instead of DFS/BFS. Map each cell (r, c) to index r * cols + c, then union adjacent land cells. This is especially useful when islands are added dynamically ("Number of Islands II" — LC 305), where DFS/BFS would need to re-explore from scratch but Union-Find handles updates incrementally.

💡 Pattern: Union-Find with Weighted Edges For "Evaluate Division" (LC 399), you can model equations as a weighted union-find where each edge has a ratio. If a/b = 2.0, union a and b with weight 2.0. To evaluate a/c, find their roots and use the weights along the path. This is trickier but elegant.

💡 Pattern: Process Equals First, Then Check Inequalities "Satisfiability of Equality Equations" (LC 990): given equations like a==b and a!=c, check if they're consistent. First pass: union all variables connected by ==. Second pass: for each != equation, check if both sides have the same root — if so, it's a contradiction.

Practice Problems

Problem	Difficulty	Key Idea
547. Number of Provinces	Medium	Basic connected components — union each friendship, count groups
684. Redundant Connection	Medium	Cycle detection — first edge where both endpoints share a root
305. Number of Islands II	Hard	Dynamic grid — add land cells incrementally, track component count
721. Accounts Merge	Medium	Union emails belonging to same person, then group by root
1202. Smallest String With Swaps	Medium	Group indices by swap pairs, sort within each group independently
1319. Number of Operations to Make Network Connected	Medium	Count components — need (components - 1) extra edges to connect them all
990. Satisfiability of Equality Equations	Medium	Union equals first, then check if any inequality contradicts

Union-Find (Disjoint Set)

On This Page

What Is Union-Find?

A Brief History

Where Is It Used?

How It Works

Union by Rank

Path Compression

Step-by-Step Example

Interactive Visualization

Operations & Complexity

Implementation

Python — Union by Rank + Path Compression

Python — Union by Size Variant (Alternative)

JavaScript

Common Mistakes

Real-World Example: Kruskal's Minimum Spanning Tree

Interview Patterns

Pattern 1: Connected Components

Pattern 2: Cycle Detection (Undirected)

Pattern 3: Dynamic Connectivity

Pattern 4: Grid Problems

Practice Problems