DSA Lab | Minimum Spanning Trees

What Is an MST?

A spanning tree of a connected graph is a subset of edges that connects every vertex without forming any cycles. It's called a "tree" because it's connected and acyclic (like a tree data structure), and it "spans" because it includes all vertices. If your graph has V vertices, any spanning tree has exactly V-1 edges — the minimum needed to keep everything connected.

A minimum spanning tree (MST) is the spanning tree with the smallest possible total edge weight. Among all the ways to connect every vertex, it's the cheapest.

Think about laying cable between buildings on a campus. Every building needs a network connection (directly or through other buildings), and you want to minimize the total cable used. Each possible cable run between two buildings has a cost (based on distance, terrain, permits). The MST gives you the cheapest wiring plan.

The concept goes back to Otakar Borůvka in 1926, who was trying to design an efficient electrical network for Moravia (now part of the Czech Republic). Kruskal published his algorithm in 1956, and Prim independently discovered his approach around the same time. These remain the two standard MST algorithms today.

Key Properties

Uniqueness: If all edge weights are distinct (no ties), the MST is unique. With duplicate weights, there may be multiple valid MSTs — all with the same total weight, but using different edges.
Cut property: For any cut (a partition of vertices into two non-empty groups), the minimum weight edge crossing the cut belongs to some MST. This is WHY greedy algorithms work for MST — at every step, taking the cheapest available edge is provably correct.
Cycle property: For any cycle in the graph, the maximum weight edge in that cycle is NOT in any MST. You'd always prefer the lighter edges in the cycle. This is the dual of the cut property.
Edge count: Always exactly V-1 edges. Any fewer and the graph is disconnected (some vertices can't reach others). Any more and it contains a cycle (and that cycle's heaviest edge is wasted).
Minimax path property: The path between any two vertices in the MST minimizes the maximum edge weight among all possible paths in the original graph. This is the "bottleneck" shortest path.

Real-World Uses

Network design — minimum-cost wiring for telecommunications, power grids, water pipes, fiber optic cable
Cluster analysis — build MST, then remove the K-1 heaviest edges to get K clusters. This is "single-linkage clustering" and it works surprisingly well for spatial data.
Approximation algorithms — The MST gives a 2-approximation for the Traveling Salesman Problem (TSP). The MST cost is a lower bound on TSP, and doubling the MST edges gives a tour at most 2× optimal.
Image segmentation — pixels as nodes, similarity as edge weights. Cutting heavy MST edges separates distinct regions.
Maze generation — random MST of a grid creates a perfect maze (exactly one path between any two cells)
Taxonomy / phylogenetics — building evolutionary trees from genetic distance data

How It Works

Kruskal's Algorithm — Edge-Centric

Kruskal's is conceptually the simplest MST algorithm. Sort all edges by weight, lightest first. Process them one by one. For each edge, ask: "Would adding this edge create a cycle?" If no, include it in the MST. If yes, skip it. Stop when you have V-1 edges.

The "would this create a cycle?" check is the key operation. If both endpoints are already in the same connected component, adding the edge creates a cycle. If they're in different components, the edge safely bridges them.

Sort all edges by weight (ascending)
Initialize a Union-Find (also called Disjoint Set Union or DSU) structure — each vertex starts as its own component
For each edge (u, v, w) in sorted order:
- If u and v are in different components: add the edge to MST, merge (union) their components
- If u and v are in the same component: skip it (would create a cycle)
Stop when you have V-1 edges (MST is complete)

💡 Why Union-Find is perfect here Union-Find (Disjoint Set Union) is a data structure that tracks which elements belong to which groups, supporting two operations: find(x) tells you which group x is in, and union(x, y) merges the groups of x and y. Checking "are u and v in the same component?" is find(u) == find(v). Merging components is union(u, v). With path compression (shortcutting find() to point directly at the root) and union by rank (attaching smaller trees under larger ones), both operations run in nearly O(1) amortized time — specifically O(α(n)) where α is the inverse Ackermann function, which is ≤ 4 for any input you'll ever encounter.

Prim's Algorithm — Vertex-Centric

Prim's grows the MST outward from a starting vertex, one edge at a time. At each step, it picks the cheapest edge that connects the current tree to a vertex not yet in the tree. It's the MST equivalent of Dijkstra's algorithm (in fact, the code is nearly identical — just track edge weights instead of cumulative distances).

Start from any vertex. Mark it as "in the tree."
Add all its edges to a min-heap (priority queue where the smallest element comes out first)
While the tree has fewer than V vertices:
- Extract the minimum-weight edge from the heap
- If the edge leads to a vertex already in the tree, skip it (it would create a cycle)
- Otherwise, add the vertex and edge to the tree, push all the new vertex's edges onto the heap

Kruskal's vs Prim's — When to Use Which

Sparse graphs (E ≈ V): Kruskal's is usually faster. Sorting E edges is cheap when E is small, and Union-Find is lightweight.
Dense graphs (E ≈ V²): Prim's with an adjacency matrix runs in O(V²), beating Kruskal's O(E log E) = O(V² log V).
Already have Union-Find? Kruskal's is trivial to add on top.
Need to grow from a specific start? Prim's is the natural fit.
Edge list input? Kruskal's (just sort and go). Adjacency list/matrix input? Prim's (scan neighbors directly).
Complete graph (all pairs connected)? Prim's O(V²) beats Kruskal's O(V² log V).

Why Greedy Works

Most greedy algorithms don't give optimal results (think: making change with arbitrary coin denominations). MST is special because of the cut property: at every step, taking the minimum weight edge that crosses any cut is provably safe — it must belong to some MST. Both Kruskal's and Prim's exploit this, just in different ways. Kruskal's considers edges globally (lightest first); Prim's considers edges locally (cheapest way to expand the current tree).

Interactive Visualization

Toggle between Kruskal's and Prim's to see how each builds the same MST differently. Kruskal's sorts edges globally and picks the lightest. Prim's grows outward from a starting node. Watch the total weight accumulate as edges join the tree.

MST Construction

Common Mistakes

⚠️ MST on a directed graph MST is defined for undirected graphs. If you have a directed graph, you need a minimum spanning arborescence (Edmonds' algorithm / Chu-Liu algorithm), which is significantly more complex. If a problem gives you directed edges and asks for MST, double-check whether edges should be treated as bidirectional.

⚠️ Forgetting to check if the graph is connected If the graph is disconnected (some vertices can't reach others), no spanning tree exists — you can't span what isn't connected. Kruskal's will give you a minimum spanning forest (separate MSTs for each connected component), which might not be what the problem wants. Check: if you end up with fewer than V-1 edges, the graph isn't connected.

⚠️ Union-Find without path compression or union by rank A naive Union-Find (just a parent array) degrades to O(n) per operation on pathological inputs — trees become linked lists. Always implement both path compression (make find() point directly to root) and union by rank (attach shorter trees under taller ones). Without these, Kruskal's goes from O(E log E) to O(E·V) worst case.

⚠️ Edge weight ties: multiple valid MSTs When edges have equal weights, different processing orders give different MSTs — all with the same total weight, but different edge sets. If the problem asks "is this specific edge in THE MST?" the answer might be "it depends." Problems like "find critical and pseudo-critical edges" test this understanding explicitly.

⚠️ Prim's: not skipping already-added vertices With a min-heap, Prim's pushes duplicate entries (same vertex, different weights). When you pop an entry, check if that vertex is already in the tree. If yes, skip it. This is the same "lazy deletion" pattern as Dijkstra. Without this check, you add cycles to the "tree."

Operations & Complexity

Algorithm	Time	Space	Best For
Kruskal's	O(E log E)	O(V + E)	Sparse graphs, edge list input
Prim's (binary heap)	O(E log V)	O(V + E)	Dense graphs, adjacency list input
Prim's (adj matrix)	O(V²)	O(V²)	Very dense / complete graphs (E ≈ V²)
Borůvka's	O(E log V)	O(V + E)	Parallel implementations

Since E ≤ V² and log(V²) = 2 log V, the complexities of Kruskal's (O(E log E)) and Prim's with a heap (O(E log V)) are essentially the same in big-O terms. The practical difference comes down to input format, constant factors, and whether you already have Union-Find or a heap handy.

Borůvka's algorithm is less commonly taught but worth knowing: it finds the cheapest edge out of each component simultaneously, merges those components, and repeats. It's O(E log V) and naturally parallelizable — each component can find its cheapest edge independently.

Implementation

Union-Find (Foundation for Kruskal's)

Python

class UnionFind:
    """Disjoint Set Union with path compression and union by rank.
    
    Two optimizations that bring operations from O(n) to nearly O(1):
    
    Path compression: During find(), make every node point directly to the root.
    Next time, find() is O(1) for that node.
    
    Union by rank: When merging two trees, attach the shorter one under the
    taller one. Keeps the tree balanced, limiting height to O(log n).
    
    Together: O(α(n)) amortized per operation, where α is the inverse
    Ackermann function — effectively ≤ 4 for any practical input.
    """
    def __init__(self, n):
        self.parent = list(range(n))  # Each node is its own root initially
        self.rank = [0] * n           # Rank ≈ tree height (upper bound)

    def find(self, x):
        # Path compression: recursively point every node to the root
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        # Union by rank: attach shorter tree under taller tree
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False  # Already in same component — edge would create cycle
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx  # Ensure rx is the taller root
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1  # Tie: new root grows taller by 1
        return True

Kruskal's Algorithm

Python

def kruskal(V, edges):
    """Kruskal's MST algorithm.
    
    V: number of vertices (labeled 0 to V-1)
    edges: list of (weight, u, v) tuples
    
    Returns: (mst_edges, total_weight)
        mst_edges: list of (u, v, weight) tuples in the MST
        total_weight: sum of all MST edge weights
    """
    edges.sort()  # Sort by weight — the greedy choice
    uf = UnionFind(V)
    mst = []
    total = 0

    for w, u, v in edges:
        if uf.union(u, v):  # Only add if it doesn't create a cycle
            mst.append((u, v, w))
            total += w
            if len(mst) == V - 1:  # MST is complete — V-1 edges
                break

    # If len(mst) < V-1, graph is disconnected (no spanning tree exists)
    return mst, total

Prim's Algorithm

Python

import heapq

def prim(graph, start=0):
    """Prim's MST algorithm using a min-heap.
    
    graph: adjacency list {vertex: [(neighbor, weight), ...]}
    start: starting vertex (choice doesn't affect the MST)
    
    Returns: (mst_edges, total_weight)
    
    Nearly identical to Dijkstra — the only difference is what we put
    in the heap. Dijkstra: cumulative distance from source.
    Prim: just the edge weight (we want the cheapest connection, not
    the cheapest total path).
    """
    visited = set()
    mst = []
    total = 0
    # Heap entries: (weight, from_vertex, to_vertex)
    heap = [(0, -1, start)]  # Start with a dummy edge to the start vertex

    while heap and len(visited) < len(graph):
        w, frm, to = heapq.heappop(heap)
        if to in visited:
            continue  # Already in the tree — skip (lazy deletion)

        visited.add(to)
        if frm != -1:  # Skip the initial dummy edge
            mst.append((frm, to, w))
            total += w

        # Push all edges from the newly added vertex
        for neighbor, weight in graph[to]:
            if neighbor not in visited:
                heapq.heappush(heap, (weight, to, neighbor))

    return mst, total

JavaScript

function kruskal(V, edges) {
  // edges: [[weight, u, v], ...]
  edges.sort((a, b) => a[0] - b[0]);
  
  // Simple Union-Find
  const parent = Array.from({length: V}, (_, i) => i);
  const rank = new Array(V).fill(0);
  
  function find(x) {
    if (parent[x] !== x) parent[x] = find(parent[x]);
    return parent[x];
  }
  
  function union(x, y) {
    let rx = find(x), ry = find(y);
    if (rx === ry) return false;
    if (rank[rx] < rank[ry]) [rx, ry] = [ry, rx];
    parent[ry] = rx;
    if (rank[rx] === rank[ry]) rank[rx]++;
    return true;
  }

  const mst = [];
  let total = 0;
  for (const [w, u, v] of edges) {
    if (union(u, v)) {
      mst.push({u, v, weight: w});
      total += w;
      if (mst.length === V - 1) break;
    }
  }
  return { mst, total };
}

Real-World Example: Network Infrastructure Planning

Telecom companies face this exact problem: connect N cell towers with fiber optic cable, minimizing total cable length. Each possible connection has a cost based on distance, terrain, and permit requirements. The MST gives the cheapest network that connects everyone.

Python

import math

def plan_fiber_network(towers):
    """Plan minimum-cost fiber optic connections between cell towers.
    
    towers: list of (name, x, y) tuples — tower locations
    
    Uses Kruskal's on a complete graph where edge weight = distance.
    In a complete graph with N nodes, there are N*(N-1)/2 edges.
    
    Returns: list of connections to build and total cable length.
    """
    n = len(towers)
    
    # Generate all possible connections with their distances
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            name_i, xi, yi = towers[i]
            name_j, xj, yj = towers[j]
            # Euclidean distance = cable length needed
            dist = math.sqrt((xi - xj) ** 2 + (yi - yj) ** 2)
            edges.append((dist, i, j))
    
    # Run Kruskal's
    edges.sort()
    uf = UnionFind(n)
    connections = []
    total_cable = 0
    
    for dist, i, j in edges:
        if uf.union(i, j):
            connections.append({
                "from": towers[i][0],
                "to": towers[j][0],
                "cable_km": round(dist, 1)
            })
            total_cable += dist
            if len(connections) == n - 1:
                break
    
    return connections, round(total_cable, 1)

# Example: 6 cell towers in a city
towers = [
    ("Downtown",    0,   0),
    ("Airport",     12,  5),
    ("University",  3,   8),
    ("Hospital",    7,   2),
    ("Mall",        10,  9),
    ("Stadium",     5,   -3),
]

connections, total = plan_fiber_network(towers)
print(f"Total cable needed: {total} km")
print(f"Connections to build ({len(connections)}):")
for c in connections:
    print(f"  {c['from']:12s} ↔ {c['to']:12s}  ({c['cable_km']} km)")

# Without MST, connecting everyone directly would need 15 cables.
# MST uses only 5 cables (V-1) at minimum total distance.

Real network planners add constraints — redundancy requirements (at least 2 paths between critical sites), capacity limits, right-of-way costs. But the starting point is always the MST, with additional edges added for reliability.

Interview Patterns

💡 "Minimum cost to connect" = MST Whenever a problem asks for the minimum cost to connect all nodes/cities/computers, and edges have costs, it's an MST problem. LeetCode 1135 (Connecting Cities With Minimum Cost) is this pattern in its purest form. Same with 1584 (Min Cost to Connect All Points) — generate all pairwise edges using Manhattan distance, then run Kruskal's or Prim's.

💡 "Maximum edge in MST" = Minimax path The MST has a special property: the path between any two vertices in the MST minimizes the maximum edge weight among all possible paths in the original graph. This is the minimax path (or "bottleneck" shortest path). If a problem asks for the path where the worst (heaviest) edge is minimized, build the MST and find the path in it.

💡 Union-Find is your best friend Most MST interview problems are really Union-Find problems in disguise. If you master Union-Find with path compression and union by rank, Kruskal's is just "sort edges + merge components." Union-Find itself comes up in many other problems too: connected components, redundant connections, accounts merge, number of provinces. Learn it well — it has the best effort-to-reward ratio of any data structure.

💡 Virtual node trick for well/source problems Some MST problems have both "connection costs" and "individual costs" (like building a well vs connecting to an existing pipe). The trick: add a virtual node (node 0) connected to every real node with weight = the individual cost. Now run MST on the expanded graph. The virtual node's edges represent "build locally" choices, and regular edges represent "connect to neighbor" choices. The MST picks the cheapest mix. LeetCode 1168 (Optimize Water Distribution) uses this exactly.

💡 "Remove K-1 heaviest MST edges" = K clusters Build the MST, then remove the K-1 heaviest edges. You get K connected components — that's K clusters where intra-cluster distances are small and inter-cluster distances are large. This is single-linkage hierarchical clustering, and it's a valid approach for clustering problems in interviews.

Recognizing MST Problems

"Connect all nodes with minimum total cost" → direct MST
"Minimum cost to make the graph connected" → MST on the disconnected components
"Remove edges to disconnect" → think about cut edges in the MST
"Clustering into K groups" → build MST, remove K-1 heaviest edges
"Is this edge in some MST?" → force-include or force-exclude it, compare total weights
"Redundant connection" → the edge that creates a cycle in Union-Find

Practice Problems

1584. Min Cost to Connect All Points — Medium — Manhattan distances as edge weights. Pure MST on a complete graph. Good first MST problem.
1135. Connecting Cities With Minimum Cost — Medium — Textbook MST. Sort edges, union-find, done.
684. Redundant Connection — Medium — Find the edge that creates a cycle. Union-Find: the first edge where union() returns false is the answer.
1168. Optimize Water Distribution in a Village — Hard — MST with virtual node trick. Add a node 0 for wells, then MST gives minimum total cost.
1489. Critical and Pseudo-Critical MST Edges — Hard — Test each edge: force-include or force-exclude, see if MST weight changes.
1631. Path With Minimum Effort — Medium — Minimax path problem. MST approach, Dijkstra, or binary search + BFS all work.

Minimum Spanning Trees

On This Page

What Is an MST?

Key Properties

Real-World Uses

How It Works

Kruskal's Algorithm — Edge-Centric

Prim's Algorithm — Vertex-Centric

Kruskal's vs Prim's — When to Use Which

Why Greedy Works

Interactive Visualization

Common Mistakes

Operations & Complexity

Implementation

Union-Find (Foundation for Kruskal's)

Kruskal's Algorithm

Prim's Algorithm

Real-World Example: Network Infrastructure Planning

Interview Patterns

Recognizing MST Problems

Practice Problems