DSA Lab | Topological Sort

What Is Topological Sort?

Imagine you have a list of tasks, but some tasks depend on others. You can't start "Build Walls" until "Pour Foundation" is done. You can't paint until the walls are up. A topological sort gives you a valid order to do everything, respecting all dependencies.

More precisely: given a directed acyclic graph (DAG), a topological ordering is a linear sequence of all vertices such that for every directed edge u → v, vertex u appears before v in the sequence. The edge u → v means "u must come before v" — it's a dependency.

The term comes from topology in mathematics, where it relates to partial orders. But in practice, think of it as "dependency resolution" — figuring out a valid execution order when things depend on other things.

Two key things to note:

Only DAGs have topological orderings. A DAG is a Directed Acyclic Graph — a graph where all edges have a direction (A→B is different from B→A) and there are no cycles (no way to follow edges and end up back where you started). If there's a cycle (A depends on B, B depends on C, C depends on A), no valid ordering exists. Period. It's a circular dependency.
The ordering isn't unique. A graph can have many valid topological orderings. If A and B are independent (no edges between them), either can come first. The algorithm gives you one valid ordering, not THE ordering.

A Quick History

Topological sorting has been studied since the early days of computing. Kahn published his BFS-based algorithm in 1962. The DFS-based approach came from Tarjan's work on depth-first search in the early 1970s. Both remain the standard implementations today — they're simple, efficient, and hard to beat.

Where It Shows Up

Build systems — Make, Webpack, Gradle, Bazel all resolve dependency order with topological sort. Source file A imports B? Compile B first.
Package managers — npm, pip, apt, brew figure out install order this way. Package X depends on Y and Z? Install Y and Z before X.
Course scheduling — "CS201 requires CS101 and MATH150" is a DAG. The registrar's system uses topological sort to validate your plan.
Spreadsheet cell evaluation — Cell A1 references B2 which references C3. Cells form a dependency graph; evaluation order is topological.
Task scheduling / CI pipelines — GitHub Actions, Jenkins, Airflow all model jobs as DAGs and run them in topological order.
Database migration ordering — Migration 003 depends on 002 depends on 001. The migration runner topologically sorts them.
Compilation order — A compiler's dependency graph of translation units needs topological ordering.

DAG: The Prerequisite

A DAG (Directed Acyclic Graph) is a directed graph with no cycles. Every tree is a DAG (parent→child edges, no loops), but DAGs are more general — a node can have multiple parents (multiple things depending on it) and multiple children. Think of git commit history: commits can merge (multiple parents), but they never loop back in time.

If your graph has a cycle, topological sort will tell you — both algorithms below detect cycles as a side effect. This is actually one of the most useful features: "can I even complete all these tasks, or is there a circular dependency?"

How It Works

Kahn's Algorithm (BFS-based)

Kahn's approach uses in-degrees — the number of edges pointing INTO each vertex. An in-degree of 0 means nothing depends on this vertex being done first — it has no prerequisites, so it's "ready."

Count the in-degree of every vertex
Put all vertices with in-degree 0 into a queue (they have no dependencies — they're ready to go)
While the queue isn't empty:
- Dequeue a vertex, add it to the result
- For each of its neighbors: decrement their in-degree (we've "satisfied" this dependency)
- If any neighbor's in-degree drops to 0, all its dependencies are met — enqueue it
If the result has fewer vertices than the graph, the remaining vertices form a cycle — topological sort is impossible

The intuition: repeatedly remove vertices with no remaining dependencies. Each removal might "free up" other vertices whose last dependency was just satisfied. It's like peeling layers off an onion — or more precisely, like completing tasks that unlock new tasks.

💡 Cycle detection comes free If Kahn's algorithm finishes but hasn't processed all vertices, the unprocessed ones form a cycle. They'll always have in-degree > 0 because they depend on each other in a loop — nobody's in-degree can ever reach 0. The algorithm naturally tells you "this is impossible" without any extra work.

DFS-based Topological Sort

Run DFS on the entire graph. When a vertex's DFS finishes (all its descendants have been fully explored), add it to a list. Reversing that list gives a valid topological order. This is called reverse post-order.

Why does this work? When vertex u finishes after all of u's descendants, it means everything u depends on (or points to) is already in the list. So when we reverse, u comes before all its dependents — exactly what topological order requires.

Mark all vertices as unvisited
For each unvisited vertex, run DFS
In the DFS, after exploring all of a vertex's neighbors (its children/dependents), add the vertex to a stack
Pop the entire stack to get the topological order (or just reverse the list)

⚠️ Cycle detection requires three states Track three states: unvisited, in-progress, and done. If DFS reaches a vertex that's in-progress (still on the current call stack, meaning we followed edges from it but haven't finished yet), you've found a back edge — that's a cycle. The graph isn't a DAG. A simple two-state (visited/unvisited) check doesn't catch this correctly.

Interactive Visualization

This shows Kahn's algorithm in action on a course prerequisite graph. Watch in-degree counts change as nodes with zero dependencies get removed and added to the output. The blue queue shows what's ready to process next.

Kahn's Algorithm — Course Prerequisites

Common Mistakes

⚠️ Using topological sort on undirected graphs Topological sort only applies to directed graphs. An undirected edge between A and B means "A before B" AND "B before A" — that's a contradiction (a cycle of length 2). If you have an undirected graph, it doesn't have a topological ordering. Make sure your graph is directed before attempting topo sort.

⚠️ Forgetting to handle nodes with no edges Isolated nodes (no incoming or outgoing edges) have in-degree 0 and should appear in the topological order. If you only add nodes to your adjacency list when they appear as an edge endpoint, isolated nodes get silently dropped. Make sure every node is in your in-degree map, even if its count is 0.

⚠️ Using two-state visited for cycle detection in DFS A simple visited/unvisited boolean only tells you if you've seen a node before — not whether it's currently on your DFS path. In a DAG, you'll visit nodes multiple times via different paths, and a simple visited check gives false "cycle" reports. Use three states: white (unvisited), gray (in-progress), black (done). Only gray→gray is a cycle.

⚠️ Forgetting to reverse in DFS-based topo sort DFS produces nodes in reverse topological order (dependencies come LAST in the finish order). You need to reverse the result, or build it with a stack (push on finish, then pop). Forgetting this gives you the exact opposite of what you want.

Operations & Complexity

Algorithm	Time	Space	Detects Cycles?	Notes
Kahn's (BFS)	O(V + E)	O(V + E)	✓	Naturally gives level-by-level ordering
DFS-based	O(V + E)	O(V + E)	✓	Needs post-order reversal

Both algorithms are O(V + E) — you visit every vertex and every edge exactly once. The V term comes from initializing/checking all vertices; the E term comes from scanning all edges (to count in-degrees in Kahn's, or to explore neighbors in DFS).

Kahn's vs DFS — Picking the Right One

Need parallel scheduling? Kahn's. All nodes dequeued in the same "wave" (before any new in-degree-0 nodes are added) can run simultaneously. This tells you the minimum number of parallel batches.
Need cycle detection with the actual cycle? Both work. DFS is slightly more natural for tracing back the cycle path via the gray/in-progress chain.
Need lexicographically smallest order? Kahn's with a min-heap (priority queue) instead of a regular queue. Pop the smallest available vertex each time.
Already doing DFS for something else? Add topological sort as a side effect — just record finish times and reverse.
Iterative preferred? Kahn's is naturally iterative. DFS-based usually uses recursion (though it can be made iterative with an explicit stack).

Implementation

Kahn's Algorithm (BFS)

Python

from collections import deque

def topological_sort_kahn(graph):
    """Kahn's algorithm — BFS-based topological sort.
    
    graph: dict mapping vertex -> list of neighbors (adjacency list)
           e.g. {"CS101": ["CS201", "CS202"], "MATH150": ["CS201"], ...}
    
    Returns: list of vertices in topological order.
    Raises: ValueError if the graph has a cycle.
    
    Intuition: repeatedly remove vertices with no remaining dependencies.
    Each removal may free up other vertices whose last prerequisite 
    was just completed.
    """
    # Count how many edges point INTO each vertex (its "prerequisite count")
    in_degree = {v: 0 for v in graph}
    for u in graph:
        for v in graph[u]:
            in_degree[v] = in_degree.get(v, 0) + 1

    # Start with all "free" vertices — they have no prerequisites
    queue = deque(v for v in graph if in_degree[v] == 0)
    result = []

    while queue:
        node = queue.popleft()
        result.append(node)

        # "Remove" this node from the graph:
        # decrement in-degree of everything that depended on it
        for neighbor in graph[node]:
            in_degree[neighbor] -= 1
            if in_degree[neighbor] == 0:
                queue.append(neighbor)  # All prerequisites met!

    if len(result) != len(graph):
        raise ValueError("Graph has a cycle — topological sort impossible")

    return result

JavaScript

function topologicalSortKahn(graph) {
  // graph: { vertex: [neighbors], ... }
  const inDegree = {};
  for (const u of Object.keys(graph)) {
    inDegree[u] = inDegree[u] || 0;
    for (const v of graph[u]) {
      inDegree[v] = (inDegree[v] || 0) + 1;
    }
  }

  // Queue up everything with no prerequisites
  const queue = Object.keys(graph).filter(v => inDegree[v] === 0);
  const result = [];

  while (queue.length) {
    const node = queue.shift();
    result.push(node);
    for (const neighbor of graph[node]) {
      inDegree[neighbor]--;
      if (inDegree[neighbor] === 0) queue.push(neighbor);
    }
  }

  if (result.length !== Object.keys(graph).length) {
    throw new Error("Cycle detected — topological sort impossible");
  }
  return result;
}

DFS-based Topological Sort

Python

def topological_sort_dfs(graph):
    """DFS-based topological sort with three-color cycle detection.
    
    WHITE = unvisited — haven't touched this vertex yet
    GRAY  = in-progress — currently exploring this vertex's subtree
    BLACK = done — finished exploring, all descendants processed
    
    Key insight: if DFS hits a GRAY vertex, we've found a back edge,
    which means a cycle. In a DAG, you only ever revisit BLACK vertices
    (cross edges or forward edges), never GRAY ones.
    """
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in graph}
    order = []  # Will be built in reverse topological order

    def dfs(u):
        color[u] = GRAY  # "I'm currently being explored"
        for v in graph[u]:
            if color[v] == GRAY:
                raise ValueError(f"Cycle detected: back edge {u} → {v}")
            if color[v] == WHITE:
                dfs(v)
        color[u] = BLACK  # "Done exploring all my descendants"
        order.append(u)   # Add AFTER all descendants — reverse post-order

    for vertex in graph:
        if color[vertex] == WHITE:
            dfs(vertex)

    order.reverse()  # Reverse to get topological order (not reverse-topological)
    return order

Kahn's with Lexicographic Ordering

Python

import heapq

def topological_sort_lex(graph):
    """Lexicographically smallest topological order.
    
    Same as Kahn's, but use a min-heap instead of a queue.
    When multiple vertices have in-degree 0 simultaneously,
    pick the alphabetically/numerically smallest one.
    
    Useful when the problem asks for "the smallest valid ordering"
    or "alphabetical order among valid orderings."
    """
    in_degree = {v: 0 for v in graph}
    for u in graph:
        for v in graph[u]:
            in_degree[v] = in_degree.get(v, 0) + 1

    # Min-heap instead of queue — always pick smallest available vertex
    heap = [v for v in graph if in_degree[v] == 0]
    heapq.heapify(heap)
    result = []

    while heap:
        node = heapq.heappop(heap)  # Smallest in-degree-0 vertex
        result.append(node)
        for neighbor in graph[node]:
            in_degree[neighbor] -= 1
            if in_degree[neighbor] == 0:
                heapq.heappush(heap, neighbor)

    if len(result) != len(graph):
        raise ValueError("Cycle detected")
    return result

Parallel Scheduling (Minimum Semesters)

Python

from collections import deque

def minimum_semesters(graph):
    """Find minimum number of parallel batches (semesters/rounds).
    
    Same as Kahn's, but count how many "levels" we process.
    Each level = one parallel batch. All courses in the same batch
    can be taken simultaneously (their prerequisites are all done).
    
    Returns: number of semesters, or -1 if impossible (cycle).
    """
    in_degree = {v: 0 for v in graph}
    for u in graph:
        for v in graph[u]:
            in_degree[v] = in_degree.get(v, 0) + 1

    queue = deque(v for v in graph if in_degree[v] == 0)
    semesters = 0
    processed = 0

    while queue:
        semesters += 1
        # Process ALL nodes at current level before moving on
        level_size = len(queue)
        for _ in range(level_size):
            node = queue.popleft()
            processed += 1
            for neighbor in graph[node]:
                in_degree[neighbor] -= 1
                if in_degree[neighbor] == 0:
                    queue.append(neighbor)

    return semesters if processed == len(graph) else -1

Real-World Example: Build System

Every time you run make, npm run build, or gradle build, a topological sort decides the compilation order. Source files import other files, creating a dependency graph. The build system needs to compile dependencies before the files that use them.

Here's a simplified build system that resolves module dependencies:

Python

import re
from pathlib import Path
from collections import deque

def parse_imports(filepath):
    """Extract import statements from a Python file."""
    imports = []
    try:
        content = Path(filepath).read_text()
        # Match "import X" and "from X import Y"
        for match in re.finditer(r'(?:from|import)\s+([\w.]+)', content):
            imports.append(match.group(1).split('.')[0])  # Top-level module
    except FileNotFoundError:
        pass
    return imports

def build_order(source_files):
    """Determine compilation order using topological sort.
    
    source_files: list of Python file paths
    Returns: ordered list of files to compile, or raises on circular import.
    
    This is what build systems do:
    1. Scan each file for dependencies (imports)
    2. Build a dependency graph
    3. Topological sort → compilation order
    """
    # Map module name to file path
    name_to_file = {}
    for f in source_files:
        module = Path(f).stem  # "utils.py" → "utils"
        name_to_file[module] = f

    # Build dependency graph: file → [files it depends on]
    graph = {f: [] for f in source_files}
    in_degree = {f: 0 for f in source_files}

    for f in source_files:
        module = Path(f).stem
        for dep in parse_imports(f):
            if dep in name_to_file:
                dep_file = name_to_file[dep]
                graph[dep_file].append(f)  # dep must be compiled before f
                in_degree[f] += 1

    # Kahn's algorithm — determine safe compilation order
    queue = deque(f for f in source_files if in_degree[f] == 0)
    order = []

    while queue:
        f = queue.popleft()
        order.append(f)
        for dependent in graph[f]:
            in_degree[dependent] -= 1
            if in_degree[dependent] == 0:
                queue.append(dependent)

    if len(order) != len(source_files):
        # Find the cycle for a helpful error message
        unprocessed = [f for f in source_files if f not in set(order)]
        raise ValueError(
            f"Circular dependency detected involving: {unprocessed}\n"
            f"Fix: break the cycle by extracting shared code into a new module."
        )

    return order

# Example
files = ["config.py", "database.py", "models.py", "api.py", "main.py"]
# If main imports api, api imports models, models imports database,
# database imports config, then the build order is:
# config.py → database.py → models.py → api.py → main.py

Real build systems add caching (don't recompile if nothing changed), parallelism (compile independent modules simultaneously — that's the "level" counting from Kahn's), and incremental builds. But the core dependency resolution is always topological sort.

Interview Patterns

💡 "Can I take all courses?" = Cycle Detection in a DAG LeetCode 207 (Course Schedule) is the gateway topological sort problem. You're given prerequisite pairs and asked if it's possible to finish all courses. Translation: does the dependency graph have a cycle? If yes, impossible. If no, a valid ordering exists. Kahn's algorithm answers this directly — if you can't dequeue all nodes, there's a cycle.

💡 "In what order?" = Return the Actual Topological Order LeetCode 210 (Course Schedule II) takes it one step further: give me a valid ordering, not just yes/no. Same algorithm, but return the result list instead of just checking its length. This is the second problem you should solve after learning topo sort.

💡 Parallel scheduling = BFS level counting "What's the minimum number of semesters/rounds/steps to complete everything?" That's Kahn's algorithm where you process one full level at a time (all nodes with in-degree 0 at that moment). Count the levels — each level is one parallel batch. This is a favorite at Amazon and Google.

💡 Alien Dictionary = Build graph from comparisons, then topo-sort LeetCode 269 (Alien Dictionary): given a sorted list of words in an alien language, determine the character ordering. Compare adjacent words character by character to find ordering constraints (the first differing character gives you an edge). Build a graph from all these constraints, then topological sort it. If there's a cycle, the input is invalid.

💡 Longest path in a DAG = Topo sort + DP You can't find the longest path in a general graph efficiently (it's NP-hard), but in a DAG you can — in O(V+E). Process vertices in topological order. For each vertex, update the longest distance to each of its neighbors: longest[v] = max(longest[v], longest[u] + weight(u,v)). This is critical for problems like "critical path" in project management.

Recognizing Topological Sort Problems

Look for these signals in problem statements:

"Prerequisites" or "dependencies" → DAG → topological sort
"Order of operations" where some must happen before others
"Is it possible to finish/complete all...?" → cycle detection on a directed graph
"Minimum number of rounds/steps" → Kahn's with level counting
"Lexicographically smallest valid order" → Kahn's with a min-heap
Alien dictionary, word ordering → build graph from comparisons, then topo-sort
"Longest path" in a DAG → topological order + DP

Practice Problems

207. Course Schedule — Medium — The classic. Can you finish all courses given prerequisites? Cycle detection on a directed graph.
210. Course Schedule II — Medium — Same graph, but return a valid course ordering. Direct topological sort application.
269. Alien Dictionary — Hard — Derive character ordering from sorted alien words. Build a graph from adjacent word comparisons, then topological sort.
1136. Parallel Courses — Medium — Minimum semesters to take all courses. Kahn's with level counting.
2115. Find All Possible Recipes — Medium — Recipes with ingredient dependencies. Topological sort on the dependency graph.
310. Minimum Height Trees — Medium — Peel leaves layer by layer (similar to Kahn's applied to undirected graphs).
1203. Sort Items by Groups Respecting Dependencies — Hard — Two-level topological sort: sort groups, then sort items within each group. Hard but elegant.

Topological Sort

On This Page

What Is Topological Sort?

A Quick History

Where It Shows Up

DAG: The Prerequisite

How It Works

Kahn's Algorithm (BFS-based)

DFS-based Topological Sort

Interactive Visualization

Common Mistakes

Operations & Complexity

Kahn's vs DFS — Picking the Right One

Implementation

Kahn's Algorithm (BFS)

DFS-based Topological Sort

Kahn's with Lexicographic Ordering

Parallel Scheduling (Minimum Semesters)

Real-World Example: Build System

Interview Patterns

Recognizing Topological Sort Problems

Practice Problems