A linear ordering of vertices in a directed acyclic graph (DAG) where every edge goes forward. It's how your build system decides what to compile first, and how your university decides which courses you can take.
IntermediateImagine you have a list of tasks, but some tasks depend on others. You can't start "Build Walls" until "Pour Foundation" is done. You can't paint until the walls are up. A topological sort gives you a valid order to do everything, respecting all dependencies.
More precisely: given a directed acyclic graph (DAG), a topological ordering is a linear sequence of all vertices such that for every directed edge u β v, vertex u appears before v in the sequence. The edge u β v means "u must come before v" β it's a dependency.
The term comes from topology in mathematics, where it relates to partial orders. But in practice, think of it as "dependency resolution" β figuring out a valid execution order when things depend on other things.
Two key things to note:
Topological sorting has been studied since the early days of computing. Kahn published his BFS-based algorithm in 1962. The DFS-based approach came from Tarjan's work on depth-first search in the early 1970s. Both remain the standard implementations today β they're simple, efficient, and hard to beat.
A DAG (Directed Acyclic Graph) is a directed graph with no cycles. Every tree is a DAG (parentβchild edges, no loops), but DAGs are more general β a node can have multiple parents (multiple things depending on it) and multiple children. Think of git commit history: commits can merge (multiple parents), but they never loop back in time.
If your graph has a cycle, topological sort will tell you β both algorithms below detect cycles as a side effect. This is actually one of the most useful features: "can I even complete all these tasks, or is there a circular dependency?"
Kahn's approach uses in-degrees β the number of edges pointing INTO each vertex. An in-degree of 0 means nothing depends on this vertex being done first β it has no prerequisites, so it's "ready."
The intuition: repeatedly remove vertices with no remaining dependencies. Each removal might "free up" other vertices whose last dependency was just satisfied. It's like peeling layers off an onion β or more precisely, like completing tasks that unlock new tasks.
Run DFS on the entire graph. When a vertex's DFS finishes (all its descendants have been fully explored), add it to a list. Reversing that list gives a valid topological order. This is called reverse post-order.
Why does this work? When vertex u finishes after all of u's descendants, it means everything u depends on (or points to) is already in the list. So when we reverse, u comes before all its dependents β exactly what topological order requires.
This shows Kahn's algorithm in action on a course prerequisite graph. Watch in-degree counts change as nodes with zero dependencies get removed and added to the output. The blue queue shows what's ready to process next.
| Algorithm | Time | Space | Detects Cycles? | Notes |
|---|---|---|---|---|
| Kahn's (BFS) | O(V + E) | O(V + E) | β | Naturally gives level-by-level ordering |
| DFS-based | O(V + E) | O(V + E) | β | Needs post-order reversal |
Both algorithms are O(V + E) β you visit every vertex and every edge exactly once. The V term comes from initializing/checking all vertices; the E term comes from scanning all edges (to count in-degrees in Kahn's, or to explore neighbors in DFS).
from collections import deque
def topological_sort_kahn(graph):
"""Kahn's algorithm β BFS-based topological sort.
graph: dict mapping vertex -> list of neighbors (adjacency list)
e.g. {"CS101": ["CS201", "CS202"], "MATH150": ["CS201"], ...}
Returns: list of vertices in topological order.
Raises: ValueError if the graph has a cycle.
Intuition: repeatedly remove vertices with no remaining dependencies.
Each removal may free up other vertices whose last prerequisite
was just completed.
"""
# Count how many edges point INTO each vertex (its "prerequisite count")
in_degree = {v: 0 for v in graph}
for u in graph:
for v in graph[u]:
in_degree[v] = in_degree.get(v, 0) + 1
# Start with all "free" vertices β they have no prerequisites
queue = deque(v for v in graph if in_degree[v] == 0)
result = []
while queue:
node = queue.popleft()
result.append(node)
# "Remove" this node from the graph:
# decrement in-degree of everything that depended on it
for neighbor in graph[node]:
in_degree[neighbor] -= 1
if in_degree[neighbor] == 0:
queue.append(neighbor) # All prerequisites met!
if len(result) != len(graph):
raise ValueError("Graph has a cycle β topological sort impossible")
return result
function topologicalSortKahn(graph) {
// graph: { vertex: [neighbors], ... }
const inDegree = {};
for (const u of Object.keys(graph)) {
inDegree[u] = inDegree[u] || 0;
for (const v of graph[u]) {
inDegree[v] = (inDegree[v] || 0) + 1;
}
}
// Queue up everything with no prerequisites
const queue = Object.keys(graph).filter(v => inDegree[v] === 0);
const result = [];
while (queue.length) {
const node = queue.shift();
result.push(node);
for (const neighbor of graph[node]) {
inDegree[neighbor]--;
if (inDegree[neighbor] === 0) queue.push(neighbor);
}
}
if (result.length !== Object.keys(graph).length) {
throw new Error("Cycle detected β topological sort impossible");
}
return result;
}
def topological_sort_dfs(graph):
"""DFS-based topological sort with three-color cycle detection.
WHITE = unvisited β haven't touched this vertex yet
GRAY = in-progress β currently exploring this vertex's subtree
BLACK = done β finished exploring, all descendants processed
Key insight: if DFS hits a GRAY vertex, we've found a back edge,
which means a cycle. In a DAG, you only ever revisit BLACK vertices
(cross edges or forward edges), never GRAY ones.
"""
WHITE, GRAY, BLACK = 0, 1, 2
color = {v: WHITE for v in graph}
order = [] # Will be built in reverse topological order
def dfs(u):
color[u] = GRAY # "I'm currently being explored"
for v in graph[u]:
if color[v] == GRAY:
raise ValueError(f"Cycle detected: back edge {u} β {v}")
if color[v] == WHITE:
dfs(v)
color[u] = BLACK # "Done exploring all my descendants"
order.append(u) # Add AFTER all descendants β reverse post-order
for vertex in graph:
if color[vertex] == WHITE:
dfs(vertex)
order.reverse() # Reverse to get topological order (not reverse-topological)
return order
import heapq
def topological_sort_lex(graph):
"""Lexicographically smallest topological order.
Same as Kahn's, but use a min-heap instead of a queue.
When multiple vertices have in-degree 0 simultaneously,
pick the alphabetically/numerically smallest one.
Useful when the problem asks for "the smallest valid ordering"
or "alphabetical order among valid orderings."
"""
in_degree = {v: 0 for v in graph}
for u in graph:
for v in graph[u]:
in_degree[v] = in_degree.get(v, 0) + 1
# Min-heap instead of queue β always pick smallest available vertex
heap = [v for v in graph if in_degree[v] == 0]
heapq.heapify(heap)
result = []
while heap:
node = heapq.heappop(heap) # Smallest in-degree-0 vertex
result.append(node)
for neighbor in graph[node]:
in_degree[neighbor] -= 1
if in_degree[neighbor] == 0:
heapq.heappush(heap, neighbor)
if len(result) != len(graph):
raise ValueError("Cycle detected")
return result
from collections import deque
def minimum_semesters(graph):
"""Find minimum number of parallel batches (semesters/rounds).
Same as Kahn's, but count how many "levels" we process.
Each level = one parallel batch. All courses in the same batch
can be taken simultaneously (their prerequisites are all done).
Returns: number of semesters, or -1 if impossible (cycle).
"""
in_degree = {v: 0 for v in graph}
for u in graph:
for v in graph[u]:
in_degree[v] = in_degree.get(v, 0) + 1
queue = deque(v for v in graph if in_degree[v] == 0)
semesters = 0
processed = 0
while queue:
semesters += 1
# Process ALL nodes at current level before moving on
level_size = len(queue)
for _ in range(level_size):
node = queue.popleft()
processed += 1
for neighbor in graph[node]:
in_degree[neighbor] -= 1
if in_degree[neighbor] == 0:
queue.append(neighbor)
return semesters if processed == len(graph) else -1
Every time you run make, npm run build, or gradle build, a topological sort decides the compilation order. Source files import other files, creating a dependency graph. The build system needs to compile dependencies before the files that use them.
Here's a simplified build system that resolves module dependencies:
import re
from pathlib import Path
from collections import deque
def parse_imports(filepath):
"""Extract import statements from a Python file."""
imports = []
try:
content = Path(filepath).read_text()
# Match "import X" and "from X import Y"
for match in re.finditer(r'(?:from|import)\s+([\w.]+)', content):
imports.append(match.group(1).split('.')[0]) # Top-level module
except FileNotFoundError:
pass
return imports
def build_order(source_files):
"""Determine compilation order using topological sort.
source_files: list of Python file paths
Returns: ordered list of files to compile, or raises on circular import.
This is what build systems do:
1. Scan each file for dependencies (imports)
2. Build a dependency graph
3. Topological sort β compilation order
"""
# Map module name to file path
name_to_file = {}
for f in source_files:
module = Path(f).stem # "utils.py" β "utils"
name_to_file[module] = f
# Build dependency graph: file β [files it depends on]
graph = {f: [] for f in source_files}
in_degree = {f: 0 for f in source_files}
for f in source_files:
module = Path(f).stem
for dep in parse_imports(f):
if dep in name_to_file:
dep_file = name_to_file[dep]
graph[dep_file].append(f) # dep must be compiled before f
in_degree[f] += 1
# Kahn's algorithm β determine safe compilation order
queue = deque(f for f in source_files if in_degree[f] == 0)
order = []
while queue:
f = queue.popleft()
order.append(f)
for dependent in graph[f]:
in_degree[dependent] -= 1
if in_degree[dependent] == 0:
queue.append(dependent)
if len(order) != len(source_files):
# Find the cycle for a helpful error message
unprocessed = [f for f in source_files if f not in set(order)]
raise ValueError(
f"Circular dependency detected involving: {unprocessed}\n"
f"Fix: break the cycle by extracting shared code into a new module."
)
return order
# Example
files = ["config.py", "database.py", "models.py", "api.py", "main.py"]
# If main imports api, api imports models, models imports database,
# database imports config, then the build order is:
# config.py β database.py β models.py β api.py β main.py
Real build systems add caching (don't recompile if nothing changed), parallelism (compile independent modules simultaneously β that's the "level" counting from Kahn's), and incremental builds. But the core dependency resolution is always topological sort.
longest[v] = max(longest[v], longest[u] + weight(u,v)). This is critical for problems like "critical path" in project management.
Look for these signals in problem statements: