How much water can flow through a network of pipes? How many trucks can travel from factory to warehouse? Network flow answers these questions — and a surprising number of other problems that don't look like flow at all.
AdvancedA flow network is a directed graph where each edge has a capacity — the maximum amount of "stuff" it can carry. The stuff could be water, data packets, trucks, or an abstract unit. There's a designated source node (where flow originates) and a sink node (where flow ends up). The max flow problem asks: what's the maximum total flow you can push from source to sink without exceeding any edge's capacity?
Picture a system of water pipes. Each pipe has a maximum throughput. Water enters the system at a single source and exits at a single sink. You want to maximize total water delivery. Every intermediate junction (node) must conserve water — whatever flows in must flow out. That's the whole problem.
Network flow has been studied since the 1950s, originally motivated by Cold War military logistics. The Ford-Fulkerson method (1956) was developed by L.R. Ford Jr. and D.R. Fulkerson at the RAND Corporation to analyze Soviet railway capacity. The max-flow min-cut theorem they proved became one of the most important results in combinatorial optimization.
This is the fundamental theorem of network flow, and it's one of those rare results that's both profound and practical.
A cut is a partition of the vertices into two groups: S (containing the source) and T (containing the sink). The capacity of a cut is the sum of capacities of edges going from S to T (not T to S — direction matters). A minimum cut (min cut) is the cut with the smallest capacity.
The theorem states: max flow = min cut. The maximum amount of flow you can push through the network equals the capacity of the narrowest bottleneck (the minimum cut). This is elegant because it connects two seemingly different questions — an optimization problem (max flow) and a structural property (min cut) — and shows they have the same answer.
This means you get two results for the price of one: solving max flow also gives you the min cut (by looking at which nodes are reachable from the source in the final residual graph — the graph showing remaining capacity on each edge).
Here's the surprising part: many problems that don't look like flow at all can be reduced to max flow. This is what makes network flow so powerful.
The Ford-Fulkerson method is more of a framework than a specific algorithm — it describes what to do but leaves how to find augmenting paths unspecified.
This is the most confusing part of flow algorithms for beginners, so let's be precise.
For every original edge (u → v) with capacity c carrying flow f:
Why do we need reverse edges? Because early flow decisions might not be globally optimal. The algorithm might push flow along one path and later need to "redirect" some of that flow along a different path to achieve higher total flow. Reverse edges enable this redirection. Without them, the algorithm can get stuck at a suboptimal flow.
Concrete example: Consider a diamond graph: S→A (cap 1), S→B (cap 1), A→B (cap 1), A→T (cap 1), B→T (cap 1). Max flow is 2 (S→A→T and S→B→T). But if Ford-Fulkerson first finds the path S→A→B→T and pushes 1 unit, the only remaining augmenting path is S→B→reverse(B→A)→A→T — it "undoes" the A→B flow using the reverse edge. Without the reverse edge, it would stop at flow = 1 (suboptimal).
Ford-Fulkerson with DFS to find augmenting paths has a problem: the time complexity is O(E × max_flow), which can be terrible if capacities are large. Worse, with irrational capacities, DFS-based Ford-Fulkerson might not even terminate.
Edmonds-Karp (1972) fixes this by using BFS (breadth-first search) to find the shortest augmenting path (fewest edges). This guarantees:
Why does BFS help? By always finding the shortest path, Edmonds-Karp ensures that the shortest-path distance from source to sink in the residual graph can only increase over time. It can increase at most V times, and between consecutive increases, at most E augmentations happen. That's where O(VE) comes from.
Dinic's algorithm (1970, by Yefim Dinitz) improves on Edmonds-Karp by finding multiple augmenting paths per BFS. The algorithm:
Time: O(V²E) in general, but O(E√V) for unit-capacity graphs (which covers bipartite matching). In practice, Dinic's is much faster than Edmonds-Karp due to finding multiple augmenting paths per BFS phase.
After running any max-flow algorithm to completion:
| Algorithm | Time | Space | Notes |
|---|---|---|---|
| Ford-Fulkerson (DFS) | O(E × max_flow) | O(V + E) | Depends on capacity values; may not terminate |
| Edmonds-Karp (BFS) | O(VE²) | O(V + E) | Polynomial, capacity-independent |
| Dinic's Algorithm | O(V²E) | O(V + E) | O(E√V) for unit capacities |
| Push-Relabel (FIFO) | O(V³) | O(V + E) | Best for very dense graphs |
| Hopcroft-Karp (matching) | O(E√V) | O(V + E) | Bipartite matching specifically |
| Min Cut (via Max Flow) | Same as max flow | BFS from source in final residual graph | |
| Min Cost Max Flow | O(V²E) to O(VE²) | O(V + E) | SPFA/Bellman-Ford based |
from collections import defaultdict, deque
def edmonds_karp(graph, source, sink):
"""Edmonds-Karp algorithm: BFS-based max flow.
graph: dict of dict. graph[u][v] = capacity of edge u→v.
Use defaultdict(lambda: defaultdict(int)) for easy construction.
Returns: maximum flow value.
Time: O(VE²), Space: O(V + E)."""
def bfs(source, sink, parent):
"""Find shortest augmenting path using BFS.
Returns True if a path exists, False otherwise.
parent dict records the path for later traversal."""
visited = {source}
queue = deque([source])
while queue:
u = queue.popleft()
for v in graph[u]:
if v not in visited and graph[u][v] > 0:
visited.add(v)
parent[v] = u
if v == sink:
return True # found a path
queue.append(v)
return False # sink unreachable
max_flow = 0
while True:
parent = {}
if not bfs(source, sink, parent):
break # no more augmenting paths — done
# Trace the path to find the bottleneck
path_flow = float('inf')
v = sink
while v != source:
u = parent[v]
path_flow = min(path_flow, graph[u][v])
v = u
# Update residual graph along the path
v = sink
while v != source:
u = parent[v]
graph[u][v] -= path_flow # decrease forward edge
graph[v][u] += path_flow # increase reverse edge
v = u
max_flow += path_flow
return max_flow
# Usage example:
# g = defaultdict(lambda: defaultdict(int))
# g['S']['A'] = 10; g['S']['B'] = 8
# g['A']['B'] = 5; g['A']['T'] = 7
# g['B']['T'] = 10
# print(edmonds_karp(g, 'S', 'T')) # max flow value
from collections import deque
class Dinic:
"""Dinic's max flow algorithm.
Faster than Edmonds-Karp in practice.
O(V²E) general, O(E√V) for unit-capacity graphs.
Use add_edge to build the graph, then call max_flow."""
def __init__(self, n):
self.n = n
self.graph = [[] for _ in range(n)] # adjacency list of edge indices
self.edges = [] # list of [to, capacity, rev_index]
def add_edge(self, u, v, cap):
"""Add directed edge u→v with capacity cap."""
# Forward edge
self.graph[u].append(len(self.edges))
self.edges.append([v, cap, len(self.edges) + 1])
# Reverse edge (capacity 0 initially)
self.graph[v].append(len(self.edges))
self.edges.append([u, 0, len(self.edges) - 1])
def bfs(self, s, t):
"""Build level graph using BFS. Returns True if t is reachable."""
self.level = [-1] * self.n
self.level[s] = 0
queue = deque([s])
while queue:
u = queue.popleft()
for idx in self.graph[u]:
v, cap, _ = self.edges[idx]
if self.level[v] == -1 and cap > 0:
self.level[v] = self.level[u] + 1
queue.append(v)
return self.level[t] != -1
def dfs(self, u, t, pushed):
"""Find blocking flow using DFS with current-edge optimization."""
if u == t:
return pushed
while self.iter[u] < len(self.graph[u]):
idx = self.graph[u][self.iter[u]]
v, cap, rev = self.edges[idx]
if self.level[v] == self.level[u] + 1 and cap > 0:
d = self.dfs(v, t, min(pushed, cap))
if d > 0:
self.edges[idx][1] -= d # decrease forward
self.edges[rev][1] += d # increase reverse
return d
self.iter[u] += 1
return 0
def max_flow(self, s, t):
"""Compute max flow from s to t."""
flow = 0
while self.bfs(s, t):
self.iter = [0] * self.n # current edge pointer per node
while True:
f = self.dfs(s, t, float('inf'))
if f == 0:
break
flow += f
return flow
# Usage:
# d = Dinic(6) # 6 nodes: 0=source, 5=sink
# d.add_edge(0, 1, 10)
# d.add_edge(0, 2, 8)
# d.add_edge(1, 2, 5)
# d.add_edge(1, 5, 7)
# d.add_edge(2, 5, 10)
# print(d.max_flow(0, 5))
function edmondsKarp(capacity, source, sink, n) {
// capacity: 2D array n×n, capacity[u][v] = edge capacity
// Uses residual capacity directly (no separate flow array)
const residual = capacity.map(row => [...row]);
let maxFlow = 0;
while (true) {
// BFS to find shortest augmenting path
const parent = new Array(n).fill(-1);
parent[source] = source;
const queue = [source];
let head = 0;
while (head < queue.length && parent[sink] === -1) {
const u = queue[head++];
for (let v = 0; v < n; v++) {
if (parent[v] === -1 && residual[u][v] > 0) {
parent[v] = u;
queue.push(v);
}
}
}
if (parent[sink] === -1) break; // no more paths
// Find bottleneck along the path
let aug = Infinity;
for (let v = sink; v !== source; v = parent[v])
aug = Math.min(aug, residual[parent[v]][v]);
// Update residual capacities
for (let v = sink; v !== source; v = parent[v]) {
residual[parent[v]][v] -= aug;
residual[v][parent[v]] += aug;
}
maxFlow += aug;
}
return maxFlow;
}
def find_min_cut(graph, source):
"""After running Edmonds-Karp (which modifies graph to residual),
find the min cut by BFS from source in the residual graph.
Vertices reachable from source = set S.
Vertices not reachable = set T.
Cut edges: all original edges from S to T.
Returns (S, T) partition."""
visited = set()
queue = deque([source])
visited.add(source)
while queue:
u = queue.popleft()
for v in graph[u]:
if v not in visited and graph[u][v] > 0:
visited.add(v)
queue.append(v)
S = visited
T = set(graph.keys()) - S
# The cut edges are original edges from S to T
# (they have residual capacity 0 in forward direction)
return S, T
def max_bipartite_matching(left_nodes, right_nodes, edges):
"""Reduce maximum bipartite matching to max flow.
left_nodes: list of left-side node ids
right_nodes: list of right-side node ids
edges: list of (left, right) pairs representing valid assignments
Returns the maximum number of matched pairs."""
graph = defaultdict(lambda: defaultdict(int))
SOURCE = '__source__'
SINK = '__sink__'
# Source → each left node (capacity 1: each worker gets one job)
for l in left_nodes:
graph[SOURCE][l] = 1
# Each right node → sink (capacity 1: each job gets one worker)
for r in right_nodes:
graph[r][SINK] = 1
# Left → right edges (capacity 1)
for l, r in edges:
graph[l][r] = 1
return edmonds_karp(graph, SOURCE, SINK)
# Example: 3 workers, 3 jobs
# Worker A can do jobs 1, 2
# Worker B can do job 2
# Worker C can do jobs 2, 3
# matching = max_bipartite_matching(
# ['A', 'B', 'C'], ['1', '2', '3'],
# [('A','1'), ('A','2'), ('B','2'), ('C','2'), ('C','3')]
# )
# Result: 3 (A→1, B→2, C→3)
Network flow models server-to-task assignment in data centers. Each server has a capacity (max concurrent requests). Each task type has a demand (how many instances need to run). Some servers can handle certain task types. How do you assign tasks to servers to maximize total throughput?
def balance_load(servers, tasks, compatibility):
"""Assign tasks to servers to maximize throughput.
servers: dict of server_name -> capacity (max concurrent tasks)
tasks: dict of task_type -> demand (how many instances needed)
compatibility: dict of server_name -> [list of task types it can run]
Returns: max tasks that can be assigned, and the assignment."""
graph = defaultdict(lambda: defaultdict(int))
SOURCE = '__source__'
SINK = '__sink__'
# Source → each task type (capacity = demand)
for task, demand in tasks.items():
graph[SOURCE][f'task_{task}'] = demand
# Each server → sink (capacity = server's max capacity)
for server, cap in servers.items():
graph[f'server_{server}'][SINK] = cap
# Task → compatible servers (capacity = min of demand and server cap)
for server, compatible_tasks in compatibility.items():
for task in compatible_tasks:
if task in tasks:
# Capacity: how many of this task the server could handle
graph[f'task_{task}'][f'server_{server}'] = min(
tasks[task], servers[server]
)
max_assigned = edmonds_karp(graph, SOURCE, SINK)
total_demand = sum(tasks.values())
print(f"Assigned {max_assigned}/{total_demand} tasks")
if max_assigned < total_demand:
print(f"⚠️ {total_demand - max_assigned} tasks unassigned (need more capacity)")
return max_assigned
# Example: 3 servers, 3 task types
servers = {'web-1': 10, 'web-2': 8, 'gpu-1': 4}
tasks = {'api': 12, 'ml-inference': 4, 'batch': 6}
compat = {
'web-1': ['api', 'batch'],
'web-2': ['api', 'batch'],
'gpu-1': ['ml-inference', 'batch']
}
balance_load(servers, tasks, compat)
# Assigned 22/22 tasks: web-1 handles 10 api, web-2 handles 2 api + 6 batch, gpu-1 handles 4 ml-inference
This same formulation is used by Kubernetes schedulers, CDN load balancers, and cloud platforms. The "servers" might be pods, edge nodes, or availability zones. The "tasks" might be containers, user requests, or data shards.
| # | Problem | Difficulty | Key Concept |
|---|---|---|---|
| 785 | Is Graph Bipartite? | Medium | Bipartite check (flow prereq) |
| 1494 | Parallel Courses II | Hard | Flow-like scheduling |
| 1349 | Maximum Students Taking Exam | Hard | Bipartite matching / flow |
| 1066 | Campus Bikes II | Medium | Assignment / matching |
| 2850 | Minimum Moves to Spread Stones | Medium | Min cost flow / matching |
| 1595 | Min Cost to Connect Two Groups | Hard | Min cost bipartite matching |
| 1947 | Max Compatibility Score Sum | Medium | Assignment (bitmask DP or flow) |