A tree where each path from root to leaf spells out a word. The data structure behind autocomplete, spell checkers, and IP routing tables. Pronounced "try," not "tree" β it comes from retrieval.
IntermediateA trie (pronounced "try") is a tree-shaped data structure designed for storing and retrieving strings efficiently. The name comes from the word "retrieval" β Edward Fredkin coined the term in 1960, though the structure itself was described by RenΓ© de la Briandais in 1959.
Each node in a trie represents a single character. The path from the root (the topmost node, which is empty) to any node spells out a prefix (the beginning portion of a string). Some nodes are marked as "end of word" to indicate a complete string ends there.
Think of it like a physical dictionary organized by spelling. The word "cat" and "car" share a path for "c" β "a", then branch: one path continues to "t", the other to "r". Words with common prefixes (shared beginnings) share the same nodes in the trie, which saves both space and lookup time.
Here's another analogy. Imagine a phone tree at a company. You call in, press 1 for sales, then 3 for enterprise, then 2 for pricing. Each button press narrows your destination. That's exactly how a trie search works β each character narrows the search space.
A hash set (a data structure that stores unique items and supports O(1) lookup using a hash function) can tell you "does this exact word exist?" in O(1) average time. But it can't do these things efficiently:
Tries emerged in the late 1950s when researchers needed fast string retrieval for early computing applications. The original motivation was building efficient dictionaries β literally, storing and looking up words. The structure turned out to be far more versatile than anyone expected. Today, tries (and their compressed variants) power everything from DNS resolution to genome analysis to the autocomplete in your phone's keyboard.
/home/user/docs and /home/user/pics share the /home/user/ prefix./api/users/:id and /api/users/:id/posts share the /api/users/ prefix in the routing trie.Each trie node has two components:
isEnd (or is_end): "does a complete word end at this node?" This is critical β without it, you can't distinguish between "app" being a word and "app" being just the prefix of "apple".The root node is empty β it doesn't represent any character. Think of it as the starting point. Every word begins its path from the root.
Walk character by character from the root. For each character:
After processing the last character, mark that node as isEnd = true.
Example: Inserting "app" and "apple". "app" creates nodes rootβaβpβp (marking the last 'p' as end-of-word). "apple" then walks down rootβaβpβp (all exist already), then creates lβe (marking 'e' as end-of-word). The two words share the first three nodes β no duplication.
Walk character by character from the root:
isEnd.isEnd is true, the word exists. If false, that string is only a prefix of some longer word (e.g., searching for "app" when only "apple" was inserted, and "app" wasn't marked as a word).Identical to search, but skip the isEnd check at the end. If you can walk the entire prefix without hitting a missing child, at least one word with that prefix exists in the trie. This is the operation that makes tries special.
Two steps: First, walk to the node at the end of the prefix (same as prefix search). Then, run a DFS from that node to collect all complete words below it. Each time you hit a node with isEnd = true, add the current path to your results.
Deletion is trickier than it looks. You can't just remove nodes blindly β they might be part of another word's path. The algorithm:
isEnd on the final node.Example: After inserting "app" and "apple", deleting "apple" unmarks the 'e' node and removes 'e' and 'l' (they have no children and aren't end-of-word for anything else). The 'p' node stays because it's the end-of-word for "app".
Type a word and insert it into the trie. Search for words β the matching prefix path lights up green for found, red for not found. Watch the tree grow as you add more words.
Let m = length of the word/key being operated on. Let n = total number of words in the trie.
| Operation | Time | Notes |
|---|---|---|
| Insert | O(m) | Walk/create m nodes β one per character |
| Search | O(m) | Walk m nodes, check isEnd flag at the end |
| startsWith (Prefix) | O(m) | Walk m nodes β no isEnd check needed |
| Delete | O(m) | Walk down + cleanup walk back up |
| Autocomplete (all words with prefix) | O(m + k) | m to reach prefix node, k = total chars in all results below it |
Space: O(n Γ m Γ alphabet_size) in the worst case β if no words share any prefixes, every character of every word gets its own node. But in practice, prefix sharing reduces this dramatically. For English text, tries use roughly 3-10x less space than you'd expect from the worst case.
httprouter) use radix trees for URL path matching.
class TrieNode:
"""A single node in the trie. Holds a map of children and an end-of-word flag."""
def __init__(self):
self.children = {} # char β TrieNode
self.is_end = False # True if a complete word ends at this node
# Optional: store a count of words passing through this node
# Useful for autocomplete ranking or counting prefix matches
self.prefix_count = 0
class Trie:
def __init__(self):
self.root = TrieNode()
self.word_count = 0 # Total words stored
def insert(self, word: str) -> None:
"""Insert a word into the trie. O(m) where m = len(word)."""
node = self.root
for ch in word:
if ch not in node.children:
node.children[ch] = TrieNode()
node = node.children[ch]
node.prefix_count += 1 # Track how many words pass through here
if not node.is_end: # Avoid double-counting duplicate inserts
node.is_end = True
self.word_count += 1
def search(self, word: str) -> bool:
"""Check if an exact word exists in the trie. O(m)."""
node = self._walk(word)
# Must reach the end AND the node must be marked as a complete word
return node is not None and node.is_end
def starts_with(self, prefix: str) -> bool:
"""Check if any word in the trie starts with this prefix. O(m)."""
# Just need to walk the prefix β no is_end check needed
return self._walk(prefix) is not None
def count_prefix(self, prefix: str) -> int:
"""Count how many words have this prefix. O(m)."""
node = self._walk(prefix)
return node.prefix_count if node else 0
def _walk(self, s: str):
"""Follow the path for string s from root. Returns final node or None."""
node = self.root
for ch in s:
if ch not in node.children:
return None # Path doesn't exist
node = node.children[ch]
return node
def autocomplete(self, prefix: str, limit: int = 10) -> list:
"""Return all words starting with prefix (up to limit). O(m + k)."""
node = self._walk(prefix)
if node is None:
return []
results = []
# DFS from the prefix node, collecting complete words
def _dfs(n, path):
if len(results) >= limit:
return # Early exit once we have enough
if n.is_end:
results.append(prefix + path)
# Visit children in sorted order for alphabetical results
for ch in sorted(n.children):
_dfs(n.children[ch], path + ch)
_dfs(node, "")
return results
def delete(self, word: str) -> bool:
"""Delete word from trie. Returns True if word existed.
Cleans up nodes that are no longer part of any word's path."""
def _delete(node, word, depth):
if depth == len(word):
if not node.is_end:
return False # Word doesn't exist β nothing to delete
node.is_end = False
self.word_count -= 1
# Safe to remove this node if it has no children
return len(node.children) == 0
ch = word[depth]
if ch not in node.children:
return False # Word doesn't exist
child = node.children[ch]
child.prefix_count -= 1
should_delete = _delete(child, word, depth + 1)
if should_delete:
del node.children[ch] # Remove the child node
# Propagate deletion: remove this node too if it's empty
# and not an end-of-word for some other word
return not node.is_end and len(node.children) == 0
return False
return _delete(self.root, word, 0)
# Usage
trie = Trie()
for word in ["apple", "app", "application", "apt", "banana", "band", "ban"]:
trie.insert(word)
print(trie.search("app")) # True
print(trie.search("appl")) # False (only a prefix, not a word)
print(trie.starts_with("app")) # True
print(trie.autocomplete("app")) # ['app', 'apple', 'application']
print(trie.autocomplete("ban")) # ['ban', 'banana', 'band']
print(trie.count_prefix("app")) # 3 (app, apple, application)
trie.delete("apple")
print(trie.search("apple")) # False (deleted)
print(trie.search("app")) # True (still exists)
class WildcardTrie(Trie):
"""Extends basic Trie to support '.' wildcard matching.
This is exactly LC 211 β Design Add and Search Words Data Structure."""
def search_with_wildcards(self, word: str) -> bool:
"""Search where '.' matches any single character. O(26^d * m) worst case
where d is the number of dots, but usually much faster."""
def _dfs(node, i):
# Base case: processed all characters
if i == len(word):
return node.is_end
ch = word[i]
if ch == '.':
# Wildcard: try ALL children
for child in node.children.values():
if _dfs(child, i + 1):
return True
return False
else:
# Normal character: follow the specific child
if ch not in node.children:
return False
return _dfs(node.children[ch], i + 1)
return _dfs(self.root, 0)
# Usage
wt = WildcardTrie()
wt.insert("cat")
wt.insert("car")
wt.insert("bat")
print(wt.search_with_wildcards("c.t")) # True (matches "cat")
print(wt.search_with_wildcards("c.r")) # True (matches "car")
print(wt.search_with_wildcards(".at")) # True (matches "cat" and "bat")
print(wt.search_with_wildcards("...")) # True (matches any 3-letter word)
print(wt.search_with_wildcards("c..s")) # False (no 4-letter word starting with c)
class TrieNode {
constructor() {
this.children = {}; // char β TrieNode
this.isEnd = false; // Does a complete word end here?
}
}
class Trie {
constructor() { this.root = new TrieNode(); }
insert(word) {
let node = this.root;
for (const ch of word) {
if (!node.children[ch]) node.children[ch] = new TrieNode();
node = node.children[ch];
}
node.isEnd = true;
}
search(word) {
const node = this._walk(word);
return node !== null && node.isEnd;
}
startsWith(prefix) {
return this._walk(prefix) !== null;
}
_walk(s) {
let node = this.root;
for (const ch of s) {
if (!node.children[ch]) return null;
node = node.children[ch];
}
return node;
}
autocomplete(prefix, limit = 10) {
const node = this._walk(prefix);
if (!node) return [];
const results = [];
const dfs = (n, path) => {
if (results.length >= limit) return;
if (n.isEnd) results.push(prefix + path);
// Sort keys for alphabetical results
for (const ch of Object.keys(n.children).sort()) {
dfs(n.children[ch], path + ch);
}
};
dfs(node, '');
return results;
}
delete(word) {
const _del = (node, depth) => {
if (depth === word.length) {
if (!node.isEnd) return false;
node.isEnd = false;
return Object.keys(node.children).length === 0;
}
const ch = word[depth];
if (!node.children[ch]) return false;
const shouldDelete = _del(node.children[ch], depth + 1);
if (shouldDelete) {
delete node.children[ch];
return !node.isEnd && Object.keys(node.children).length === 0;
}
return false;
};
return _del(this.root, 0);
}
}
search("app") returns false if only "apple" is in the trie (because the 'p' node at the end of "app" has isEnd = false). startsWith("app") returns true because the prefix path exists. This is the most common trie bug in interviews β forgetting to check isEnd in search.isEnd and child count before removing a node.isEnd = true. Make sure your implementation handles this gracefully.Here's a simplified version of what happens when you type into Google's search bar. The system maintains a trie of popular queries, with each word weighted by search frequency. As you type, it walks the trie and returns the most popular completions.
import heapq
class AutocompleteNode:
def __init__(self):
self.children = {}
self.is_end = False
self.frequency = 0 # How often this query was searched
# Top-k cache: store the best completions at each node
# This avoids DFS on every keystroke (real systems do this)
self.top_results = []
class AutocompleteEngine:
"""A search autocomplete system (simplified LC 642 variant).
In production, this would be distributed across many servers,
with tries sharded by first character or hash prefix. Google's
actual system handles ~8.5 billion searches per day.
"""
def __init__(self, k=5):
self.root = AutocompleteNode()
self.k = k # Max suggestions to return
def add_query(self, query: str, frequency: int = 1):
"""Record a search query with its frequency."""
node = self.root
for ch in query:
if ch not in node.children:
node.children[ch] = AutocompleteNode()
node = node.children[ch]
node.is_end = True
node.frequency += frequency
def suggest(self, prefix: str) -> list:
"""Return top-k suggestions for a prefix, ranked by frequency.
Returns list of (query, frequency) tuples, highest frequency first.
This is what fires on every keystroke in a search bar.
"""
# Walk to the prefix node
node = self.root
for ch in prefix:
if ch not in node.children:
return [] # No words with this prefix
node = node.children[ch]
# DFS to collect all completions with their frequencies
# Use a min-heap of size k to efficiently track the top results
heap = [] # min-heap of (frequency, query)
def _collect(n, path):
if n.is_end:
if len(heap) < self.k:
heapq.heappush(heap, (n.frequency, prefix + path))
elif n.frequency > heap[0][0]:
heapq.heapreplace(heap, (n.frequency, prefix + path))
for ch in n.children:
_collect(n.children[ch], path + ch)
_collect(node, "")
# Sort by frequency (descending), then alphabetically for ties
return sorted(heap, key=lambda x: (-x[0], x[1]))
# Simulate a search engine
engine = AutocompleteEngine(k=5)
# Load some search queries with their popularity
queries = [
("python tutorial", 50000), ("python download", 45000),
("python list", 30000), ("python dictionary", 28000),
("python for loop", 25000), ("python class", 22000),
("pytorch install", 15000), ("pythagorean theorem", 12000),
("pizza near me", 80000), ("pizza delivery", 60000),
]
for query, freq in queries:
engine.add_query(query, freq)
# User starts typing...
print("Typing 'py':")
for freq, query in engine.suggest("py"):
print(f" {query} ({freq:,} searches)")
print("\nTyping 'python ':")
for freq, query in engine.suggest("python "):
print(f" {query} ({freq:,} searches)")
print("\nTyping 'pi':")
for freq, query in engine.suggest("pi"):
print(f" {query} ({freq:,} searches)")
The real Google autocomplete is vastly more complex β it considers your search history, location, trending topics, and recency. But the underlying data structure is a trie (or a compressed trie variant), with caching and precomputation at every level.
isEnd node, that's your shortest match. Clean and efficient.
| Problem | Difficulty | Key Technique |
|---|---|---|
| LC 208 β Implement Trie (Prefix Tree) | Medium | Core trie implementation β insert, search, startsWith |
| LC 211 β Design Add and Search Words | Medium | Trie + DFS with '.' wildcard matching |
| LC 14 β Longest Common Prefix | Easy | Trie or vertical scanning (trie is overkill but good practice) |
| LC 648 β Replace Words | Medium | Trie for shortest prefix match |
| LC 1268 β Search Suggestions System | Medium | Trie-based autocomplete with sorted results |
| LC 212 β Word Search II | Hard | Trie + grid DFS backtracking with pruning |
| LC 336 β Palindrome Pairs | Hard | Trie of reversed words + palindrome suffix check |