The Most Overlooked Data Structures You Need to Know
In the vast world of computer science and programming, data structures form the backbone of efficient algorithms and well-designed software. While most developers are familiar with common data structures like arrays, linked lists, and binary trees, there are several lesser-known yet powerful data structures that often go overlooked. In this comprehensive guide, we’ll explore these underappreciated gems and discuss why they deserve a place in your programming toolkit.
1. Bloom Filters
Bloom filters are probabilistic data structures used to test whether an element is a member of a set. They’re incredibly space-efficient and offer constant-time add and lookup operations.
How Bloom Filters Work
A Bloom filter consists of a bit array of m bits, initially all set to 0, and k different hash functions. To add an element, feed it to each of the k hash functions to get k array positions. Set the bits at all these positions to 1.
To query for an element, feed it to each of the k hash functions to get k array positions. If any of the bits at these positions is 0, the element is definitely not in the set. If all are 1, then either the element is in the set, or the bits have been set to 1 during the insertion of other elements.
Use Cases
- Spell checkers
- Cache filtering
- Network routers
- Cryptocurrency wallets
Code Example
import mmh3
import math
import array
class BloomFilter:
def __init__(self, size, hash_count):
self.size = size
self.hash_count = hash_count
self.bit_array = array.array('b', [0] * size)
def add(self, item):
for seed in range(self.hash_count):
index = mmh3.hash(item, seed) % self.size
self.bit_array[index] = 1
def check(self, item):
for seed in range(self.hash_count):
index = mmh3.hash(item, seed) % self.size
if self.bit_array[index] == 0:
return False
return True
# Usage
bf = BloomFilter(1000000, 5)
bf.add("hello")
print(bf.check("hello")) # True
print(bf.check("world")) # False
2. Skip Lists
Skip lists are a probabilistic alternative to balanced trees. They allow for fast search, insertion, and deletion of elements, with an expected time complexity of O(log n) for these operations.
How Skip Lists Work
A skip list is essentially a series of linked lists stacked on top of each other. The bottom layer contains all elements, and each higher layer acts as an “express lane” for the lists below, allowing for faster traversal.
Use Cases
- Implementing sorted sets in Redis
- File systems
- In-memory databases
Code Example
import random
class Node:
def __init__(self, key, level):
self.key = key
self.forward = [None] * (level + 1)
class SkipList:
def __init__(self, max_level, p):
self.max_level = max_level
self.p = p
self.header = Node(-1, max_level)
self.level = 0
def random_level(self):
lvl = 0
while random.random() < self.p and lvl < self.max_level:
lvl += 1
return lvl
def insert(self, key):
update = [None] * (self.max_level + 1)
current = self.header
for i in range(self.level, -1, -1):
while current.forward[i] and current.forward[i].key < key:
current = current.forward[i]
update[i] = current
level = self.random_level()
if level > self.level:
for i in range(self.level + 1, level + 1):
update[i] = self.header
self.level = level
new_node = Node(key, level)
for i in range(level + 1):
new_node.forward[i] = update[i].forward[i]
update[i].forward[i] = new_node
def search(self, key):
current = self.header
for i in range(self.level, -1, -1):
while current.forward[i] and current.forward[i].key < key:
current = current.forward[i]
current = current.forward[0]
if current and current.key == key:
return current.key
return None
# Usage
sl = SkipList(4, 0.5)
sl.insert(3)
sl.insert(6)
sl.insert(7)
sl.insert(9)
print(sl.search(6)) # 6
print(sl.search(5)) # None
3. Trie (Prefix Tree)
A trie, also called a prefix tree, is an ordered tree data structure used to store a dynamic set or associative array where the keys are usually strings.
How Tries Work
Each node in the trie represents a common prefix of some keys. All the descendants of a node have a common prefix of the string associated with that node. The root is associated with the empty string.
Use Cases
- Autocomplete and predictive text
- Spell checkers
- IP routing tables
- Solving word games
Code Example
class TrieNode:
def __init__(self):
self.children = {}
self.is_end_of_word = False
class Trie:
def __init__(self):
self.root = TrieNode()
def insert(self, word):
node = self.root
for char in word:
if char not in node.children:
node.children[char] = TrieNode()
node = node.children[char]
node.is_end_of_word = True
def search(self, word):
node = self.root
for char in word:
if char not in node.children:
return False
node = node.children[char]
return node.is_end_of_word
def starts_with(self, prefix):
node = self.root
for char in prefix:
if char not in node.children:
return False
node = node.children[char]
return True
# Usage
trie = Trie()
trie.insert("apple")
print(trie.search("apple")) # True
print(trie.search("app")) # False
print(trie.starts_with("app")) # True
4. Fenwick Tree (Binary Indexed Tree)
A Fenwick tree, also known as a binary indexed tree, is a data structure that can efficiently update elements and calculate prefix sums in a table of numbers.
How Fenwick Trees Work
Fenwick trees represent an array of numbers as a tree, where each node stores the sum of a range of elements. The structure of the tree allows for efficient updates and queries in O(log n) time.
Use Cases
- Calculating cumulative frequency
- Range sum queries
- Computational geometry
Code Example
class FenwickTree:
def __init__(self, n):
self.size = n
self.tree = [0] * (n + 1)
def update(self, i, delta):
while i <= self.size:
self.tree[i] += delta
i += i & (-i)
def sum(self, i):
s = 0
while i > 0:
s += self.tree[i]
i -= i & (-i)
return s
def range_sum(self, i, j):
return self.sum(j) - self.sum(i - 1)
# Usage
ft = FenwickTree(10)
ft.update(1, 3)
ft.update(2, 2)
ft.update(3, 5)
print(ft.range_sum(1, 3)) # 10
5. Segment Tree
A segment tree is a tree data structure used for storing information about intervals, or segments. It allows for efficient querying of cumulative data for a range of array positions, as well as efficient updating of array values.
How Segment Trees Work
The segment tree divides an array into a number of segments and stores them in a tree structure. Each node of the tree represents a segment of the array, with leaf nodes representing individual elements and internal nodes representing larger segments.
Use Cases
- Range minimum/maximum queries
- Range sum queries
- Lazy propagation
Code Example
class SegmentTree:
def __init__(self, arr):
self.n = len(arr)
self.tree = [0] * (4 * self.n)
self.build(arr, 0, 0, self.n - 1)
def build(self, arr, node, start, end):
if start == end:
self.tree[node] = arr[start]
else:
mid = (start + end) // 2
self.build(arr, 2 * node + 1, start, mid)
self.build(arr, 2 * node + 2, mid + 1, end)
self.tree[node] = self.tree[2 * node + 1] + self.tree[2 * node + 2]
def update(self, index, value):
self._update(0, 0, self.n - 1, index, value)
def _update(self, node, start, end, index, value):
if start == end:
self.tree[node] = value
else:
mid = (start + end) // 2
if start <= index <= mid:
self._update(2 * node + 1, start, mid, index, value)
else:
self._update(2 * node + 2, mid + 1, end, index, value)
self.tree[node] = self.tree[2 * node + 1] + self.tree[2 * node + 2]
def query(self, left, right):
return self._query(0, 0, self.n - 1, left, right)
def _query(self, node, start, end, left, right):
if right < start or end < left:
return 0
if left <= start and end <= right:
return self.tree[node]
mid = (start + end) // 2
return (self._query(2 * node + 1, start, mid, left, right) +
self._query(2 * node + 2, mid + 1, end, left, right))
# Usage
arr = [1, 3, 5, 7, 9, 11]
st = SegmentTree(arr)
print(st.query(1, 3)) # 15
st.update(2, 6)
print(st.query(1, 3)) # 16
6. Disjoint Set (Union-Find)
A disjoint-set data structure, also known as a union-find data structure, keeps track of a set of elements partitioned into a number of disjoint (non-overlapping) subsets.
How Disjoint Sets Work
The data structure provides near-constant-time operations to add new sets, merge existing sets, and determine whether elements are in the same set. It does this by using techniques like “union by rank” and “path compression” to keep the tree balanced and flatten the structure during finds.
Use Cases
- Kruskal’s algorithm for minimum spanning trees
- Finding connected components in graphs
- Image processing
- Network connectivity
Code Example
class DisjointSet:
def __init__(self, vertices):
self.parent = {v: v for v in vertices}
self.rank = {v: 0 for v in vertices}
def find(self, item):
if self.parent[item] != item:
self.parent[item] = self.find(self.parent[item])
return self.parent[item]
def union(self, x, y):
xroot = self.find(x)
yroot = self.find(y)
if self.rank[xroot] < self.rank[yroot]:
self.parent[xroot] = yroot
elif self.rank[xroot] > self.rank[yroot]:
self.parent[yroot] = xroot
else:
self.parent[yroot] = xroot
self.rank[xroot] += 1
# Usage
vertices = ["A", "B", "C", "D", "E"]
ds = DisjointSet(vertices)
ds.union("A", "B")
ds.union("C", "D")
print(ds.find("A") == ds.find("B")) # True
print(ds.find("A") == ds.find("C")) # False
7. Van Emde Boas Tree
The van Emde Boas tree (vEB tree) is a tree data structure which implements an associative array with m-bit integer keys. It supports operations such as insert, delete, and find-next in O(log log M) time, where M is the maximum value of a key.
How Van Emde Boas Trees Work
The vEB tree recursively divides the range of possible keys into clusters. Each node in the tree represents a range of integers and contains information about which clusters in its range contain elements.
Use Cases
- IP routing tables
- Priority queues with integer keys
- Implementing other data structures like segment trees
Code Example
import math
class VEBTree:
def __init__(self, universe_size):
self.universe_size = universe_size
self.min = None
self.max = None
if universe_size <= 2:
self.summary = None
self.cluster = None
else:
self.summary = VEBTree(int(math.sqrt(universe_size)))
self.cluster = [VEBTree(int(math.sqrt(universe_size))) for _ in range(int(math.sqrt(universe_size)))]
def high(self, x):
return x // int(math.sqrt(self.universe_size))
def low(self, x):
return x % int(math.sqrt(self.universe_size))
def index(self, x, y):
return x * int(math.sqrt(self.universe_size)) + y
def insert(self, x):
if self.min is None:
self.min = self.max = x
return
if x < self.min:
x, self.min = self.min, x
if self.universe_size > 2:
if self.cluster[self.high(x)].min is None:
self.summary.insert(self.high(x))
self.cluster[self.high(x)].min = self.cluster[self.high(x)].max = self.low(x)
else:
self.cluster[self.high(x)].insert(self.low(x))
if x > self.max:
self.max = x
def member(self, x):
if x == self.min or x == self.max:
return True
elif self.universe_size == 2:
return False
else:
return self.cluster[self.high(x)].member(self.low(x))
def successor(self, x):
if self.min is not None and x < self.min:
return self.min
if self.universe_size == 2:
if x == 0 and self.max == 1:
return 1
else:
return None
if self.cluster[self.high(x)].max is not None and self.low(x) < self.cluster[self.high(x)].max:
offset = self.cluster[self.high(x)].successor(self.low(x))
return self.index(self.high(x), offset)
succ_cluster = self.summary.successor(self.high(x))
if succ_cluster is None:
return None
offset = self.cluster[succ_cluster].min
return self.index(succ_cluster, offset)
# Usage
veb = VEBTree(16)
veb.insert(2)
veb.insert(3)
veb.insert(4)
veb.insert(5)
veb.insert(7)
veb.insert(14)
print(veb.member(4)) # True
print(veb.member(6)) # False
print(veb.successor(5)) # 7
print(veb.successor(14)) # None
Conclusion
While these data structures may not be as commonly used as arrays or hash tables, they offer powerful solutions to specific problems. Understanding these overlooked data structures can significantly enhance your problem-solving skills and algorithm design capabilities.
As you continue your journey in computer science and programming, remember that the right data structure can make all the difference in the efficiency and elegance of your solutions. Don’t shy away from exploring these lesser-known structures – they might just be the key to solving your next challenging problem.
Keep practicing, experimenting, and expanding your knowledge. The world of data structures is vast and fascinating, with each structure offering unique advantages for specific scenarios. By mastering these overlooked data structures, you’ll be better equipped to tackle a wide range of programming challenges and optimize your code for various applications.