The Most Overlooked Data Structures You Need to Know


In the vast world of computer science and programming, data structures form the backbone of efficient algorithms and well-designed software. While most developers are familiar with common data structures like arrays, linked lists, and binary trees, there are several lesser-known yet powerful data structures that often go overlooked. In this comprehensive guide, we’ll explore these underappreciated gems and discuss why they deserve a place in your programming toolkit.

1. Bloom Filters

Bloom filters are probabilistic data structures used to test whether an element is a member of a set. They’re incredibly space-efficient and offer constant-time add and lookup operations.

How Bloom Filters Work

A Bloom filter consists of a bit array of m bits, initially all set to 0, and k different hash functions. To add an element, feed it to each of the k hash functions to get k array positions. Set the bits at all these positions to 1.

To query for an element, feed it to each of the k hash functions to get k array positions. If any of the bits at these positions is 0, the element is definitely not in the set. If all are 1, then either the element is in the set, or the bits have been set to 1 during the insertion of other elements.

Use Cases

  • Spell checkers
  • Cache filtering
  • Network routers
  • Cryptocurrency wallets

Code Example

import mmh3
import math
import array

class BloomFilter:
    def __init__(self, size, hash_count):
        self.size = size
        self.hash_count = hash_count
        self.bit_array = array.array('b', [0] * size)
    
    def add(self, item):
        for seed in range(self.hash_count):
            index = mmh3.hash(item, seed) % self.size
            self.bit_array[index] = 1
    
    def check(self, item):
        for seed in range(self.hash_count):
            index = mmh3.hash(item, seed) % self.size
            if self.bit_array[index] == 0:
                return False
        return True

# Usage
bf = BloomFilter(1000000, 5)
bf.add("hello")
print(bf.check("hello"))  # True
print(bf.check("world"))  # False

2. Skip Lists

Skip lists are a probabilistic alternative to balanced trees. They allow for fast search, insertion, and deletion of elements, with an expected time complexity of O(log n) for these operations.

How Skip Lists Work

A skip list is essentially a series of linked lists stacked on top of each other. The bottom layer contains all elements, and each higher layer acts as an “express lane” for the lists below, allowing for faster traversal.

Use Cases

  • Implementing sorted sets in Redis
  • File systems
  • In-memory databases

Code Example

import random

class Node:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * (level + 1)

class SkipList:
    def __init__(self, max_level, p):
        self.max_level = max_level
        self.p = p
        self.header = Node(-1, max_level)
        self.level = 0

    def random_level(self):
        lvl = 0
        while random.random() < self.p and lvl < self.max_level:
            lvl += 1
        return lvl

    def insert(self, key):
        update = [None] * (self.max_level + 1)
        current = self.header

        for i in range(self.level, -1, -1):
            while current.forward[i] and current.forward[i].key < key:
                current = current.forward[i]
            update[i] = current

        level = self.random_level()

        if level > self.level:
            for i in range(self.level + 1, level + 1):
                update[i] = self.header
            self.level = level

        new_node = Node(key, level)

        for i in range(level + 1):
            new_node.forward[i] = update[i].forward[i]
            update[i].forward[i] = new_node

    def search(self, key):
        current = self.header

        for i in range(self.level, -1, -1):
            while current.forward[i] and current.forward[i].key < key:
                current = current.forward[i]

        current = current.forward[0]

        if current and current.key == key:
            return current.key
        return None

# Usage
sl = SkipList(4, 0.5)
sl.insert(3)
sl.insert(6)
sl.insert(7)
sl.insert(9)
print(sl.search(6))  # 6
print(sl.search(5))  # None

3. Trie (Prefix Tree)

A trie, also called a prefix tree, is an ordered tree data structure used to store a dynamic set or associative array where the keys are usually strings.

How Tries Work

Each node in the trie represents a common prefix of some keys. All the descendants of a node have a common prefix of the string associated with that node. The root is associated with the empty string.

Use Cases

  • Autocomplete and predictive text
  • Spell checkers
  • IP routing tables
  • Solving word games

Code Example

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end_of_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_end_of_word = True

    def search(self, word):
        node = self.root
        for char in word:
            if char not in node.children:
                return False
            node = node.children[char]
        return node.is_end_of_word

    def starts_with(self, prefix):
        node = self.root
        for char in prefix:
            if char not in node.children:
                return False
            node = node.children[char]
        return True

# Usage
trie = Trie()
trie.insert("apple")
print(trie.search("apple"))   # True
print(trie.search("app"))     # False
print(trie.starts_with("app")) # True

4. Fenwick Tree (Binary Indexed Tree)

A Fenwick tree, also known as a binary indexed tree, is a data structure that can efficiently update elements and calculate prefix sums in a table of numbers.

How Fenwick Trees Work

Fenwick trees represent an array of numbers as a tree, where each node stores the sum of a range of elements. The structure of the tree allows for efficient updates and queries in O(log n) time.

Use Cases

  • Calculating cumulative frequency
  • Range sum queries
  • Computational geometry

Code Example

class FenwickTree:
    def __init__(self, n):
        self.size = n
        self.tree = [0] * (n + 1)

    def update(self, i, delta):
        while i <= self.size:
            self.tree[i] += delta
            i += i & (-i)

    def sum(self, i):
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & (-i)
        return s

    def range_sum(self, i, j):
        return self.sum(j) - self.sum(i - 1)

# Usage
ft = FenwickTree(10)
ft.update(1, 3)
ft.update(2, 2)
ft.update(3, 5)
print(ft.range_sum(1, 3))  # 10

5. Segment Tree

A segment tree is a tree data structure used for storing information about intervals, or segments. It allows for efficient querying of cumulative data for a range of array positions, as well as efficient updating of array values.

How Segment Trees Work

The segment tree divides an array into a number of segments and stores them in a tree structure. Each node of the tree represents a segment of the array, with leaf nodes representing individual elements and internal nodes representing larger segments.

Use Cases

  • Range minimum/maximum queries
  • Range sum queries
  • Lazy propagation

Code Example

class SegmentTree:
    def __init__(self, arr):
        self.n = len(arr)
        self.tree = [0] * (4 * self.n)
        self.build(arr, 0, 0, self.n - 1)

    def build(self, arr, node, start, end):
        if start == end:
            self.tree[node] = arr[start]
        else:
            mid = (start + end) // 2
            self.build(arr, 2 * node + 1, start, mid)
            self.build(arr, 2 * node + 2, mid + 1, end)
            self.tree[node] = self.tree[2 * node + 1] + self.tree[2 * node + 2]

    def update(self, index, value):
        self._update(0, 0, self.n - 1, index, value)

    def _update(self, node, start, end, index, value):
        if start == end:
            self.tree[node] = value
        else:
            mid = (start + end) // 2
            if start <= index <= mid:
                self._update(2 * node + 1, start, mid, index, value)
            else:
                self._update(2 * node + 2, mid + 1, end, index, value)
            self.tree[node] = self.tree[2 * node + 1] + self.tree[2 * node + 2]

    def query(self, left, right):
        return self._query(0, 0, self.n - 1, left, right)

    def _query(self, node, start, end, left, right):
        if right < start or end < left:
            return 0
        if left <= start and end <= right:
            return self.tree[node]
        mid = (start + end) // 2
        return (self._query(2 * node + 1, start, mid, left, right) +
                self._query(2 * node + 2, mid + 1, end, left, right))

# Usage
arr = [1, 3, 5, 7, 9, 11]
st = SegmentTree(arr)
print(st.query(1, 3))  # 15
st.update(2, 6)
print(st.query(1, 3))  # 16

6. Disjoint Set (Union-Find)

A disjoint-set data structure, also known as a union-find data structure, keeps track of a set of elements partitioned into a number of disjoint (non-overlapping) subsets.

How Disjoint Sets Work

The data structure provides near-constant-time operations to add new sets, merge existing sets, and determine whether elements are in the same set. It does this by using techniques like “union by rank” and “path compression” to keep the tree balanced and flatten the structure during finds.

Use Cases

  • Kruskal’s algorithm for minimum spanning trees
  • Finding connected components in graphs
  • Image processing
  • Network connectivity

Code Example

class DisjointSet:
    def __init__(self, vertices):
        self.parent = {v: v for v in vertices}
        self.rank = {v: 0 for v in vertices}

    def find(self, item):
        if self.parent[item] != item:
            self.parent[item] = self.find(self.parent[item])
        return self.parent[item]

    def union(self, x, y):
        xroot = self.find(x)
        yroot = self.find(y)

        if self.rank[xroot] < self.rank[yroot]:
            self.parent[xroot] = yroot
        elif self.rank[xroot] > self.rank[yroot]:
            self.parent[yroot] = xroot
        else:
            self.parent[yroot] = xroot
            self.rank[xroot] += 1

# Usage
vertices = ["A", "B", "C", "D", "E"]
ds = DisjointSet(vertices)
ds.union("A", "B")
ds.union("C", "D")
print(ds.find("A") == ds.find("B"))  # True
print(ds.find("A") == ds.find("C"))  # False

7. Van Emde Boas Tree

The van Emde Boas tree (vEB tree) is a tree data structure which implements an associative array with m-bit integer keys. It supports operations such as insert, delete, and find-next in O(log log M) time, where M is the maximum value of a key.

How Van Emde Boas Trees Work

The vEB tree recursively divides the range of possible keys into clusters. Each node in the tree represents a range of integers and contains information about which clusters in its range contain elements.

Use Cases

  • IP routing tables
  • Priority queues with integer keys
  • Implementing other data structures like segment trees

Code Example

import math

class VEBTree:
    def __init__(self, universe_size):
        self.universe_size = universe_size
        self.min = None
        self.max = None
        if universe_size <= 2:
            self.summary = None
            self.cluster = None
        else:
            self.summary = VEBTree(int(math.sqrt(universe_size)))
            self.cluster = [VEBTree(int(math.sqrt(universe_size))) for _ in range(int(math.sqrt(universe_size)))]

    def high(self, x):
        return x // int(math.sqrt(self.universe_size))

    def low(self, x):
        return x % int(math.sqrt(self.universe_size))

    def index(self, x, y):
        return x * int(math.sqrt(self.universe_size)) + y

    def insert(self, x):
        if self.min is None:
            self.min = self.max = x
            return

        if x < self.min:
            x, self.min = self.min, x

        if self.universe_size > 2:
            if self.cluster[self.high(x)].min is None:
                self.summary.insert(self.high(x))
                self.cluster[self.high(x)].min = self.cluster[self.high(x)].max = self.low(x)
            else:
                self.cluster[self.high(x)].insert(self.low(x))

        if x > self.max:
            self.max = x

    def member(self, x):
        if x == self.min or x == self.max:
            return True
        elif self.universe_size == 2:
            return False
        else:
            return self.cluster[self.high(x)].member(self.low(x))

    def successor(self, x):
        if self.min is not None and x < self.min:
            return self.min
        if self.universe_size == 2:
            if x == 0 and self.max == 1:
                return 1
            else:
                return None
        if self.cluster[self.high(x)].max is not None and self.low(x) < self.cluster[self.high(x)].max:
            offset = self.cluster[self.high(x)].successor(self.low(x))
            return self.index(self.high(x), offset)
        succ_cluster = self.summary.successor(self.high(x))
        if succ_cluster is None:
            return None
        offset = self.cluster[succ_cluster].min
        return self.index(succ_cluster, offset)

# Usage
veb = VEBTree(16)
veb.insert(2)
veb.insert(3)
veb.insert(4)
veb.insert(5)
veb.insert(7)
veb.insert(14)
print(veb.member(4))  # True
print(veb.member(6))  # False
print(veb.successor(5))  # 7
print(veb.successor(14))  # None

Conclusion

While these data structures may not be as commonly used as arrays or hash tables, they offer powerful solutions to specific problems. Understanding these overlooked data structures can significantly enhance your problem-solving skills and algorithm design capabilities.

As you continue your journey in computer science and programming, remember that the right data structure can make all the difference in the efficiency and elegance of your solutions. Don’t shy away from exploring these lesser-known structures – they might just be the key to solving your next challenging problem.

Keep practicing, experimenting, and expanding your knowledge. The world of data structures is vast and fascinating, with each structure offering unique advantages for specific scenarios. By mastering these overlooked data structures, you’ll be better equipped to tackle a wide range of programming challenges and optimize your code for various applications.