How to apply memoization in Python using functools

Memoization is a powerful optimization technique that can transform slow recursive functions into efficient algorithms by caching previously computed results. In this article, we’ll explore how to implement memoization in Python, focusing on the built-in tools from the `functools` module.
What is Memoization?
Memoization is an optimization technique where you store the results of expensive function calls and return the cached result when the same inputs occur again. This is particularly useful for recursive functions or functions with repeated calculations.
The core concept behind memoization is simple:
- Before executing a function, check if the result for the current input is already cached
- If it is, return the cached result
- If not, compute the result, store it in the cache, and then return it
This approach can drastically improve performance for functions with overlapping subproblems, reducing time complexity from exponential to polynomial or even linear in many cases.
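To make those three steps concrete before we reach `functools`, here is a minimal, hand-rolled sketch of the pattern as a reusable decorator. The `memoize` name and structure are ours for illustration, and it assumes positional, hashable arguments:

```python
def memoize(func):
    cache = {}

    def wrapper(*args):
        if args in cache:          # 1. check the cache first
            return cache[args]
        result = func(*args)       # 2. compute on a miss
        cache[args] = result       # 3. store, then return
        return result

    return wrapper

@memoize
def slow_square(n):
    return n * n
```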
The Classic Example: Fibonacci Sequence
The Fibonacci sequence is the perfect example to demonstrate the power of memoization. Let’s start with the naive recursive implementation:
```python
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)
```
This implementation has a time complexity of O(2^n), making it extremely inefficient for larger values of n. The reason is that it recalculates the same values multiple times.
For example, calculating `fibonacci(5)` involves:
- `fibonacci(4) + fibonacci(3)`
- which expands to `(fibonacci(3) + fibonacci(2)) + (fibonacci(2) + fibonacci(1))`
- and so on…

Notice how `fibonacci(3)` and `fibonacci(2)` are calculated multiple times. This redundancy grows exponentially as n increases.
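You can see the redundancy directly by counting calls. This instrumented sketch (the `fibonacci_counted` and `calls` names are ours, purely for illustration) tallies how often each value is recomputed:

```python
from collections import Counter

calls = Counter()

def fibonacci_counted(n):
    calls[n] += 1
    if n < 2:
        return n
    return fibonacci_counted(n - 1) + fibonacci_counted(n - 2)

fibonacci_counted(10)
print(calls[3])  # fibonacci(3) is computed 21 times just for n=10
```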
Manual Memoization Implementation
Before diving into `functools`, let’s implement memoization manually to understand the concept better:

```python
def fibonacci_with_memo(n, cache={}):
    if n < 2:
        return n
    if n in cache:
        return cache[n]
    cache[n] = fibonacci_with_memo(n - 1, cache) + fibonacci_with_memo(n - 2, cache)
    return cache[n]
```
With this implementation, we’ve reduced the time complexity to O(n) by storing each computed Fibonacci number in a dictionary and retrieving it when needed instead of recalculating.
However, there’s a potential issue with this implementation: using a mutable default argument (`cache={}`) can lead to unexpected behavior if you’re not careful. The cache dictionary is created only once, when the function is defined, not each time the function is called.
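Here is a quick sketch of the surprise (the `remember` function is hypothetical, purely for illustration): state leaks between calls that look independent.

```python
def remember(n, seen={}):   # 'seen' is created once, at definition time
    seen[n] = True
    return sorted(seen)

print(remember(1))  # [1]
print(remember(2))  # [1, 2] -- the first call's entry is still there
```

For our Fibonacci function this persistence happens to be helpful (the cache survives across top-level calls), but it is hidden, shared state that callers can neither see nor reset.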
A safer implementation would be:
```python
def fibonacci_with_memo(n, cache=None):
    if cache is None:
        cache = {}
    if n < 2:
        return n
    if n in cache:
        return cache[n]
    cache[n] = fibonacci_with_memo(n - 1, cache) + fibonacci_with_memo(n - 2, cache)
    return cache[n]
```
Using functools.lru_cache for Memoization
While manual memoization works, Python’s `functools` module provides a cleaner and more efficient solution with the `lru_cache` decorator.

LRU stands for “Least Recently Used,” and `lru_cache` is a decorator that wraps a function in a memoizing callable that saves up to the `maxsize` most recent calls. It gives you memoization without having to manage the cache yourself.
```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)
```
The `maxsize=None` argument means the cache can grow without bound. For most practical purposes this is fine, but if you’re concerned about memory usage, you can set a specific maximum size.
Benefits of Using lru_cache
- Cleaner Code: No need to manually handle the cache
- Thread Safety: The cache is thread-safe, making it suitable for multithreaded applications
- Performance: In CPython, the cache is implemented in C, making it faster than a pure-Python implementation
- Cache Statistics: The `cache_info()` method reports how well the cache is performing
Checking Cache Statistics
You can access cache statistics using the `cache_info()` method:

```python
fibonacci(30)
print(fibonacci.cache_info())
```

This will output something like:

```
CacheInfo(hits=28, misses=31, maxsize=None, currsize=31)
```
Here:
- `hits` is the number of calls answered from the cache
- `misses` is the number of calls that had to be computed
- `maxsize` is the maximum size of the cache
- `currsize` is the current size of the cache

The numbers add up: the 31 misses correspond to computing `fibonacci(0)` through `fibonacci(30)` once each, and the remaining 28 recursive calls were served from the cache.
Clearing the Cache
If you need to clear the cache, you can use the `cache_clear()` method:

```python
fibonacci.cache_clear()
```
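Clearing resets both the stored values and the statistics, which is handy between benchmark runs. For example:

```python
fibonacci(30)
print(fibonacci.cache_info())   # e.g. CacheInfo(hits=28, misses=31, maxsize=None, currsize=31)

fibonacci.cache_clear()
print(fibonacci.cache_info())   # CacheInfo(hits=0, misses=0, maxsize=None, currsize=0)
```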
When to Use Memoization
Memoization is particularly effective in the following scenarios:
- Recursive Functions: When a recursive function repeatedly calculates the same values
- Deterministic Functions: When the function output depends solely on its inputs
- Pure Functions: Functions without side effects
- Expensive Computations: When the function performs complex calculations
Practical Examples Beyond Fibonacci
Let’s explore some other practical examples where memoization can be beneficial.
Example 1: Calculating Factorial
```python
from functools import lru_cache

@lru_cache(maxsize=None)
def factorial(n):
    if n == 0 or n == 1:
        return 1
    return n * factorial(n - 1)

# Test with a large number
print(factorial(100))
```
Example 2: Dynamic Programming – Coin Change Problem
```python
from functools import lru_cache

def coin_change(coins, amount):
    @lru_cache(maxsize=None)
    def dp(remaining):
        if remaining == 0:
            return 0
        if remaining < 0:
            return float('inf')
        min_coins = float('inf')
        for coin in coins:
            min_coins = min(min_coins, dp(remaining - coin) + 1)
        return min_coins

    result = dp(amount)
    return result if result != float('inf') else -1

# Example usage
coins = [1, 2, 5]
amount = 11
print(f"Minimum coins needed: {coin_change(coins, amount)}")
```

Note that `dp` is defined inside `coin_change`, so each call gets a fresh cache keyed only on `remaining`; `coins` doesn’t need to be hashable because the closure captures it instead of passing it as an argument.
Example 3: Memoizing a Function that Computes Prime Factors
```python
from functools import lru_cache
import math

@lru_cache(maxsize=None)
def prime_factors(n):
    factors = []
    # Divide out factors of 2
    while n % 2 == 0:
        factors.append(2)
        n //= 2
    # Divide out odd factors
    for i in range(3, int(math.sqrt(n)) + 1, 2):
        while n % i == 0:
            factors.append(i)
            n //= i
    # Whatever remains greater than 2 is itself prime
    if n > 2:
        factors.append(n)
    # Return a tuple rather than a list so callers can't mutate the cached result
    return tuple(factors)

# Test with several numbers (note the repeated 12 and 15)
for num in [12, 15, 36, 49, 12, 15]:
    print(f"Prime factors of {num}: {prime_factors(num)}")
print(prime_factors.cache_info())
```
Advanced Uses of lru_cache
Setting a Maximum Cache Size
If memory usage is a concern, you can limit the cache size:
```python
@lru_cache(maxsize=128)
def expensive_function(param):
    # Placeholder for some expensive computation
    return param ** 2
```
When the cache reaches its maximum size, the least recently used entries are discarded to make room for new ones.
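Here is a small sketch of that eviction behavior, using a hypothetical `square` function with room for only two entries:

```python
from functools import lru_cache

@lru_cache(maxsize=2)
def square(n):
    print(f"computing {n}")
    return n * n

square(1)   # miss
square(2)   # miss: the cache is now full
square(1)   # hit: 1 becomes the most recently used entry
square(3)   # miss: evicts 2, the least recently used entry
print(square.cache_info())  # CacheInfo(hits=1, misses=3, maxsize=2, currsize=2)
```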
Typed Parameter Caching
By default, `lru_cache` treats calls whose arguments compare equal as cache hits, regardless of their types. If you want calls with different argument types to be cached separately, use the `typed` parameter:

```python
@lru_cache(maxsize=None, typed=True)
def add(x, y):
    return x + y

# These will be cached separately
add(1, 2)       # int + int
add(1.0, 2.0)   # float + float
add("1", "2")   # str + str
```

Without `typed=True`, `add(1, 2)` and `add(1.0, 2.0)` would share a single cache entry, because `1 == 1.0` and the two values hash identically.
Memoizing Methods in Classes
When using `lru_cache` with methods, you need to be careful: `self` is passed as an argument and becomes part of the cache key, so it must be hashable, and the cache will hold a strong reference to every instance it has seen. One simple workaround is to cache a `@staticmethod` instead:

```python
class MathUtils:
    @staticmethod
    @lru_cache(maxsize=None)
    def fibonacci(n):
        if n < 2:
            return n
        return MathUtils.fibonacci(n - 1) + MathUtils.fibonacci(n - 2)
```

For instance methods, you can use `functools.cached_property` in Python 3.8+ or create a custom solution.
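One such custom solution, as a hedged sketch (the `Multiplier` class and `_cache` attribute are ours for illustration): store a plain dict on each instance, so the cache’s lifetime matches the object’s.

```python
class Multiplier:
    def __init__(self, factor):
        self.factor = factor
        self._cache = {}            # each instance owns its own cache

    def compute(self, x):
        if x not in self._cache:    # miss: compute and store
            self._cache[x] = self.factor * x
        return self._cache[x]

m = Multiplier(10)
print(m.compute(3))  # computed: 30
print(m.compute(3))  # served from m._cache: 30
```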
Alternatives to lru_cache in functools
functools.cache (Python 3.9+)
Python 3.9 introduced `functools.cache`, which is a simpler version of `lru_cache` with unlimited size:

```python
from functools import cache

@cache
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)
```

This is equivalent to `@lru_cache(maxsize=None)` but with a more straightforward name.
functools.cached_property (Python 3.8+)
`cached_property` is a decorator that transforms a method of a class into a property whose value is computed once and then cached as a normal instance attribute:

```python
import math
from functools import cached_property

class Circle:
    def __init__(self, radius):
        self.radius = radius

    @cached_property
    def area(self):
        print("Computing area...")
        return math.pi * self.radius ** 2

# Usage
circle = Circle(5)
print(circle.area)  # Computes and caches
print(circle.area)  # Uses cached value
```
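Because the cached value is stored in the instance’s `__dict__` as an ordinary attribute, you can invalidate it by deleting the attribute:

```python
del circle.area     # drop the cached value
print(circle.area)  # "Computing area..." runs again
```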
Performance Comparison
Let’s compare the performance of different implementations of the Fibonacci function:
```python
import time
import functools

# Naive recursive implementation
def fib_naive(n):
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)

# Manual memoization
def fib_manual_memo(n, cache=None):
    if cache is None:
        cache = {}
    if n < 2:
        return n
    if n in cache:
        return cache[n]
    cache[n] = fib_manual_memo(n - 1, cache) + fib_manual_memo(n - 2, cache)
    return cache[n]

# Using lru_cache
@functools.lru_cache(maxsize=None)
def fib_lru_cache(n):
    if n < 2:
        return n
    return fib_lru_cache(n - 1) + fib_lru_cache(n - 2)

# Iterative solution (for comparison)
def fib_iterative(n):
    if n < 2:
        return n
    a, b = 0, 1
    for _ in range(2, n + 1):
        a, b = b, a + b
    return b

# Test helper: perf_counter is better suited to timing than time.time
def test_performance(func, n, name):
    start = time.perf_counter()
    result = func(n)
    end = time.perf_counter()
    print(f"{name} result for n={n}: {result}")
    print(f"{name} time: {end - start:.6f} seconds\n")

# Test with n=35
n = 35

# Clear the lru_cache before testing
fib_lru_cache.cache_clear()

# Run tests
test_performance(fib_lru_cache, n, "lru_cache")
test_performance(fib_manual_memo, n, "Manual memoization")
test_performance(fib_iterative, n, "Iterative")

# Naive is too slow for n=35, so test with a smaller n
smaller_n = 20
test_performance(fib_naive, smaller_n, f"Naive (n={smaller_n})")
```
The results will clearly show that both memoization approaches (manual and `lru_cache`) perform significantly better than the naive recursive approach, with the iterative solution being the fastest but less intuitive for many recursive problems.
Common Pitfalls and How to Avoid Them
1. Mutable Arguments
`lru_cache` requires function arguments to be hashable, which rules out mutable types like lists and dictionaries:

```python
@lru_cache(maxsize=None)
def process_list(lst):   # Calling this with a list raises TypeError: unhashable type
    return sum(lst)

# Solution: accept an immutable type such as a tuple
@lru_cache(maxsize=None)
def process_tuple(lst_tuple):
    return sum(lst_tuple)

# Usage
result = process_tuple(tuple([1, 2, 3]))
```
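For dictionary arguments, one hedged pattern (the helper names here are ours) is to convert to a `frozenset` of items at the boundary, assuming the keys and values are themselves hashable:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def _process_items(items):   # items: frozenset of (key, value) pairs
    return {k: v * 2 for k, v in items}

def process_dict(d):
    return _process_items(frozenset(d.items()))

print(process_dict({"a": 1, "b": 2}))  # {'a': 2, 'b': 4}
```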
2. Functions with Side Effects
Memoization assumes that a function always returns the same output for the same input. Functions with side effects or that depend on external state might not work correctly with memoization:
```python
counter = 0

@lru_cache(maxsize=None)
def increment_counter():
    global counter
    counter += 1
    return counter

# This will always return 1 after the first call!
print(increment_counter())  # 1
print(increment_counter())  # Still 1 due to caching
```
3. Memory Usage
With `maxsize=None`, the cache can grow indefinitely. For functions called with many different arguments, this could lead to excessive memory usage:

```python
# Limit cache size for functions with many potential inputs
@lru_cache(maxsize=1024)
def expensive_function(n):
    # Placeholder for the real computation
    return n * n
```
4. Instance Methods
When applying `lru_cache` to instance methods, remember that `self` is included in the cache key. If `self` is not hashable (for example, if the class defines `__eq__` without `__hash__`), you’ll get an error, and even when it is hashable, the cache keeps every instance it has seen alive by holding a strong reference to it:

```python
class MyClass:
    def __init__(self, value):
        self.value = value

    # This fails if MyClass instances are not hashable,
    # and otherwise pins each instance in the cache
    @lru_cache(maxsize=None)
    def compute(self, x):
        return self.value * x
```
Solution options:
- Make the class hashable by implementing `__hash__` and `__eq__` (see the sketch below)
- Use `cached_property` for argument-less, per-instance values
- Use `staticmethod` if the method doesn’t actually need `self`
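As a minimal sketch of the first option, assuming `value` fully identifies an instance for equality purposes:

```python
from functools import lru_cache

class MyClass:
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, MyClass) and self.value == other.value

    def __hash__(self):
        return hash(self.value)

    @lru_cache(maxsize=None)
    def compute(self, x):
        return self.value * x

obj = MyClass(10)
print(obj.compute(3))  # computed: 30
print(obj.compute(3))  # cached: 30
```

The reference-holding caveat still applies: the shared cache keeps `obj` alive until it is cleared.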
Real-world Applications
Web Development
In web applications, memoization can be used to cache expensive database queries or API calls:
```python
from functools import lru_cache

@lru_cache(maxsize=100)
def get_user_data(user_id):
    # Simulate a database query
    print(f"Fetching data for user {user_id} from database")
    return {"id": user_id, "name": f"User {user_id}"}

# First call fetches from the database
print(get_user_data(123))

# Second call uses the cached result
print(get_user_data(123))
```

Keep in mind that `lru_cache` has no expiry mechanism: once cached, a user’s data stays stale until you call `cache_clear()` or the entry is evicted, so for data that changes, a cache with a time-to-live is often a better fit.
Data Analysis
In data analysis pipelines, memoization can speed up repeated calculations:
```python
import numpy as np
from functools import lru_cache

@lru_cache(maxsize=None)
def compute_statistics(data_tuple):
    data = np.array(data_tuple)
    print("Computing statistics...")
    return {
        "mean": np.mean(data),
        "median": np.median(data),
        "std": np.std(data)
    }

# Convert the list to a tuple for hashability
data = tuple([1, 2, 3, 4, 5])

# First call computes
stats = compute_statistics(data)
print(stats)

# Second call uses the cache
stats = compute_statistics(data)
print(stats)
```
Machine Learning
In machine learning, feature engineering often involves repetitive calculations that can benefit from memoization:
```python
from functools import lru_cache

@lru_cache(maxsize=1000)
def feature_transform(value):
    # Expensive feature transformation (placeholder)
    print(f"Transforming {value}")
    return value ** 2 + 3 * value + 1

# Process a dataset with repeated values
dataset = [1, 2, 3, 1, 2, 4, 5, 1]
transformed = [feature_transform(x) for x in dataset]
print(transformed)
print(feature_transform.cache_info())
```
Conclusion
Memoization is a powerful technique that can significantly improve the performance of recursive functions and other computations with overlapping subproblems. Python’s `functools` module provides elegant tools like `lru_cache` that make implementing memoization simple and efficient.
Key takeaways:
- Use `@lru_cache(maxsize=None)` for unlimited caching, or specify a maximum size to limit memory usage
- Ensure function arguments are hashable
- Apply memoization to pure functions without side effects
- Consider the performance vs. memory tradeoff for your specific use case
- Explore the newer alternatives `functools.cache` (Python 3.9+) and `functools.cached_property` (Python 3.8+)
By understanding and applying memoization effectively, you can write more efficient Python code that scales well with complex problems.