Caching Strategies In Distributed Systems
Caching is the most powerful scaling tool in a distributed system's arsenal. Every cache layer (browser, CDN, application, database) saves a round trip, and a cache hit costs orders of magnitude less time and computation than a miss. A well-designed cache layer can reduce backend load by 90%. But caches are also the source of some of the hardest bugs in production systems: stale data, inconsistency, cache stampedes, security vulnerabilities.
Phil Karlton said there are only two hard things in computer science: cache invalidation and naming things. He wasn't joking. Getting caching right requires understanding what happens at every layer of your system, from the database to the browser to the CDN.
The Caching Stack
Modern systems have caches at every layer. The trick is understanding when each layer matters and how they interact.
Database Cache: The database itself caches hot data in memory (the buffer pool). Expensive queries are cached in application-level query result caches.
Application Cache: Redis, Memcached, or in-memory caches hold frequently-accessed data. Your application code decides what to cache and when to invalidate it.
Reverse Proxy Cache: A reverse proxy (Nginx, Varnish) sits in front of your application servers. It caches HTTP responses, reducing load on application servers. It respects HTTP Cache-Control headers.
CDN Cache: A Content Delivery Network caches content at hundreds of edge locations worldwide. Users get content from servers near them.
Browser Cache: The HTTP cache in the browser. The Service Worker cache. LocalStorage. Multiple layers of persistence on the client.
These layers don't coordinate. Invalidating the application cache doesn't touch the reverse proxy cache. Purging the CDN doesn't clear the browser. You must explicitly manage each layer.
Read Patterns
Different read patterns require different strategies.
Cache-Aside (Lazy Loading): The application checks the cache first. On cache miss, it fetches from the database and stores in cache. Simple, but the first request after cache invalidation is slow (the miss penalty).
```python
def get_user(user_id):
    cached = cache.get(f"user:{user_id}")
    if cached:
        return cached
    # Parameterized query avoids SQL injection
    user = database.query("SELECT * FROM users WHERE id = %s", (user_id,))
    cache.set(f"user:{user_id}", user, ttl=3600)
    return user
```

Read-Through: The cache is the authoritative read layer. The application always reads from the cache. On a miss, a loader function fetches from the database transparently.
```python
cache = ReadThroughCache(
    loader=lambda key: database.query("SELECT * FROM users WHERE id = %s", (key,))
)
user = cache.get(f"user:{user_id}")  # Automatic load on miss
```

Read-through is cleaner but requires the cache to handle loading logic. It's more common in caching libraries (Apollo Client's cache, React Query).
Warming the Cache: Preload the cache with data that will be accessed. This prevents cold-start performance penalties. Load popular items before traffic arrives.
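The idea fits in a few lines. A minimal sketch, assuming dict-backed stores and a `popular_ids` list supplied by whatever popularity signal your system has:

```python
def warm_cache(cache, database, popular_ids):
    """Preload popular items so the first real requests hit the cache."""
    for item_id in popular_ids:
        cache[f"item:{item_id}"] = database[item_id]
```

Run it at deploy time, or after a bulk invalidation, before traffic arrives.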
Write Patterns
How writes flow through the cache is just as important as reads.
Write-Through: Write to cache and database simultaneously. Success is returned when both succeed. The cache is always consistent with the database, but writes are slow (you wait for both).
```python
def update_user(user_id, data):
    database.update("UPDATE users SET ... WHERE id = %s", (user_id,), data)
    cache.set(f"user:{user_id}", data)
    return data
```

Write-Behind (Write-Back): Write to the cache immediately and asynchronously flush to the database. Writes feel instant. But if the system crashes, data in the cache isn't yet persisted. This requires intent preservation (see consistency models and failure handling).
```python
def update_user(user_id, data):
    cache.set(f"user:{user_id}", data)
    queue_for_database(user_id, data)  # Async flush to the database
    return data
```

Write-Around: Write directly to the database and bypass the cache, invalidating or ignoring the cached entry on write. Simple, but the cache might be behind after writes.
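A minimal sketch of write-around, with plain dicts standing in for the cache and database (hypothetical helpers, not a specific library):

```python
db = {}
cache = {}

def update_user(user_id, data):
    # Write goes straight to the database, bypassing the cache.
    db[user_id] = data
    # Invalidate so the next read misses and reloads fresh data.
    cache.pop(user_id, None)
    return data

def get_user(user_id):
    if user_id in cache:
        return cache[user_id]
    user = db.get(user_id)
    cache[user_id] = user  # Populate on read (cache-aside for reads)
    return user
```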
| Pattern | Write Speed | Consistency | Data Loss Risk | When to use |
|---|---|---|---|---|
| Write-Through | Slow | Strong | Low | Critical data, payments |
| Write-Behind | Fast | Eventual | Medium | Offline-first, drafts |
| Write-Around | Medium | Eventual | Low | Simple CRUD |
Cache Invalidation
Invalidation is where caching gets hard. Adding a cache is easy. Knowing when to clear it is the design challenge.
Time-Based (TTL): Data expires after a fixed duration. Simple to implement.
```python
cache.set("user:123", user_data, ttl=3600)  # Expires in 1 hour
```

The challenge is tuning the TTL. Too long, and users see stale data. Too short, and the cache loses its benefit. Different data needs different TTLs: a product description might cache for 24 hours, a price for 5 minutes, a user's permissions for 30 seconds.
Event-Based (Active Invalidation): Invalidate when you know data changed. After a mutation, invalidate related queries.
```python
def update_user(user_id, data):
    database.update(...)
    cache.delete(f"user:{user_id}")  # Explicit invalidation
    emit("user.updated", user_id)    # Notify other services
```

This requires explicit dependency maps. You must know which cache keys depend on which data changes. Miss an invalidation, and the cache stays stale until the TTL expires or a manual purge.
Version-Based (ETags): Include a version identifier with the cached data. On each read, check if the version changed.

```
GET /user/123
Cache-Control: max-age=3600
ETag: "abc123"

// Later request:
GET /user/123
If-None-Match: "abc123"

// Server responds 304 Not Modified - data hasn't changed
```

Conditional requests (If-None-Match, If-Modified-Since) let you validate freshness without transferring the payload. The server returns 304 Not Modified if the ETag matches, saving bandwidth.
Manual/On-Demand: The user or application explicitly triggers cache clear. Pull-to-refresh. Logout invalidation. The escape hatch when automatic strategies fail.
Invalidation at Scale
At scale, invalidation becomes complex. A single mutation might invalidate dozens of cache keys. You might not know all the dependencies upfront.
Dependency Graphs: Explicitly map which cache keys depend on which data.
```python
invalidation_graph = {
    "post:123": ["feed:user:456", "feed:user:789", "profile:456"],
    "user:456": ["profile:456", "notifications:456"],
}

def invalidate(entity_type, entity_id):
    key = f"{entity_type}:{entity_id}"
    for affected_key in invalidation_graph.get(key, []):
        cache.delete(affected_key)
```

Purge on Demand: When a mutation is critical, purge all related caches. It's blunt but safe.
```python
def update_pricing(data):
    database.update("pricing rules", data)
    cache.flush_all()  # Nuclear option: clear everything
```

Versioned Cache Keys: Include a version in the key. When data changes, increment the version. Old keys expire naturally.
```python
cache_version = {
    "product": 42,
    "pricing": 7,
}

def get_product(product_id):
    key = f"product:{product_id}:v{cache_version['product']}"
    return cache.get(key)

def update_pricing():
    database.update(...)
    cache_version["pricing"] += 1  # Invalidates all pricing keys
```

Cache Stampede Prevention
A cache stampede (or thundering herd) happens when a popular cache entry expires and every request tries to load it simultaneously, overwhelming the database.
A user's feed is cached. The cache expires. 10,000 concurrent requests for the feed all hit the database at once. The database can't handle it and becomes slow or crashes.
Lock-Based: The first request acquires a lock and loads the data. Other requests wait for the lock. Only one loader runs.
```python
def get_feed(user_id):
    key = f"feed:{user_id}"
    cached = cache.get(key)
    if cached:
        return cached
    with cache.lock(f"{key}:lock", timeout=5):
        # Lock acquired; check again (another thread may have loaded it)
        cached = cache.get(key)
        if cached:
            return cached
        # Load and cache
        data = database.query_feed(user_id)
        cache.set(key, data, ttl=3600)
        return data
```

Probabilistic Early Expiration: Expire cache entries with some probability before the TTL runs out. This spreads the reload over time instead of creating a thundering herd.
```python
from random import random

TTL = 3600

def get_feed(user_id):
    key = f"feed:{user_id}"
    cached, ttl_remaining = cache.get_with_ttl(key)
    if cached:
        # Expire early with a probability that grows as expiry approaches
        if random() < 0.1 * (1 - ttl_remaining / TTL):
            cache.delete(key)  # Probabilistic early deletion
        else:
            return cached
    data = database.query_feed(user_id)
    cache.set(key, data, ttl=TTL)
    return data
```

Stale-While-Revalidate: Serve stale data while revalidating in the background.
```
Cache-Control: max-age=3600, stale-while-revalidate=86400
```

The client serves cached data for 1 hour. Between 1 hour and 1 day, it serves stale data while fetching fresh data in the background. No thundering herd.
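Server-side, the same behavior can be approximated with a small wrapper. This is a sketch with an in-process dict and ad-hoc timestamps; a real system would use the cache's own TTL metadata:

```python
import threading
import time

TTL = 3600            # serve as fresh for 1 hour
STALE_WINDOW = 86400  # then serve stale for up to 1 day while refreshing

store = {}  # key -> (value, stored_at)

def get(key, loader):
    now = time.time()
    entry = store.get(key)
    if entry:
        value, stored_at = entry
        age = now - stored_at
        if age < TTL:
            return value  # still fresh
        if age < TTL + STALE_WINDOW:
            # Serve the stale value immediately; refresh in the background.
            def refresh():
                store[key] = (loader(key), time.time())
            threading.Thread(target=refresh, daemon=True).start()
            return value
    # Cold or too stale: load synchronously.
    value = loader(key)
    store[key] = (value, now)
    return value
```

Callers never pay the miss penalty once the entry exists; only the very first request (or one past the stale window) blocks on the loader.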
Caching Sensitive Data
Caching and security are at odds: every cache is a copy of data living outside its original access controls. A cached credential is a vulnerability. Cached personal data is a privacy risk.
Cache-Control Headers: Mark sensitive responses as non-cacheable.

```
Cache-Control: no-store, no-cache, must-revalidate, max-age=0
```

- no-store: Don't cache at all
- no-cache: Don't serve without revalidation
- must-revalidate: If stale, must revalidate (don't serve stale)
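In application code, this might look like the following framework-agnostic sketch. The response is just a dict here; adapt it to your framework's response object:

```python
# Headers that tell every downstream cache not to store this response.
SENSITIVE_HEADERS = {
    "Cache-Control": "no-store, no-cache, must-revalidate, max-age=0",
    "Pragma": "no-cache",  # belt-and-suspenders for legacy HTTP/1.0 caches
}

def make_response(body, sensitive=False):
    headers = {"Content-Type": "application/json"}
    if sensitive:
        headers.update(SENSITIVE_HEADERS)
    return {"body": body, "headers": headers}
```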
CDN Caching of Personalized Data: Dangerous. If you cache a user-specific response, it might be served to a different user. Always use cache keys that include the user ID or set Vary: Authorization to tell the CDN not to serve one user's cache to another.
Encrypted Cache: If data is sensitive, encrypt it before caching. The cache stores encrypted data. Only the owner has the decryption key.
Practical Caching Patterns
Cache Hierarchy: Combine multiple patterns.
(Flow diagram: request → L1 → L2 → L3 → L4, falling through to the next level on each miss.)
L1 is fastest but limited. L2 is shared but still fast. L3 is persistent but slower. L4 is massive but very slow.
Cache Coherence: When data updates in L3, invalidate L1 and L2. When reading, check L1 first, then L2, then L3.
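A toy sketch of that read path and coherent write path, with dicts standing in for each level:

```python
# l1: in-process cache, l2: shared cache, db: system of record (all dicts here).
l1, l2, db = {}, {}, {"user:1": "Ada"}

def read(key):
    if key in l1:
        return l1[key]
    if key in l2:
        l1[key] = l2[key]  # promote to L1 for next time
        return l2[key]
    value = db.get(key)
    l2[key] = value
    l1[key] = value
    return value

def write(key, value):
    db[key] = value
    # Invalidate every cache layer so no level serves stale data.
    l1.pop(key, None)
    l2.pop(key, None)
```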
Cache Buddy System: When a popular item is about to expire, refresh it in the background instead of waiting for a request. Prevents stampedes.
```python
def refresh_hot_items():
    for item in hot_items:
        key = f"item:{item}"
        ttl_remaining = cache.ttl(key)
        if ttl_remaining < 300:  # Less than 5 minutes remaining
            data = database.get(item)
            cache.set(key, data, ttl=3600)
```

Caching and AI-Generated Code
AI-generated code tends to be cache-naive. Generators often fetch fresh data on every request, missing obvious caching opportunities. They generate redundant queries without batching.
Bitloops helps by embedding caching patterns into generated code. Data queries automatically check caches before hitting the database. Mutations automatically invalidate relevant caches. The system generates cache-aware code by default.
Frequently Asked Questions
When should I add caching?
When reads are slow or load on the backend is high. Measure first. Profile your queries — see performance optimization for how. Add caching where it helps most (usually the database is the bottleneck).
Should I cache at the application level or use a CDN?
Both. CDNs cache at the edge (good for static content and personalization-light data). Application caches handle dynamic, personalized data. Reverse proxy caches reduce load on app servers.
How long should TTL be?
Depends on data change frequency and staleness tolerance. A product price changes hourly—use 10-minute TTL. A user's avatar rarely changes—use 1-day TTL. Start with 5 minutes and tune based on freshness needs and cache hit rate.
Is Redis or Memcached better?
Memcached is simpler and faster for pure caching. Redis is more powerful (supports data structures, persistence, pub/sub). Use Memcached if you just need fast key-value caching. Use Redis if you need features beyond that.
Can I cache POST requests?
Generally no. POST requests should not be cached (they modify state). But some APIs return the same response for the same POST body. Use idempotency keys and cache based on the key, not the request itself.
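A sketch of the idempotency-key approach; in practice the key would arrive in a request header such as `Idempotency-Key`, and the store would be shared (Redis), not a local dict:

```python
responses = {}  # idempotency key -> stored response

def handle_post(idempotency_key, handler, payload):
    # Replay the stored response for a repeated key instead of
    # re-running the side-effecting handler.
    if idempotency_key in responses:
        return responses[idempotency_key]
    result = handler(payload)
    responses[idempotency_key] = result
    return result
```

Retrying the same request with the same key is then safe: the side effect runs once, and the client always gets the same response back.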
How do I know my cache is working?
Monitor cache hit rate. Redis exposes hit and miss counters (keyspace_hits and keyspace_misses in INFO stats). If hit rate is below 50%, reconsider what you're caching. If hit rate is high, your cache is working.
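If your cache client doesn't expose counters, a thin wrapper can track them. An illustrative sketch:

```python
class InstrumentedCache:
    """Dict-backed cache that counts hits and misses."""

    def __init__(self):
        self.store, self.hits, self.misses = {}, 0, 0

    def get(self, key):
        if key in self.store:
            self.hits += 1
            return self.store[key]
        self.misses += 1
        return None

    def set(self, key, value):
        self.store[key] = value

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Export `hit_rate()` to your metrics system and alert when it drops, which usually means a TTL was shortened, a key scheme changed, or traffic shifted to uncached data.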