Caching Strategies In Distributed Systems
Caching is the most powerful scaling tool in a distributed system's arsenal. Every cache layer (browser, CDN, application, database) saves a round trip, and a cache hit costs orders of magnitude less time and computation than a miss. A well-designed cache layer can reduce backend load by 90%. But caches are also the source of some of the hardest bugs in production systems: stale data, inconsistency, cache stampedes, security vulnerabilities.
Phil Karlton said there are only two hard things in computer science: cache invalidation and naming things. He wasn't joking. Getting caching right requires understanding what happens at every layer of your system, from the database to the browser to the CDN.
The Caching Stack
Modern systems have caches at every layer. The trick is understanding when each layer matters and how they interact.
Database Cache: The database itself caches hot data in memory (the buffer pool). Expensive queries are cached in application-level query result caches.
Application Cache: Redis, Memcached, or in-memory caches hold frequently-accessed data. Your application code decides what to cache and when to invalidate it.
Reverse Proxy Cache: A reverse proxy (Nginx, Varnish) sits in front of your application servers. It caches HTTP responses, reducing load on application servers. It respects HTTP Cache-Control headers.
CDN Cache: A Content Delivery Network caches content at hundreds of edge locations worldwide. Users get content from servers near them.
Browser Cache: The HTTP cache in the browser. The Service Worker cache. LocalStorage. Multiple layers of persistence on the client.
These layers don't coordinate. Invalidating the application cache doesn't touch the reverse proxy cache. Purging the CDN doesn't clear the browser. You must explicitly manage each layer.
Read Patterns
Different read patterns require different strategies.
Cache-Aside (Lazy Loading): The application checks the cache first. On cache miss, it fetches from the database and stores in cache. Simple, but the first request after cache invalidation is slow (the miss penalty).
```python
def get_user(user_id):
    cached = cache.get(f"user:{user_id}")
    if cached:
        return cached
    # Parameterized query avoids SQL injection
    user = database.query("SELECT * FROM users WHERE id = %s", (user_id,))
    cache.set(f"user:{user_id}", user, ttl=3600)
    return user
```

Read-Through: The cache is the authoritative read layer. The application always reads from the cache. On a miss, a loader function fetches from the database transparently.
```python
cache = ReadThroughCache(
    loader=lambda key: database.query("SELECT * FROM users WHERE id = %s", (key,))
)
user = cache.get(f"user:{user_id}")  # Automatic load on miss
```

Read-through is cleaner but requires the cache to handle loading logic. It's more common in caching libraries (Apollo Client's cache, React Query).
Warming the Cache: Preload the cache with data that will be accessed. This prevents cold-start performance penalties. Load popular items before traffic arrives.
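The idea fits in a few lines. A minimal sketch, assuming dict-backed stores and a `popular_ids` list supplied by whatever popularity signal your system has:

```python
def warm_cache(cache, database, popular_ids):
    """Preload popular items so the first real requests hit the cache."""
    for item_id in popular_ids:
        cache[f"item:{item_id}"] = database[item_id]
```

Run it at deploy time, or after a bulk invalidation, before traffic arrives.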
Write Patterns
How writes flow through the cache is just as important as reads.
Write-Through: Write to cache and database simultaneously. Success is returned when both succeed. The cache is always consistent with the database, but writes are slow (you wait for both).
```python
def update_user(user_id, data):
    database.update("UPDATE users SET ... WHERE id = %s", (user_id,), data)
    cache.set(f"user:{user_id}", data)
    return data
```

Write-Behind (Write-Back): Write to the cache immediately and asynchronously flush to the database. Writes feel instant. But if the system crashes, data in the cache isn't yet persisted. This requires intent preservation (see consistency models and failure handling).
```python
def update_user(user_id, data):
    cache.set(f"user:{user_id}", data)
    queue_for_database(user_id, data)  # Async flush to the database
    return data
```

Write-Around: Write directly to the database and bypass the cache, invalidating or ignoring the cached entry on write. Simple, but the cache might be behind after writes.
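A minimal sketch of write-around, with plain dicts standing in for the cache and database (hypothetical helpers, not a specific library):

```python
db = {}
cache = {}

def update_user(user_id, data):
    # Write goes straight to the database, bypassing the cache.
    db[user_id] = data
    # Invalidate so the next read misses and reloads fresh data.
    cache.pop(user_id, None)
    return data

def get_user(user_id):
    if user_id in cache:
        return cache[user_id]
    user = db.get(user_id)
    cache[user_id] = user  # Populate on read (cache-aside for reads)
    return user
```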
| Pattern | Write Speed | Consistency | Data Loss Risk | When to use |
|---|---|---|---|---|
| Write-Through | Slow | Strong | Low | Critical data, payments |
| Write-Behind | Fast | Eventual | Medium | Offline-first, drafts |
| Write-Around | Medium | Eventual | Low | Simple CRUD |
Cache Invalidation
Invalidation is where caching gets hard. Adding a cache is easy. Knowing when to clear it is the design challenge.
Time-Based (TTL): Data expires after a fixed duration. Simple to implement.
```python
cache.set("user:123", user_data, ttl=3600)  # Expires in 1 hour
```

The challenge is tuning the TTL. Too long, and users see stale data. Too short, and the cache loses its benefit. Different data needs different TTLs: a product description might cache for 24 hours, a price for 5 minutes, a user's permissions for 30 seconds.
Event-Based (Active Invalidation): Invalidate when you know data changed. After a mutation, invalidate related queries.
```python
def update_user(user_id, data):
    database.update(...)
    cache.delete(f"user:{user_id}")  # Explicit invalidation
    emit("user.updated", user_id)    # Notify other services
```

This requires explicit dependency maps. You must know which cache keys depend on which data changes. Miss an invalidation, and the cache stays stale until the TTL expires or a manual purge.
Version-Based (ETags): Include a version identifier with the cached data. On each read, check if the version changed.

```
GET /user/123
Cache-Control: max-age=3600
ETag: "abc123"

// Later request:
GET /user/123
If-None-Match: "abc123"

// Server responds 304 Not Modified - data hasn't changed
```

Conditional requests (If-None-Match, If-Modified-Since) let you validate freshness without transferring the payload. The server returns 304 Not Modified if the ETag matches, saving bandwidth.
Manual/On-Demand: The user or application explicitly triggers cache clear. Pull-to-refresh. Logout invalidation. The escape hatch when automatic strategies fail.
Invalidation at Scale
At scale, invalidation becomes complex. A single mutation might invalidate dozens of cache keys. You might not know all the dependencies upfront.
Dependency Graphs: Explicitly map which cache keys depend on which data.
```python
invalidation_graph = {
    "post:123": ["feed:user:456", "feed:user:789", "profile:456"],
    "user:456": ["profile:456", "notifications:456"],
}

def invalidate(entity_type, entity_id):
    key = f"{entity_type}:{entity_id}"
    for affected_key in invalidation_graph.get(key, []):
        cache.delete(affected_key)
```

Purge on Demand: When a mutation is critical, purge all related caches. It's blunt but safe.
```python
def update_pricing(data):
    database.update("pricing rules", data)
    cache.flush_all()  # Nuclear option: clear everything
```

Versioned Cache Keys: Include a version in the key. When data changes, increment the version. Old keys expire naturally.
```python
cache_version = {
    "product": 42,
    "pricing": 7,
}

def get_product(product_id):
    key = f"product:{product_id}:v{cache_version['product']}"
    return cache.get(key)

def update_pricing():
    database.update(...)
    cache_version["pricing"] += 1  # Invalidates all pricing keys
```

Cache Stampede Prevention
A cache stampede (or thundering herd) happens when a popular cache entry expires and every request tries to load it simultaneously, overwhelming the database.
A user's feed is cached. The cache expires. 10,000 concurrent requests for the feed all hit the database at once. The database can't handle it and becomes slow or crashes.
Lock-Based: The first request acquires a lock and loads the data. Other requests wait for the lock. Only one loader runs.
```python
def get_feed(user_id):
    key = f"feed:{user_id}"
    cached = cache.get(key)
    if cached:
        return cached
    with cache.lock(f"{key}:lock", timeout=5):
        # Lock acquired; check again (another thread may have loaded it)
        cached = cache.get(key)
        if cached:
            return cached
        # Load and cache
        data = database.query_feed(user_id)
        cache.set(key, data, ttl=3600)
        return data
```

Probabilistic Early Expiration: Expire cache entries with some probability before the TTL runs out. This spreads the reload over time instead of creating a thundering herd.
```python
from random import random

TTL = 3600

def get_feed(user_id):
    key = f"feed:{user_id}"
    cached, ttl_remaining = cache.get_with_ttl(key)
    if cached:
        # Expire early with a probability that grows as expiry approaches
        if random() < 0.1 * (1 - ttl_remaining / TTL):
            cache.delete(key)  # Probabilistic early deletion
        else:
            return cached
    data = database.query_feed(user_id)
    cache.set(key, data, ttl=TTL)
    return data
```

Stale-While-Revalidate: Serve stale data while revalidating in the background.
```
Cache-Control: max-age=3600, stale-while-revalidate=86400
```

The client serves cached data for 1 hour. Between 1 hour and 1 day, it serves stale data while fetching fresh data in the background. No thundering herd.
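Server-side, the same behavior can be approximated with a small wrapper. This is a sketch with an in-process dict and ad-hoc timestamps; a real system would use the cache's own TTL metadata:

```python
import threading
import time

TTL = 3600            # serve as fresh for 1 hour
STALE_WINDOW = 86400  # then serve stale for up to 1 day while refreshing

store = {}  # key -> (value, stored_at)

def get(key, loader):
    now = time.time()
    entry = store.get(key)
    if entry:
        value, stored_at = entry
        age = now - stored_at
        if age < TTL:
            return value  # still fresh
        if age < TTL + STALE_WINDOW:
            # Serve the stale value immediately; refresh in the background.
            def refresh():
                store[key] = (loader(key), time.time())
            threading.Thread(target=refresh, daemon=True).start()
            return value
    # Cold or too stale: load synchronously.
    value = loader(key)
    store[key] = (value, now)
    return value
```

Callers never pay the miss penalty once the entry exists; only the very first request (or one past the stale window) blocks on the loader.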
Caching Sensitive Data
Caching and security are at odds: every cache is a copy of data living outside its original access controls. A cached credential is a vulnerability. Cached personal data is a privacy risk.
Cache-Control Headers: Mark sensitive responses as non-cacheable.

```
Cache-Control: no-store, no-cache, must-revalidate, max-age=0
```

- no-store: Don't cache at all
- no-cache: Don't serve without revalidation
- must-revalidate: If stale, must revalidate (don't serve stale)
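In application code, this might look like the following framework-agnostic sketch. The response is just a dict here; adapt it to your framework's response object:

```python
# Headers that tell every downstream cache not to store this response.
SENSITIVE_HEADERS = {
    "Cache-Control": "no-store, no-cache, must-revalidate, max-age=0",
    "Pragma": "no-cache",  # belt-and-suspenders for legacy HTTP/1.0 caches
}

def make_response(body, sensitive=False):
    headers = {"Content-Type": "application/json"}
    if sensitive:
        headers.update(SENSITIVE_HEADERS)
    return {"body": body, "headers": headers}
```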
CDN Caching of Personalized Data: Dangerous. If you cache a user-specific response, it might be served to a different user. Always use cache keys that include the user ID or set Vary: Authorization to tell the CDN not to serve one user's cache to another.
Encrypted Cache: If data is sensitive, encrypt it before caching. The cache stores encrypted data. Only the owner has the decryption key.
Practical Caching Patterns
Cache Hierarchy: Combine multiple patterns.
(Flow diagram: request → L1 → L2 → L3 → L4, falling through to the next level on each miss.)
L1 is fastest but limited. L2 is shared but still fast. L3 is persistent but slower. L4 is massive but very slow.
Cache Coherence: When data updates in L3, invalidate L1 and L2. When reading, check L1 first, then L2, then L3.
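A toy sketch of that read path and coherent write path, with dicts standing in for each level:

```python
# l1: in-process cache, l2: shared cache, db: system of record (all dicts here).
l1, l2, db = {}, {}, {"user:1": "Ada"}

def read(key):
    if key in l1:
        return l1[key]
    if key in l2:
        l1[key] = l2[key]  # promote to L1 for next time
        return l2[key]
    value = db.get(key)
    l2[key] = value
    l1[key] = value
    return value

def write(key, value):
    db[key] = value
    # Invalidate every cache layer so no level serves stale data.
    l1.pop(key, None)
    l2.pop(key, None)
```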
Cache Buddy System: When a popular item is about to expire, refresh it in the background instead of waiting for a request. Prevents stampedes.
```python
def refresh_hot_items():
    for item in hot_items:
        key = f"item:{item}"
        ttl_remaining = cache.ttl(key)
        if ttl_remaining < 300:  # Less than 5 minutes remaining
            data = database.get(item)
            cache.set(key, data, ttl=3600)
```

Caching and AI-Generated Code
AI-generated code tends to be cache-naive. Generators often fetch fresh data on every request, missing obvious caching opportunities. They generate redundant queries without batching.
Bitloops helps by embedding caching patterns into generated code. Data queries automatically check caches before hitting the database. Mutations automatically invalidate relevant caches. The system generates cache-aware code by default.
Frequently Asked Questions
When should I add caching?
When reads are slow or load on the backend is high. Measure first. Profile your queries — see performance optimization for how. Add caching where it helps most (usually the database is the bottleneck).
Should I cache at the application level or use a CDN?
Both. CDNs cache at the edge (good for static content and personalization-light data). Application caches handle dynamic, personalized data. Reverse proxy caches reduce load on app servers.
How long should TTL be?
Depends on data change frequency and staleness tolerance. A product price changes hourly—use 10-minute TTL. A user's avatar rarely changes—use 1-day TTL. Start with 5 minutes and tune based on freshness needs and cache hit rate.
Is Redis or Memcached better?
Memcached is simpler and faster for pure caching. Redis is more powerful (supports data structures, persistence, pub/sub). Use Memcached if you just need fast key-value caching. Use Redis if you need features beyond that.
Can I cache POST requests?
Generally no. POST requests should not be cached (they modify state). But some APIs return the same response for the same POST body. Use idempotency keys and cache based on the key, not the request itself.
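A sketch of the idempotency-key approach; in practice the key would arrive in a request header such as `Idempotency-Key`, and the store would be shared (Redis), not a local dict:

```python
responses = {}  # idempotency key -> stored response

def handle_post(idempotency_key, handler, payload):
    # Replay the stored response for a repeated key instead of
    # re-running the side-effecting handler.
    if idempotency_key in responses:
        return responses[idempotency_key]
    result = handler(payload)
    responses[idempotency_key] = result
    return result
```

Retrying the same request with the same key is then safe: the side effect runs once, and the client always gets the same response back.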
How do I know my cache is working?
Monitor cache hit rate. Redis exposes hit and miss counters (keyspace_hits and keyspace_misses in INFO stats). If hit rate is below 50%, reconsider what you're caching. If hit rate is high, your cache is working.
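If your cache client doesn't expose counters, a thin wrapper can track them. An illustrative sketch:

```python
class InstrumentedCache:
    """Dict-backed cache that counts hits and misses."""

    def __init__(self):
        self.store, self.hits, self.misses = {}, 0, 0

    def get(self, key):
        if key in self.store:
            self.hits += 1
            return self.store[key]
        self.misses += 1
        return None

    def set(self, key, value):
        self.store[key] = value

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Export `hit_rate()` to your metrics system and alert when it drops, which usually means a TTL was shortened, a key scheme changed, or traffic shifted to uncached data.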