Performance Optimization In Distributed Systems
Optimization without measurement is guessing. Identify what actually matters—p99 latency, not averages; TTFB, not vanity metrics—then optimize the real bottleneck, not what feels slow.
Performance optimization without measurement is guesswork. Engineers optimize the wrong things, spend weeks shaving milliseconds off code paths nobody uses, while actual bottlenecks go unaddressed. Optimization requires discipline: measure first, identify the bottleneck, optimize that, measure again.
Performance isn't one metric. It's the interaction of many: how long does a request take to complete (latency)? How many requests can you handle simultaneously (throughput)? How does performance degrade under load (tail latency)? What does the user perceive (Core Web Vitals)?
Metrics That Matter
Most engineers focus on the wrong metrics. Average latency is almost useless. If 99 of 100 requests take around 10ms and one takes 10 seconds, the average is still only about 110ms—it looks healthy. That one slow request might be the most important to optimize.
Latency Percentiles: How long do requests take?
- p50 (median): 50% of requests are faster than this.
- p95: 95% are faster. The slow tail.
- p99: 99% are faster. The slowest 1%.
Optimize p99, not average. That's where real users experience pain.
Request latencies: [10ms, 12ms, 11ms, ... 9s]
Average: ~100ms (misleading)
p50: 11ms
p95: 150ms
p99: 9s (the real problem)

Throughput: How many requests per second can you handle? This depends on what you're measuring. Requests to a cache layer might be thousands per second. Requests that hit the database might be hundreds per second.
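To make the gap concrete, here is a small sketch computing the average and percentiles for a distribution like the one above (the data is made up for illustration):

```python
# Illustrative: 99 fast requests plus one 9-second outlier.
latencies_ms = [11] * 99 + [9000]

def percentile(values, p):
    """A simple rank-based percentile, for illustration only."""
    ordered = sorted(values)
    # Index k such that about p% of samples come before it.
    k = int(len(ordered) * p / 100)
    return ordered[min(k, len(ordered) - 1)]

average = sum(latencies_ms) / len(latencies_ms)
print(f"average: {average:.0f}ms")               # ~101ms -- looks healthy
print(f"p50: {percentile(latencies_ms, 50)}ms")  # 11ms
print(f"p99: {percentile(latencies_ms, 99)}ms")  # 9000ms -- the real problem
```

The average barely registers the outlier; the p99 puts it front and center.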
TTFB (Time to First Byte): How long until the first byte of the response arrives? This measures network latency and server response time. Everything else is waiting for this.
FCP (First Contentful Paint): How long until the browser paints something visible? Users perceive this as "is the page loading?"
LCP (Largest Contentful Paint): How long until the main content appears? This is what users experience as "is the page done?"
FID (First Input Delay): How long between a user interacting with the page and the browser responding? This measures how blocked the main thread is. High FID means the page feels sluggish.
These "Core Web Vitals" are what Google indexes for ranking. But more importantly, they correlate with user retention and conversion.
Identifying Bottlenecks
Optimization is a cascade. Optimize one bottleneck. Another becomes the bottleneck. Repeat.
Profiling Tools:
Browser DevTools can profile JavaScript execution time. performance.mark() and performance.measure() let you instrument your own code.
performance.mark('api-start');
await fetch('/api/users');
performance.mark('api-end');
performance.measure('api-time', 'api-start', 'api-end');
const measures = performance.getEntriesByName('api-time');
console.log(`API took ${measures[0].duration}ms`);

Backend profiling tools (py-spy, flamegraph, pprof) show where CPU time is spent. Database profiling (EXPLAIN ANALYZE, slow query logs) shows where queries are slow.
Distributed Tracing: In a system with multiple services, a single user request spans multiple components. Tracing tools (Jaeger, Honeycomb, Datadog) show the full path and where time is spent.
[Flow diagram: a single request traced across multiple services, showing where time is spent]
See the Price Service? It's the bottleneck. No point optimizing the Auth Service (it's 10% of the time). Optimize the Price Service.
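The idea behind a trace span can be sketched in a few lines: time each named step and compare them. The service names and sleeps below are stand-ins for real calls, not a real tracing API:

```python
import time
from contextlib import contextmanager

spans = []

@contextmanager
def span(name):
    # Record how long a named step takes -- the core of what a tracer collects.
    start = time.perf_counter()
    try:
        yield
    finally:
        spans.append((name, (time.perf_counter() - start) * 1000))

# Stand-ins for real service calls on the request path.
with span("auth-service"):
    time.sleep(0.01)
with span("price-service"):
    time.sleep(0.05)

slowest = max(spans, key=lambda s: s[1])
print(f"bottleneck: {slowest[0]}")  # price-service
```

Real tracers (Jaeger, Honeycomb, Datadog) add trace ids that follow the request across process boundaries, but the per-span timing is the same idea.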
Frontend Optimization
Frontend performance has many angles.
Bundle Size: Smaller bundles download faster. Code splitting, tree shaking, minification. Measure bundle size before and after changes.
# See what's in your bundle (reads a webpack stats file)
npx webpack-bundle-analyzer stats.json
# Build with production mode (enables minification and tree shaking)
npm run build -- --mode production

Lazy Loading: Load code and components only when needed. The user doesn't need product recommendations on page load. Delay loading that until later.
// Lazy load a React component
import { lazy } from 'react';

const Recommendations = lazy(() => import('./Recommendations'));

Image Optimization: Images are often the largest asset. Optimize:
- Use next-gen formats (WebP, AVIF)
- Resize for the device (responsive images)
- Compress aggressively
- Lazy load images below the fold
<img
srcset="image-small.webp 640w, image-large.webp 1280w"
src="image-large.jpg"
loading="lazy"
alt="Product"
/>

Rendering Performance: JavaScript on the main thread blocks interaction.
- Use requestAnimationFrame for animations (syncs with the browser's refresh rate)
- Batch DOM updates (debounce, requestAnimationFrame)
- Use content-visibility to skip rendering of off-screen content
/* Skip rendering off-screen content */
.below-fold {
content-visibility: auto;
}

Hydration: When a page loads, JavaScript must "hydrate" the static HTML with event listeners and state. Hydration can block interaction.
Strategies:
- Progressive hydration: hydrate critical parts first
- Streaming: send HTML as soon as it's ready, hydrate in background
Backend Optimization
Backend performance focuses on throughput and latency.
Query Optimization: Database queries are often the bottleneck.
- Use indexes. A query that scans 1 million rows is slow. A query that uses an index to find 100 rows is fast.
- Avoid SELECT *. Fetch only what you need.
- Use EXPLAIN ANALYZE to see the query plan.
EXPLAIN ANALYZE SELECT * FROM users WHERE email = ?;
-- Should show an index scan, not a sequential scan

Connection Pooling: Creating a database connection is expensive. Pool connections and reuse them.
const { Pool } = require('pg'); // node-postgres

const pool = new Pool({ max: 20 }); // Max 20 connections
const result = await pool.query('SELECT * FROM users WHERE id = $1', [userId]);

Caching: Caching strategies deserve their own deep dive. Cache expensive results. Reduce database queries.
Async Processing: Long-running operations block the response. Move them to background workers.
Instead of:
@app.post('/process-image')
def process_image(file):
    # Takes 10 seconds
    result = slow_image_processing(file)
    return result

Use a queue:
@app.post('/process-image')
def process_image(file):
    job_id = queue.enqueue(slow_image_processing, file)
    return { 'job_id': job_id, 'status': 'processing' }

The response is instant. Processing happens in the background.
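The lifecycle behind that job id can be sketched with an in-process worker. A real system would use a broker (e.g. Redis with RQ, or Celery); the `enqueue` below is an illustrative stand-in:

```python
import threading
import time
import uuid

jobs = {}  # job_id -> {"status", "result", "thread"}

def enqueue(func, *args):
    """Return a job id immediately; run the work on a background thread."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "processing", "result": None}

    def worker():
        jobs[job_id]["result"] = func(*args)
        jobs[job_id]["status"] = "done"

    t = threading.Thread(target=worker)
    t.start()
    jobs[job_id]["thread"] = t
    return job_id

job_id = enqueue(lambda: time.sleep(0.1) or "processed")
print(jobs[job_id]["status"])  # processing -- the caller isn't blocked
jobs[job_id]["thread"].join()
print(jobs[job_id]["status"])  # done
```

The client polls a status endpoint (or subscribes to a notification) to pick up the result once the worker finishes.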
Rate Limiting and Load Shedding: When load exceeds capacity, gracefully reject requests instead of slowing everything down.
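A common implementation is a token bucket: each request spends a token, tokens refill at a fixed rate, and requests beyond the budget are rejected up front instead of queued. A sketch (the rate and capacity values are illustrative):

```python
import time

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate                # tokens refilled per second
        self.capacity = capacity        # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # shed load: e.g. respond 429 rather than slow everyone down

bucket = TokenBucket(rate=100, capacity=10)
allowed = sum(bucket.allow() for _ in range(50))  # 50 requests in one burst
print(allowed)  # roughly 10: only the burst budget gets through
```

Rejected callers get a fast, explicit failure they can retry, which is far better than letting every request degrade.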
Network Optimization
Network is often the bottleneck, especially for mobile users.
Compression: Gzip, Brotli. Reduce payload size.
Content-Encoding: gzip
Content-Length: 1234 (was 5678 before compression)

HTTP/2 and HTTP/3: Multiplexing, server push, header compression. Faster than HTTP/1.1. Use these if possible.
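Compression's payoff is easy to verify directly: gzip shrinks repetitive payloads like JSON dramatically. A quick check using Python's standard library (the payload is made up for illustration):

```python
import gzip
import json

# A repetitive JSON payload, typical of API list responses.
payload = json.dumps(
    [{"id": i, "name": "widget", "in_stock": True} for i in range(200)]
).encode()
compressed = gzip.compress(payload)

print(f"raw: {len(payload)} bytes, gzipped: {len(compressed)} bytes")
# The gzipped size is a small fraction of the raw size for data like this.
```

Brotli typically compresses a bit tighter than gzip for text; both are negotiated via the Accept-Encoding header.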
CDN: Content Delivery Networks cache content geographically. Users download from servers near them.
Connection Reuse: Keep connections alive. Don't open a new connection for each request.
Keep-Alive: timeout=5, max=100

Request Batching: Instead of 10 separate requests, make 1 batch request. Reduces overhead.
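A back-of-the-envelope model shows why batching wins: per-request overhead (round trip, TLS, headers) dominates small requests, and a batch pays it once. The numbers below are illustrative, not measurements:

```python
REQUEST_OVERHEAD_MS = 50  # illustrative: RTT, TLS, headers per request
PER_ITEM_MS = 2           # illustrative: server-side work per item

def separate_requests(n_items):
    # Each item pays the full per-request overhead.
    return n_items * (REQUEST_OVERHEAD_MS + PER_ITEM_MS)

def one_batch(n_items):
    # One overhead charge, same per-item work.
    return REQUEST_OVERHEAD_MS + n_items * PER_ITEM_MS

print(separate_requests(10))  # 520
print(one_batch(10))          # 70
```

The gap widens on slow networks, where the per-request overhead is largest.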
Performance Budgets
A performance budget is a hard limit on a measurable property of your build: "Bundles must be under 50KB gzipped." Enforce it in CI.
// Enforced in CI
if (bundleSize > 50 * 1024) {
throw new Error(`Bundle too large: ${bundleSize} bytes`);
}

Budgets prevent death by a thousand cuts. Without them, performance slowly degrades as features are added.
Optimization Trade-offs
Every optimization has a trade-off.
| Optimization | Pro | Con |
|---|---|---|
| Minification | Smaller bundles | Harder to debug |
| Caching | Fewer requests | Stale data, invalidation complexity |
| Lazy loading | Faster initial page | Later interactions are slower |
| Connection pooling | Reuses connections | Uses more memory |
| Async processing | Responsive API | Operations fail in background |
Choose optimizations based on what matters for your use case. A real-time dashboard might prefer synchronous calls. A bulk report generator might prefer async.
Common Pitfalls
Premature Optimization: A classic anti-pattern in software design: optimizing code before you know it's slow. Measure first.
Micro-Optimizations: Saving 1ms in a function that's called 100 times per second. Good. Saving 1ms in a function called once per day. Not worth the complexity.
Over-Caching: Caching stale data causes more problems than slow responses. Cache selectively.
Ignoring Client Performance: Backend engineers optimize the API. Frontend engineers ignore bundle size. Both matter.
AI-Generated Code and Performance
AI code generators tend to miss obvious optimizations. They fetch the same data repeatedly instead of caching. They generate inefficient queries. They bundle everything instead of splitting code.
Bitloops helps by baking optimizations into generated code. Data fetching automatically caches. Queries are analyzed for efficiency. Bundles are split by default.
Frequently Asked Questions
How much should I optimize?
Until you hit diminishing returns. A 10% improvement in a bottleneck is worth effort. A 1% improvement in non-bottleneck code is not.
Is latency or throughput more important?
Latency affects user perception (how fast does my action complete?). Throughput affects cost (how many users can I serve?). Both matter, but for different reasons.
When should I use caching?
When you have hot data that's read much more than it changes. Products are read thousands of times, changed once. Cache them. Personalized recommendations are read and written frequently. Maybe not worth caching.
How do I optimize for slow networks?
Reduce payload size (minification, compression). Reduce requests (bundling, batching). Reduce code execution (optimize algorithms).
Should I optimize for mobile?
Yes. Mobile networks are slower and less reliable than desktop. Optimizations for mobile benefit everyone.
How do I know if my optimization worked?
Measure before and after. Real user metrics matter most. Lab metrics (synthetic tests) are useful but not always representative.
Primary Sources
- High Performance Browser Networking — Ilya Grigorik's guide to web performance and networking optimization.
- Designing Data-Intensive Applications — Martin Kleppmann's comprehensive guide to data-intensive system performance.
- Release It! — Michael Nygard's guide to performance optimization under production load.
- Use The Index, Luke — Markus Winand's guide to database indexing and performance optimization.
- SRE Book — Google's Site Reliability Engineering book on performance and optimization.
- SRE Workbook — Google's SRE workbook with practical performance patterns and strategies.
- CAP Twelve Years Later — Eric Brewer's update on the CAP theorem, addressing performance and consistency.