Performance Optimization In Distributed Systems
Optimization without measurement is guessing. Identify what actually matters—p99 latency, not averages; TTFB, not vanity metrics—then optimize the real bottleneck, not what feels slow.
Performance optimization without measurement is guesswork. Engineers optimize the wrong things, spend weeks shaving milliseconds off code paths nobody uses, while actual bottlenecks go unaddressed. Optimization requires discipline: measure first, identify the bottleneck, optimize that, measure again.
Performance isn't one metric. It's the interaction of many: how long does a request take to complete (latency)? How many requests can you handle simultaneously (throughput)? How does performance degrade under load (tail latency)? What does the user perceive (Core Web Vitals)?
Metrics That Matter
Most engineers focus on the wrong metrics. Average latency is almost useless. If 99 of 100 requests take around 10ms and one takes 10 seconds, the average is still only about 110ms—it looks healthy. That one slow request might be the most important to optimize.
Latency Percentiles: How long do requests take?
- p50 (median): 50% of requests are faster than this.
- p95: 95% are faster. The slow tail.
- p99: 99% are faster. The slowest 1%.
Optimize p99, not average. That's where real users experience pain.
Request latencies: [10ms, 12ms, 11ms, ... 9s]
Average: ~100ms (misleading)
p50: 11ms
p95: 150ms
p99: 9s (the real problem)

Throughput: How many requests per second can you handle? This depends on what you're measuring. Requests to a cache layer might be thousands per second. Requests that hit the database might be hundreds per second.
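To make the gap concrete, here is a small sketch computing the average and percentiles for a distribution like the one above (the data is made up for illustration):

```python
# Illustrative: 99 fast requests plus one 9-second outlier.
latencies_ms = [11] * 99 + [9000]

def percentile(values, p):
    """A simple rank-based percentile, for illustration only."""
    ordered = sorted(values)
    # Index k such that about p% of samples come before it.
    k = int(len(ordered) * p / 100)
    return ordered[min(k, len(ordered) - 1)]

average = sum(latencies_ms) / len(latencies_ms)
print(f"average: {average:.0f}ms")               # ~101ms -- looks healthy
print(f"p50: {percentile(latencies_ms, 50)}ms")  # 11ms
print(f"p99: {percentile(latencies_ms, 99)}ms")  # 9000ms -- the real problem
```

The average barely registers the outlier; the p99 puts it front and center.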
TTFB (Time to First Byte): How long until the first byte of the response arrives? This measures network latency and server response time. Everything else is waiting for this.
FCP (First Contentful Paint): How long until the browser paints something visible? Users perceive this as "is the page loading?"
LCP (Largest Contentful Paint): How long until the main content appears? This is what users experience as "is the page done?"
FID (First Input Delay): How long between a user interacting with the page and the browser responding? This measures how blocked the main thread is. High FID means the page feels sluggish.
These "Core Web Vitals" are what Google indexes for ranking. But more importantly, they correlate with user retention and conversion.
Identifying Bottlenecks
Optimization is a cascade. Optimize one bottleneck. Another becomes the bottleneck. Repeat.
Profiling Tools:
Browser DevTools can profile JavaScript execution time. performance.mark() and performance.measure() let you instrument your own code.
performance.mark('api-start');
await fetch('/api/users');
performance.mark('api-end');
performance.measure('api-time', 'api-start', 'api-end');
const measures = performance.getEntriesByName('api-time');
console.log(`API took ${measures[0].duration}ms`);

Backend profiling tools (py-spy, flamegraph, pprof) show where CPU time is spent. Database profiling (EXPLAIN ANALYZE, slow query logs) shows where queries are slow.
Distributed Tracing: In a system with multiple services, a single user request spans multiple components. Tracing tools (Jaeger, Honeycomb, Datadog) show the full path and where time is spent.
[Flow diagram: a single request traced across multiple services, showing where time is spent]
See the Price Service? It's the bottleneck. No point optimizing the Auth Service (it's 10% of the time). Optimize the Price Service.
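The idea behind a trace span can be sketched in a few lines: time each named step and compare them. The service names and sleeps below are stand-ins for real calls, not a real tracing API:

```python
import time
from contextlib import contextmanager

spans = []

@contextmanager
def span(name):
    # Record how long a named step takes -- the core of what a tracer collects.
    start = time.perf_counter()
    try:
        yield
    finally:
        spans.append((name, (time.perf_counter() - start) * 1000))

# Stand-ins for real service calls on the request path.
with span("auth-service"):
    time.sleep(0.01)
with span("price-service"):
    time.sleep(0.05)

slowest = max(spans, key=lambda s: s[1])
print(f"bottleneck: {slowest[0]}")  # price-service
```

Real tracers (Jaeger, Honeycomb, Datadog) add trace ids that follow the request across process boundaries, but the per-span timing is the same idea.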
Frontend Optimization
Frontend performance has many angles.
Bundle Size: Smaller bundles download faster. Code splitting, tree shaking, minification. Measure bundle size before and after changes.
# See what's in your bundle (reads a webpack stats file)
npx webpack-bundle-analyzer stats.json
# Build with production mode (enables minification and tree shaking)
npm run build -- --mode production

Lazy Loading: Load code and components only when needed. The user doesn't need product recommendations on page load. Delay loading that until later.
// Lazy load a React component
import { lazy } from 'react';

const Recommendations = lazy(() => import('./Recommendations'));

Image Optimization: Images are often the largest asset. Optimize:
- Use next-gen formats (WebP, AVIF)
- Resize for the device (responsive images)
- Compress aggressively
- Lazy load images below the fold
<img
srcset="image-small.webp 640w, image-large.webp 1280w"
src="image-large.jpg"
loading="lazy"
alt="Product"
/>

Rendering Performance: JavaScript on the main thread blocks interaction.
- Use requestAnimationFrame for animations (syncs with the browser's refresh rate)
- Batch DOM updates (debounce, requestAnimationFrame)
- Use content-visibility to skip rendering of off-screen content
/* Skip rendering off-screen content */
.below-fold {
content-visibility: auto;
}

Hydration: When a page loads, JavaScript must "hydrate" the static HTML with event listeners and state. Hydration can block interaction.
Strategies:
- Progressive hydration: hydrate critical parts first
- Streaming: send HTML as soon as it's ready, hydrate in background
Backend Optimization
Backend performance focuses on throughput and latency.
Query Optimization: Database queries are often the bottleneck.
- Use indexes. A query that scans 1 million rows is slow. A query that uses an index to find 100 rows is fast.
- Avoid SELECT *. Fetch only what you need.
- Use EXPLAIN ANALYZE to see the query plan.
EXPLAIN ANALYZE SELECT * FROM users WHERE email = ?;
-- Should show an index scan, not a sequential scan

Connection Pooling: Creating a database connection is expensive. Pool connections and reuse them.
const { Pool } = require('pg'); // node-postgres

const pool = new Pool({ max: 20 }); // Max 20 connections
const result = await pool.query('SELECT * FROM users WHERE id = $1', [userId]);

Caching: Caching strategies deserve their own deep dive. Cache expensive results. Reduce database queries.
Async Processing: Long-running operations block the response. Move them to background workers.
Instead of:
@app.post('/process-image')
def process_image(file):
    # Takes 10 seconds
    result = slow_image_processing(file)
    return result

Use a queue:
@app.post('/process-image')
def process_image(file):
    job_id = queue.enqueue(slow_image_processing, file)
    return { 'job_id': job_id, 'status': 'processing' }

The response is instant. Processing happens in the background.
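The lifecycle behind that job id can be sketched with an in-process worker. A real system would use a broker (e.g. Redis with RQ, or Celery); the `enqueue` below is an illustrative stand-in:

```python
import threading
import time
import uuid

jobs = {}  # job_id -> {"status", "result", "thread"}

def enqueue(func, *args):
    """Return a job id immediately; run the work on a background thread."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "processing", "result": None}

    def worker():
        jobs[job_id]["result"] = func(*args)
        jobs[job_id]["status"] = "done"

    t = threading.Thread(target=worker)
    t.start()
    jobs[job_id]["thread"] = t
    return job_id

job_id = enqueue(lambda: time.sleep(0.1) or "processed")
print(jobs[job_id]["status"])  # processing -- the caller isn't blocked
jobs[job_id]["thread"].join()
print(jobs[job_id]["status"])  # done
```

The client polls a status endpoint (or subscribes to a notification) to pick up the result once the worker finishes.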
Rate Limiting and Load Shedding: When load exceeds capacity, gracefully reject requests instead of slowing everything down.
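A common implementation is a token bucket: each request spends a token, tokens refill at a fixed rate, and requests beyond the budget are rejected up front instead of queued. A sketch (the rate and capacity values are illustrative):

```python
import time

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate                # tokens refilled per second
        self.capacity = capacity        # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # shed load: e.g. respond 429 rather than slow everyone down

bucket = TokenBucket(rate=100, capacity=10)
allowed = sum(bucket.allow() for _ in range(50))  # 50 requests in one burst
print(allowed)  # roughly 10: only the burst budget gets through
```

Rejected callers get a fast, explicit failure they can retry, which is far better than letting every request degrade.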
Network Optimization
Network is often the bottleneck, especially for mobile users.
Compression: Gzip, Brotli. Reduce payload size.
Content-Encoding: gzip
Content-Length: 1234 (was 5678 before compression)

HTTP/2 and HTTP/3: Multiplexing, server push, header compression. Faster than HTTP/1.1. Use these if possible.
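Compression's payoff is easy to verify directly: gzip shrinks repetitive payloads like JSON dramatically. A quick check using Python's standard library (the payload is made up for illustration):

```python
import gzip
import json

# A repetitive JSON payload, typical of API list responses.
payload = json.dumps(
    [{"id": i, "name": "widget", "in_stock": True} for i in range(200)]
).encode()
compressed = gzip.compress(payload)

print(f"raw: {len(payload)} bytes, gzipped: {len(compressed)} bytes")
# The gzipped size is a small fraction of the raw size for data like this.
```

Brotli typically compresses a bit tighter than gzip for text; both are negotiated via the Accept-Encoding header.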
CDN: Content Delivery Networks cache content geographically. Users download from servers near them.
Connection Reuse: Keep connections alive. Don't open a new connection for each request.
Keep-Alive: timeout=5, max=100

Request Batching: Instead of 10 separate requests, make 1 batch request. Reduces overhead.
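A back-of-the-envelope model shows why batching wins: per-request overhead (round trip, TLS, headers) dominates small requests, and a batch pays it once. The numbers below are illustrative, not measurements:

```python
REQUEST_OVERHEAD_MS = 50  # illustrative: RTT, TLS, headers per request
PER_ITEM_MS = 2           # illustrative: server-side work per item

def separate_requests(n_items):
    # Each item pays the full per-request overhead.
    return n_items * (REQUEST_OVERHEAD_MS + PER_ITEM_MS)

def one_batch(n_items):
    # One overhead charge, same per-item work.
    return REQUEST_OVERHEAD_MS + n_items * PER_ITEM_MS

print(separate_requests(10))  # 520
print(one_batch(10))          # 70
```

The gap widens on slow networks, where the per-request overhead is largest.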
Performance Budgets
A performance budget is a hard limit on a measurable property of your build: "Bundles must be under 50KB gzipped." Enforce it in CI.
// Enforced in CI
if (bundleSize > 50 * 1024) {
throw new Error(`Bundle too large: ${bundleSize} bytes`);
}

Budgets prevent death by a thousand cuts. Without them, performance slowly degrades as features are added.
Optimization Trade-offs
Every optimization has a trade-off.
| Optimization | Pro | Con |
|---|---|---|
| Minification | Smaller bundles | Harder to debug |
| Caching | Fewer requests | Stale data, invalidation complexity |
| Lazy loading | Faster initial page | Later interactions are slower |
| Connection pooling | Reuses connections | Uses more memory |
| Async processing | Responsive API | Operations fail in background |
Choose optimizations based on what matters for your use case. A real-time dashboard might prefer synchronous calls. A bulk report generator might prefer async.
Common Pitfalls
Premature Optimization: A classic anti-pattern in software design: optimizing code before you know it's slow. Measure first.
Micro-Optimizations: Saving 1ms in a function that's called 100 times per second. Good. Saving 1ms in a function called once per day. Not worth the complexity.
Over-Caching: Caching stale data causes more problems than slow responses. Cache selectively.
Ignoring Client Performance: Backend engineers optimize the API. Frontend engineers ignore bundle size. Both matter.
AI-Generated Code and Performance
AI code generators tend to miss obvious optimizations. They fetch the same data repeatedly instead of caching. They generate inefficient queries. They bundle everything instead of splitting code.
Bitloops helps by baking optimizations into generated code. Data fetching automatically caches. Queries are analyzed for efficiency. Bundles are split by default.
Frequently Asked Questions
How much should I optimize?
Until you hit diminishing returns. A 10% improvement in a bottleneck is worth effort. A 1% improvement in non-bottleneck code is not.
Is latency or throughput more important?
Latency affects user perception (how fast does my action complete?). Throughput affects cost (how many users can I serve?). Both matter, but for different reasons.
When should I use caching?
When you have hot data that's read much more than it changes. Products are read thousands of times, changed once. Cache them. Personalized recommendations are read and written frequently. Maybe not worth caching.
How do I optimize for slow networks?
Reduce payload size (minification, compression). Reduce requests (bundling, batching). Reduce code execution (optimize algorithms).
Should I optimize for mobile?
Yes. Mobile networks are slower and less reliable than desktop. Optimizations for mobile benefit everyone.
How do I know if my optimization worked?
Measure before and after. Real user metrics matter most. Lab metrics (synthetic tests) are useful but not always representative.
Primary Sources
- High Performance Browser Networking — Ilya Grigorik's guide to web performance and networking optimization.
- Designing Data-Intensive Applications — Martin Kleppmann's comprehensive guide to data-intensive system performance.
- Release It! — Michael Nygard's guide to performance optimization under production load.
- Use The Index, Luke — Markus Winand's guide to database indexing and performance optimization.
- SRE Book — Google's Site Reliability Engineering book on performance and optimization.
- SRE Workbook — Google's SRE workbook with practical performance patterns and strategies.
- CAP Twelve Years Later — Eric Brewer's update on the CAP theorem, addressing performance and consistency.