Bitloops - Git captures what changed. Bitloops captures why.

Performance Optimization In Distributed Systems

Optimization without measurement is guessing. Identify what actually matters (p99 latency, not averages; TTFB, not vanity metrics), then optimize the real bottleneck, not what feels slow.

8 min read · Updated March 4, 2026 · Systems Design & Performance

Performance optimization without measurement is guesswork. Engineers optimize the wrong things, spending weeks shaving milliseconds off code paths nobody uses while actual bottlenecks go unaddressed. Optimization requires discipline: measure first, identify the bottleneck, optimize that, measure again.

Performance isn't one metric. It's the interaction of many: how long does a request take to complete (latency)? How many requests can you handle per second (throughput)? What do the slowest requests look like (tail latency, e.g. p99)? What does the user perceive (Core Web Vitals)?

Metrics That Matter

Most engineers focus on the wrong metrics. Average latency is almost useless. If 99 requests take around 10ms and one takes 10 seconds, the average is roughly 110ms, a number that describes none of them. That one slow request might be the most important to optimize.

Latency Percentiles: How long do requests take?

  • p50 (median): 50% of requests are faster than this.
  • p95: 95% are faster. The slow tail.
  • p99: 99% are faster. The slowest 1%.

Optimize p99, not average. That's where real users experience pain.

Request latencies: [10ms, 12ms, 11ms, ... 9s]
Average: ~100ms (misleading)
p50: 11ms
p95: 150ms
p99: 9s (the real problem)
Text
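The percentile definitions above can be sketched in a few lines of Python using the nearest-rank method. This is illustrative only; production monitoring systems usually estimate percentiles from histograms or sketches rather than sorting raw samples:

```python
import math

def percentile(latencies_ms, p):
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(len(ordered) * p / 100))
    return ordered[rank - 1]

samples = list(range(1, 101))      # 1ms .. 100ms, one sample of each
print(percentile(samples, 50))     # 50
print(percentile(samples, 95))     # 95
print(percentile(samples, 99))     # 99
```

A single 9-second outlier barely moves p50, but it dominates the top of the distribution, which is why the tail is where you look first.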

Throughput: How many requests per second can you handle? This depends on what you're measuring. Requests to a cache layer might be thousands per second. Requests that hit the database might be hundreds per second.

TTFB (Time to First Byte): How long until the first byte of the response arrives? This measures network latency and server response time. Everything else is waiting for this.

FCP (First Contentful Paint): How long until the browser paints something visible? Users perceive this as "is the page loading?"

LCP (Largest Contentful Paint): How long until the main content appears? This is what users experience as "is the page done?"

INP (Interaction to Next Paint): How long between a user interacting with the page and the next visual update? It replaced FID (First Input Delay) as a Core Web Vital in 2024, and it measures how blocked the main thread is. High INP means the page feels sluggish.

These "Core Web Vitals" are among the signals Google uses for search ranking. But more importantly, they correlate with user retention and conversion.

Identifying Bottlenecks

Optimization is a cascade. Optimize one bottleneck. Another becomes the bottleneck. Repeat.

Profiling Tools:

Browser DevTools can profile JavaScript execution time. performance.mark() and performance.measure() let you instrument your own code.

performance.mark('api-start');
await fetch('/api/users');
performance.mark('api-end');
performance.measure('api-time', 'api-start', 'api-end');
const measures = performance.getEntriesByName('api-time');
console.log(`API took ${measures[0].duration}ms`);
javascript

Backend profiling tools (py-spy, flamegraph, pprof) show where CPU time is spent. Database profiling (EXPLAIN ANALYZE, slow query logs) shows where queries are slow.
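On the backend, the same mark/measure idea can be approximated with a small timing helper; the `measure` context manager below is illustrative, not a library API:

```python
import time
from contextlib import contextmanager

@contextmanager
def measure(label):
    """Rough server-side analogue of performance.measure(): wall-clock a block of code."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f"{label} took {elapsed_ms:.1f}ms")

with measure("db-query"):
    time.sleep(0.05)   # stand-in for a real query
```

For anything beyond spot checks, prefer a profiler: ad hoc timers only measure what you already suspected.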

Distributed Tracing: In a system with multiple services, a single user request spans multiple components. Tracing tools (Jaeger, Honeycomb, Datadog) show the full path and where time is spent.

Trace (simplified; the Database Query and Cache Miss happen inside the Product Service):

User Request
  Auth Service (50ms)
  Product Service (150ms)
    Database Query (100ms)
    Cache Miss (50ms)
  Price Service (200ms) ← bottleneck
  Render (100ms)
Total: 500ms

See the Price Service? It's the bottleneck. No point optimizing the Auth Service (it's 10% of the time). Optimize the Price Service.
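Given per-span self-times like the hypothetical trace above, picking the target is just a max; the service names and numbers here mirror that made-up trace:

```python
# (service, self_time_ms) from a hypothetical trace
spans = [
    ("Auth Service", 50),
    ("Product Service", 150),
    ("Price Service", 200),
    ("Render", 100),
]

total = sum(ms for _, ms in spans)
name, ms = max(spans, key=lambda span: span[1])
print(f"{name}: {ms}ms ({ms / total:.0%} of {total}ms total)")  # Price Service: 200ms (40% of 500ms total)
```

Even a perfect Auth Service optimization caps out at a 10% win; the Price Service is where the budget goes.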

Frontend Optimization

Frontend performance has many angles.

Bundle Size: Smaller bundles download faster. Code splitting, tree shaking, minification. Measure bundle size before and after changes.

# See what's in your bundle (needs a webpack stats file)
npx webpack-bundle-analyzer stats.json
# Production builds enable minification and tree shaking
npm run build -- --production
Bash

Lazy Loading: Load code and components only when needed. The user doesn't need product recommendations on page load. Delay loading that until later.

// Lazy load a component (React); render it inside a <Suspense> boundary
import { lazy } from 'react';

const Recommendations = lazy(() => import('./Recommendations'));
javascript

Image Optimization: Images are often the largest asset. Optimize:

  • Use next-gen formats (WebP, AVIF)
  • Resize for the device (responsive images)
  • Compress aggressively
  • Lazy load images below the fold
<img
  srcset="image-small.webp 640w, image-large.webp 1280w"
  sizes="(max-width: 640px) 100vw, 1280px"
  src="image-large.jpg"
  loading="lazy"
  alt="Product"
/>
HTML

Rendering Performance: JavaScript on the main thread blocks interaction.

  • Use requestAnimationFrame for animations (syncs with browser refresh)
  • Batch DOM updates (debounce, requestAnimationFrame)
  • Use content-visibility to skip rendering of off-screen content
/* Skip rendering off-screen content */
.below-fold {
  content-visibility: auto;
}
CSS

Hydration: When a page loads, JavaScript must "hydrate" the static HTML with event listeners and state. Hydration can block interaction.

Strategies:

  • Progressive hydration: hydrate critical parts first
  • Streaming: send HTML as soon as it's ready, hydrate in background

Backend Optimization

Backend performance focuses on throughput and latency.

Query Optimization: Database queries are often the bottleneck.

  • Use indexes. A query that scans 1 million rows is slow. A query that uses an index to find 100 rows is fast.
  • Avoid SELECT *. Fetch only what you need.
  • Use EXPLAIN ANALYZE to see the query plan.
-- Use a concrete value when inspecting the plan
EXPLAIN ANALYZE SELECT id, email FROM users WHERE email = 'alice@example.com';
-- Should show an index scan, not a sequential scan
SQL
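The same check can be run locally with SQLite via Python's standard library. The syntax differs from Postgres (EXPLAIN QUERY PLAN instead of EXPLAIN ANALYZE), but the idea is identical: confirm the planner uses the index rather than scanning the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("CREATE INDEX idx_users_email ON users (email)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM users WHERE email = ?",
    ("alice@example.com",),
).fetchall()

for row in plan:
    print(row[3])   # detail column: a SEARCH ... USING ... INDEX line, not a SCAN
```

If the detail line says SCAN instead of SEARCH, the index isn't being used and the query will slow down linearly with table size.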

Connection Pooling: Creating a database connection is expensive. Pool connections and reuse them.

const { Pool } = require('pg'); // node-postgres

const pool = new Pool({ max: 20 }); // Max 20 connections, reused across requests
const result = await pool.query('SELECT * FROM users WHERE id = $1', [userId]);
javascript
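The reuse mechanics behind a pool can be sketched with a thread-safe queue. This is a toy illustration of the pattern, not a replacement for a real driver's pool (no health checks, no lazy creation, no timeouts):

```python
import queue

class ToyPool:
    """Toy connection pool: hand out existing connections instead of creating new ones."""
    def __init__(self, factory, max_size):
        # LIFO so the most recently used (warmest) connection is handed out first
        self._conns = queue.LifoQueue(maxsize=max_size)
        for _ in range(max_size):
            self._conns.put(factory())

    def acquire(self):
        return self._conns.get()    # blocks when every connection is checked out

    def release(self, conn):
        self._conns.put(conn)

pool = ToyPool(factory=object, max_size=2)   # object() stands in for a DB connection
c1 = pool.acquire()
pool.release(c1)
c2 = pool.acquire()
print(c1 is c2)  # True: the connection was reused, not recreated
```

The blocking `acquire` is also a natural backpressure point: when the pool is exhausted, requests wait instead of stampeding the database.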

Caching: Caching strategies deserve their own deep dive. Cache expensive results. Reduce database queries.

Async Processing: Long-running operations block the response. Move them to background workers.

Instead of:

@app.post('/process-image')
def process_image(file):
    # Takes 10 seconds
    result = slow_image_processing(file)
    return result
Python

Use a queue:

@app.post('/process-image')
def process_image(file):
    job_id = queue.enqueue(slow_image_processing, file)
    return { 'job_id': job_id, 'status': 'processing' }
Python

The response is instant. Processing happens in the background.
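The enqueue pattern above can be sketched end to end with the standard library, using a worker thread as a stand-in for a separate worker process (real deployments use Celery, RQ, or similar, with the queue in Redis or a broker):

```python
import queue
import threading
import uuid

jobs = queue.Queue()
results = {}

def worker():
    while True:
        job_id, fn, arg = jobs.get()
        results[job_id] = fn(arg)   # the slow work happens off the request path
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

def enqueue(fn, arg):
    """What the request handler calls: returns immediately with a job id."""
    job_id = str(uuid.uuid4())
    jobs.put((job_id, fn, arg))
    return job_id

job_id = enqueue(lambda name: name.upper(), "image.png")
jobs.join()                         # a real client would poll a status endpoint instead
print(results[job_id])  # IMAGE.PNG
```

The trade-off from the table below still applies: failures now happen in the background, so the worker needs retry and dead-letter handling that a synchronous handler gets for free.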

Rate Limiting and Load Shedding: When load exceeds capacity, gracefully reject requests instead of slowing everything down.
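A common primitive for this is a token bucket: requests that find no token are rejected immediately (e.g. with HTTP 429) instead of queuing behind everyone else. A minimal single-threaded sketch:

```python
import time

class TokenBucket:
    """Allow `rate` requests/second with bursts up to `capacity`; reject the rest."""
    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                # shed load: fail fast instead of slowing everyone

bucket = TokenBucket(rate=1, capacity=5)
decisions = [bucket.allow() for _ in range(6)]
print(decisions)  # five allowed, the sixth rejected (almost no time has passed)
```

Failing fast keeps p99 bounded for the requests you do accept, which is usually better than degrading latency for everyone.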

Network Optimization

Network is often the bottleneck, especially for mobile users.

Compression: Gzip, Brotli. Reduce payload size.

Content-Encoding: gzip
Content-Length: 1234 (was 5678 before compression)
Text
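The effect is easy to see with Python's gzip module; the payload below is a synthetic, repetitive JSON list, and ratios depend heavily on how repetitive the real data is:

```python
import gzip
import json

# A repetitive JSON payload, typical of API list responses
payload = json.dumps([{"id": i, "status": "active"} for i in range(500)]).encode()
compressed = gzip.compress(payload)
print(len(payload), "->", len(compressed), "bytes")
```

Brotli typically compresses text a further 10 to 20 percent smaller than gzip at comparable speed, which is why most CDNs prefer it when the client advertises support.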

HTTP/2 and HTTP/3: Multiplexing and header compression make them faster than HTTP/1.1, and HTTP/3 (QUIC) also avoids TCP head-of-line blocking. Use them if possible.

CDN: Content Delivery Networks cache content geographically. Users download from servers near them.

Connection Reuse: Keep connections alive. Don't open a new connection for each request.

Keep-Alive: timeout=5, max=100
Text

Request Batching: Instead of 10 separate requests, make 1 batch request. Reduces overhead.
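Batching can be as simple as grouping ids and issuing one call per group; `fetch_many` below is a hypothetical stand-in for a batched endpoint (e.g. one that accepts a list of ids), and the counter just makes the round-trip savings visible:

```python
ROUND_TRIPS = 0

def fetch_many(user_ids):
    """Stand-in for one batched network call returning many users at once."""
    global ROUND_TRIPS
    ROUND_TRIPS += 1
    return {uid: {"id": uid} for uid in user_ids}

def fetch_users(user_ids, batch_size=10):
    """Group ids into batches: 23 lookups cost 3 round trips, not 23."""
    users = {}
    for i in range(0, len(user_ids), batch_size):
        users.update(fetch_many(user_ids[i : i + batch_size]))
    return users

users = fetch_users(list(range(23)))
print(ROUND_TRIPS, len(users))  # 3 23
```

The batch size is itself a trade-off: bigger batches amortize more overhead but increase the latency of the first result.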

Performance Budgets

A performance budget is a hard limit on how large your build can be: "Bundles must be under 50KB gzipped." Enforce it in CI.

// Enforced in CI
if (bundleSize > 50 * 1024) {
  throw new Error(`Bundle too large: ${bundleSize} bytes`);
}
javascript

Budgets prevent death by a thousand cuts. Without them, performance slowly degrades as features are added.

Optimization Trade-offs

Every optimization has a trade-off.

Optimization       | Pro                 | Con
-------------------|---------------------|------------------------------------
Minification       | Smaller bundles     | Harder to debug
Caching            | Fewer requests      | Stale data, invalidation complexity
Lazy loading       | Faster initial page | Later interactions are slower
Connection pooling | Reuses connections  | Uses more memory
Async processing   | Responsive API      | Operations fail in the background

Choose optimizations based on what matters for your use case. A real-time dashboard might prefer synchronous calls. A bulk report generator might prefer async.

Common Pitfalls

Premature Optimization: One of the classic anti-patterns in software design. Optimizing code before you know it's slow. Measure first.

Micro-Optimizations: Saving 1ms in a function that's called 100 times per second. Good. Saving 1ms in a function called once per day. Not worth the complexity.

Over-Caching: Caching stale data causes more problems than slow responses. Cache selectively.

Ignoring Client Performance: Backend engineers optimize the API. Frontend engineers ignore bundle size. Both matter.

AI-Generated Code and Performance

AI code generators tend to miss obvious optimizations. They fetch the same data repeatedly instead of caching. They generate inefficient queries. They bundle everything instead of splitting code.

Bitloops helps by baking optimizations into generated code. Data fetching automatically caches. Queries are analyzed for efficiency. Bundles are split by default.

Frequently Asked Questions

How much should I optimize?

Until you hit diminishing returns. A 10% improvement in a bottleneck is worth effort. A 1% improvement in non-bottleneck code is not.

Is latency or throughput more important?

Latency affects user perception (how fast does my action complete?). Throughput affects cost (how many users can I serve?). Both matter, but for different reasons.

When should I use caching?

When you have hot data that's read much more than it changes. Products are read thousands of times, changed once. Cache them. Personalized recommendations are read and written frequently. Maybe not worth caching.

How do I optimize for slow networks?

Reduce payload size (minification, compression). Reduce requests (bundling, batching). Reduce code execution (optimize algorithms).

Should I optimize for mobile?

Yes. Mobile networks are slower and less reliable than desktop. Optimizations for mobile benefit everyone.

How do I know if my optimization worked?

Measure before and after. Real user metrics matter most. Lab metrics (synthetic tests) are useful but not always representative.


Get Started with Bitloops.

Apply what you learn in these hubs to real AI-assisted delivery workflows with shared context, traceable reasoning, and architecture-aware engineering practices.

curl -sSL https://bitloops.com/install.sh | bash