Testing Strategies for Large Systems
Test smart: many unit tests, fewer integration tests, and only a handful of end-to-end tests. Use property-based testing and mutation testing to catch bugs that conventional example-based tests miss. The wrong test mix wastes effort without catching real problems.
Testing at scale is a different beast than testing a small application. When you have a million lines of code with dozens of services, a failed test might be a real bug or a flaky test. Tests that ran in seconds might take minutes. Dependencies become impossible to mock. Coverage metrics become meaningless if tests aren't actually catching bugs.
The testing pyramid is your foundation: many fast unit tests at the base, fewer integration tests in the middle, and a handful of critical end-to-end tests at the top. This structure inverts how many teams actually test—they skip units, write integration tests, and have a massive e2e suite that takes forever. Then they wonder why bugs slip through and deployments are slow.
Why This Matters
Tests are your insurance policy against regression. They're not about achieving a coverage percentage. They're about catching mistakes before they hit production. A system with 50% coverage and the right tests catches more bugs than one with 90% coverage and bad tests.
Speed is critical. If your test suite takes an hour to run, developers stop running it locally. They run a subset, miss problems, and break the build. If your test suite runs in seconds, developers run it constantly. Bugs get caught during development, not in code review or production.
Maintenance burden grows with codebase size. A test written for a simple function is easy to maintain. A test that spans ten services and mocks everything is fragile. Change one service and multiple tests break. You're maintaining tests, not running them.
Isolation prevents cascading failures. When tests are properly isolated, one broken test tells you exactly what's wrong. When tests have hidden dependencies, one failure can mask ten problems. Isolation is a property you design for from the start, not something you add later. This is why SOLID principles matter for testability.
The Testing Pyramid
Unit tests are fast, focused, and numerous. They test a single function or class in isolation. Dependencies are mocked. Database calls are replaced. Network calls don't happen. A unit test runs in milliseconds.
Example unit test:
describe('calculateDiscount', () => {
it('applies percentage discount correctly', () => {
const discount = calculateDiscount(100, 0.1);
expect(discount).toBe(10);
});
it('applies maximum discount cap', () => {
const discount = calculateDiscount(100, 0.9); // 90% requested
expect(discount).toBe(20); // but max is 20%
});
});
Good unit tests are specific about what they test. They test the happy path, the edge cases, and the error cases. A function that handles three scenarios should have three tests minimum.
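The function under test never appears above. A minimal implementation consistent with those two cases might look like this (the 20% cap is an assumption taken from the second test, not a rule from the original):

```javascript
// Hypothetical implementation matching the tests above.
// The 0.2 default cap comes from the "maximum discount" test case.
function calculateDiscount(amount, rate, maxRate = 0.2) {
  const effectiveRate = Math.min(rate, maxRate);
  return amount * effectiveRate;
}

// calculateDiscount(100, 0.1) → 10
// calculateDiscount(100, 0.9) → 20 (capped at 20%)
```

Writing the tests first, as above, forces decisions like the cap to be explicit before the implementation exists.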
Integration tests exercise multiple units working together. A service talks to a real database. An API endpoint processes a request. A payment processor integrates with the payment gateway. These tests are slower but catch integration bugs—things that work in isolation but fail together.
Example integration test:
describe('UserService integration', () => {
it('creates user and indexes in search', async () => {
const user = await userService.createUser({ name: 'Alice' });
const found = await searchService.findUser('Alice');
expect(found.id).toBe(user.id);
});
});
Integration tests use real implementations but test against a test database. They don't need to mock everything—in fact, mocking too much defeats the purpose.
End-to-end tests exercise the entire system. A user interacts with your UI. Data flows through multiple services. The payment gateway processes a real (test) transaction. E2E tests are slow and brittle, so you keep them minimal. They test critical user journeys that would be catastrophic if they broke.
Example e2e flow:
- User navigates to checkout
- Enters credit card
- Completes purchase
- Receives confirmation email
- Order appears in their account
That's one e2e test. You might have five to ten total. Not fifty.
Test Isolation Strategies
Tests fail when dependencies aren't isolated. The classic problem: a test modifies shared state that other tests depend on.
Setup and teardown ensure each test starts clean. Before each test, create fresh data. After each test, clean it up.
describe('OrderService', () => {
let database;
beforeEach(async () => {
database = await createTestDatabase();
});
afterEach(async () => {
await database.clear();
});
it('creates orders', async () => {
const order = await orderService.create({ items: [] });
expect(order.id).toBeDefined();
});
});
Mocking replaces external dependencies with test doubles. A mock doesn't hit the database or call the API. It returns test data you control.
it('charges the gateway on checkout', async () => {
const mockPaymentGateway = {
charge: jest.fn().mockResolvedValue({ success: true, id: 'txn_123' })
};
const service = new CheckoutService(mockPaymentGateway);
await service.checkout({ amount: 100 });
expect(mockPaymentGateway.charge).toHaveBeenCalledWith(100);
});
Use mocks for external systems (APIs, payment gateways, email services). Don't mock everything—mock only the things that make the test slow or unreliable.
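jest.fn() is not magic: a mock is just a function that records its calls and returns canned data. A hand-rolled sketch of the same idea, useful for understanding what the framework does for you:

```javascript
// Minimal stand-in for jest.fn(): records arguments, returns a canned value.
function makeMock(returnValue) {
  const fn = (...args) => {
    fn.calls.push(args);
    return returnValue;
  };
  fn.calls = [];
  return fn;
}

// Usage: inject the mock in place of the real dependency, then inspect fn.calls.
const charge = makeMock(Promise.resolve({ success: true, id: 'txn_123' }));
charge(100);
// charge.calls is now [[100]]
```

Real mocking libraries add matchers, call ordering, and automatic restoration on top of this core idea.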
Fakes are fully functional implementations used only for testing. A fake database that lives in memory. A fake email service that stores emails in a list instead of sending them. Fakes are better than mocks when you need realistic behavior.
class FakeEmailService {
constructor() {
this.sent = [];
}
send(email) {
this.sent.push(email);
return Promise.resolve({ id: 'email_123' });
}
}
Test containers let you run real services in isolated containers for integration tests. Start a real PostgreSQL instance for your test, destroy it when the test finishes. Tools like testcontainers make this practical.
const container = await new PostgreSqlContainer().start();
const db = new Database(container.getConnectionUri());
// test here
await container.stop();
Dealing with External Dependencies
External dependencies—third-party APIs, payment processors, email services—are tricky. You can't make real calls in tests. You can't wait for responses. You need isolation.
Spy on HTTP calls to verify requests without making real requests. A library like nock intercepts HTTP calls and returns fake responses.
nock('https://api.payment.com')
.post('/charge')
.reply(200, { success: true, id: 'txn_123' });
const result = await chargeCard('4111111111111111', 100);
expect(result.success).toBe(true);
Use contracts to formalize what you expect from external services. Consumer-driven contract testing (with a tool like Pact) records your expectations as a contract file and shares it with the service provider, so both sides verify against the same agreement.
describe('PaymentAPI contract', () => {
it('charges card successfully', async () => {
await expect(paymentAPI.charge({
amount: 100,
card: '4111111111111111'
})).resolves.toEqual({
success: true,
transactionId: expect.any(String)
});
});
});
Property-Based Testing
Property-based testing generates test inputs automatically and checks that properties hold across all inputs. Instead of writing ten test cases, you define a property and the framework generates hundreds of test cases.
Example property: "sorting a list should produce a list of the same length with the same elements."
fc.assert(
fc.property(fc.array(fc.integer()), (input) => {
const sorted = quickSort(input);
expect(sorted.length).toBe(input.length);
expect([...sorted].sort((a, b) => a - b)).toEqual([...input].sort((a, b) => a - b));
})
);
The framework generates random arrays, sorts them, and checks the property. If it fails on any input, it shrinks the input to the minimal failing case.
Property-based testing finds edge cases humans miss. It's particularly useful for algorithms and mathematical operations.
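To see how the generate-and-check loop works without any framework, here is a toy version in plain JavaScript. Real tools like fast-check add shrinking and reproducible seeds; this sketch only does random generation, and the generator bounds are arbitrary:

```javascript
// Toy property-based check: generate random inputs, verify a property holds.
function randomArray(maxLen = 20) {
  const len = Math.floor(Math.random() * maxLen);
  return Array.from({ length: len }, () => Math.floor(Math.random() * 200) - 100);
}

function checkProperty(property, runs = 100) {
  for (let i = 0; i < runs; i++) {
    const input = randomArray();
    if (!property(input)) {
      // A real framework would now shrink this to a minimal failing case.
      return { ok: false, counterexample: input };
    }
  }
  return { ok: true };
}

// Property: sorting preserves length.
const result = checkProperty((input) => {
  const sorted = [...input].sort((a, b) => a - b);
  return sorted.length === input.length;
});
```

The value of the framework is everything this sketch omits: shrinking, seed replay, and rich generators for nested data.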
Mutation Testing
Mutation testing verifies that your tests actually catch bugs. It modifies your code (mutates it) and checks whether your tests fail. If a test passes despite a mutation, your test isn't catching that bug.
Example: Your code increments a counter. A mutation changes counter++ to counter--. If your tests pass, they're not actually checking the counter value.
// Original code
counter++;
// Mutation 1
counter--; // tests should catch this
// Mutation 2
// do nothing // tests should catch this too
A good test suite kills most mutations. If your mutation score is 40%, your tests are leaving gaps.
Tools like Stryker generate mutations and run your tests:
npx stryker run
Output shows which mutations survived, revealing test gaps.
Performance Testing
Performance tests catch regressions in speed. They're not unit or integration tests—they're separate and run less frequently.
describe('performance', () => {
it('queries 100k records within 500ms', async () => {
const start = Date.now();
const results = await database.query('SELECT * FROM users LIMIT 100000');
const duration = Date.now() - start;
expect(duration).toBeLessThan(500);
});
});
Performance tests run against realistic data sizes. A query that takes 10ms on 1000 records might take seconds on a million.
Practical Strategies for Large Codebases
Test in layers. Run unit tests on every commit. Run integration tests in pre-merge CI. Run e2e tests nightly. This catches most bugs quickly while keeping developer feedback fast.
Use test sharding. Split your test suite across machines. If you have 10,000 unit tests, run them in parallel across ten machines. Feedback stays under five minutes.
Keep e2e tests minimal. Pick your most critical user journeys. Test those. Don't test every button click through the UI.
Mock at boundaries. Boundaries are where your code meets external systems. Mock there. Don't mock in the middle of your business logic.
Maintain test quality. Review tests like you review code. Flaky tests are worse than no tests. Track test failures and fix flaky ones immediately.
FAQ
What's a good code coverage percentage?
There's no universal answer. 80% is reasonable for most teams. The important metric is whether your tests catch bugs. A codebase with 50% coverage and good tests beats one with 90% coverage and bad tests.
How do we deal with flaky tests?
Flaky tests fail intermittently, usually due to timing issues or external dependencies. Fix them immediately—they're worse than no test. If a test depends on timing, make it deterministic. If it depends on an external service, mock it.
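For timing-dependent code, injecting a clock is the usual fix: production code calls an injected function instead of Date.now() directly, and tests pass a fixed one. A minimal sketch (createOrder is a hypothetical example):

```javascript
// Production code takes a clock parameter so tests can control time.
function createOrder(items, clock = () => Date.now()) {
  return { items, createdAt: clock() };
}

// In a test, pass a fixed clock; the timestamp is deterministic on every run.
const order = createOrder([], () => 1700000000000);
// order.createdAt === 1700000000000
```

The same injection pattern works for random number generators and UUIDs, the other common sources of nondeterminism.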
Should we test private methods?
No. Test the public interface. If you find yourself testing private methods, the method probably belongs in a separate class.
How do we test asynchronous code?
Return promises from tests. The test framework waits for the promise to resolve.
it('fetches data', () => {
return service.fetchData().then(data => {
expect(data).toBeDefined();
});
});
Or use async/await:
it('fetches data', async () => {
const data = await service.fetchData();
expect(data).toBeDefined();
});
What's the difference between mocking and spying?
A mock replaces a function entirely with a test double. A spy wraps a real function and tracks calls but lets it execute.
const mock = jest.fn().mockReturnValue(42);
const spy = jest.spyOn(Math, 'floor');
How often should we refactor tests?
As often as you refactor code. Tests are code. When the codebase changes, tests change too. Outdated tests are misleading.
Primary Sources
- Robert Martin's handbook on writing testable, clean code and test strategies. Clean Code
- Google's testing practices and guidelines for large-scale systems. Google Eng Practices
- The Pragmatic Programmer's approach to testing strategies and quality assurance. Pragmatic Programmer
- Steve McConnell's comprehensive guide to software testing and quality assurance. Code Complete
- John Ousterhout's philosophy on designing testable, modular systems. Philosophy of Design
- Google SRE practices for testing and reliability in production systems. SRE Workbook