Data Contracts and System Boundaries

Data contracts are explicit agreements about what data looks like and what it means. They're how systems stay independent—one team can change their internals as long as the contract holds.

7 min read · Updated March 4, 2026 · Engineering Best Practices

A data contract is an agreement between systems about what data looks like when it crosses a boundary. System A publishes data. System B consumes it. The contract says: "data will look like this, mean this, and will be available at this time."

Without contracts, integration is fragile. System A changes something and System B breaks. Nobody catches it until production. With contracts, changes are visible. You can test compatibility before deploying.

Why This Matters

Integration failures are expensive. When systems don't agree on data format, they fail to integrate. When they fail, you lose data, break workflows, or create inconsistencies. Data contracts prevent this.

Independence is critical in microservices. System A and System B should evolve independently. They shouldn't need to coordinate every change. Clear contracts enable this independence. Team A can change their system freely as long as they honor the contract. Team B can do the same.

Contracts catch problems early. If you define a contract and validate it, you'll catch mismatches during testing. You won't discover problems in production.

Explicit contracts double as documentation. The contract is machine-readable and testable, so it is much less likely to drift out of date than a separate written description.

Syntactic vs. Semantic Contracts

Syntactic contracts define the shape of data. What fields exist? What types are they? Are they required?

{
  "user_id": "string (required)",
  "email": "string (required, email format)",
  "age": "number (optional, >= 0)",
  "created_at": "timestamp (required)"
}

This is the minimum. You know what fields exist and their types.

Semantic contracts add meaning. What does each field represent? What are the valid values? What constraints apply?

{
  "user_id": "UUID, uniquely identifies this user",
  "email": "valid email address, unique across all users",
  "age": "user's age in years, must be >= 0, null if unknown",
  "created_at": "ISO 8601 timestamp, immutable, set when user created"
}

Semantic contracts prevent misunderstandings. System A thinks email must be unique. System B doesn't. They integrate but System B allows duplicates. Data breaks downstream. A semantic contract would have made this clear.

Contract Design

Be explicit about requirements. Which fields are required? Which are optional? What happens if optional fields are missing?

User:
  required:
    - id: unique identifier
    - email: contact email
    - name: user name
  optional:
    - phone: contact phone number (null if not provided)
    - timezone: user timezone (defaults to UTC if not provided)
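
The required and optional rules above can be enforced in code at the boundary. The following is a minimal sketch rather than a definitive implementation: `applyUserContract` is a hypothetical helper, and it assumes that an optional field set to `null` or left out counts as "not provided."

```javascript
// Hypothetical helper: enforce the User contract's required/optional rules.
const REQUIRED_FIELDS = ['id', 'email', 'name'];
const OPTIONAL_DEFAULTS = { phone: null, timezone: 'UTC' };

function applyUserContract(input) {
  // Required fields must be present and non-null.
  const missing = REQUIRED_FIELDS.filter(
    (field) => input[field] === undefined || input[field] === null
  );
  if (missing.length > 0) {
    throw new Error(`Missing required field(s): ${missing.join(', ')}`);
  }

  // Drop null/undefined values so they cannot overwrite a default.
  const provided = Object.fromEntries(
    Object.entries(input).filter(([, value]) => value !== undefined && value !== null)
  );
  return { ...OPTIONAL_DEFAULTS, ...provided };
}

const user = applyUserContract({ id: 'u-1', email: 'alice@example.com', name: 'Alice' });
console.log(user.phone, user.timezone); // prints: null UTC
```

Applying defaults in one place means every consumer sees the same value for a missing optional field, instead of each one guessing.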

Define valid values. If a field is an enum, list the valid values. If it's a range, define the bounds.

Order:
  status: "enum: pending, processing, shipped, delivered, cancelled"
  priority: "enum: low, medium, high"
  quantity: "integer, >= 1, <= 10000"
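
As an illustration, the enum and range rules above translate into a small check. This is a sketch under assumptions: `checkOrderValues` is a hypothetical name, and returning a list of violations (rather than throwing) is just one possible design.

```javascript
const ORDER_STATUSES = ['pending', 'processing', 'shipped', 'delivered', 'cancelled'];
const ORDER_PRIORITIES = ['low', 'medium', 'high'];

// Returns human-readable violations; an empty array means the order passes.
function checkOrderValues(order) {
  const violations = [];
  if (!ORDER_STATUSES.includes(order.status)) {
    violations.push(`status: unexpected value "${order.status}"`);
  }
  if (!ORDER_PRIORITIES.includes(order.priority)) {
    violations.push(`priority: unexpected value "${order.priority}"`);
  }
  // quantity must be an integer in the closed range [1, 10000].
  if (!Number.isInteger(order.quantity) || order.quantity < 1 || order.quantity > 10000) {
    violations.push(`quantity: must be an integer between 1 and 10000, got ${order.quantity}`);
  }
  return violations;
}

console.log(checkOrderValues({ status: 'pending', priority: 'low', quantity: 3 }));
console.log(checkOrderValues({ status: 'lost', priority: 'low', quantity: 0 }));
```

Collecting every violation in one pass gives the producer a complete report instead of one failure at a time.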

Specify data types precisely. Don't just say "string." Say "UUID string in format XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX."

User:
  id: "UUID (format: 8-4-4-4-12 hex digits)"
  created_at: "ISO 8601 timestamp with timezone (e.g., 2026-03-05T14:30:00Z)"
  revenue: "decimal number with exactly 2 decimal places (e.g., 100.00)"
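
Precise type descriptions like these can be checked mechanically. The sketch below uses hypothetical helper names and assumes `revenue` travels as a string, since a JavaScript number cannot guarantee exactly two decimal places. The timestamp check verifies the shape and that the value parses; it is not a full calendar validation.

```javascript
const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const TIMESTAMP_PATTERN = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$/;
const TWO_PLACE_DECIMAL_PATTERN = /^-?\d+\.\d{2}$/;

// 8-4-4-4-12 hex digits.
function isUuid(value) {
  return typeof value === 'string' && UUID_PATTERN.test(value);
}

// ISO 8601 date and time with an explicit timezone (Z or +hh:mm / -hh:mm).
function isTimestampWithZone(value) {
  return (
    typeof value === 'string' &&
    TIMESTAMP_PATTERN.test(value) &&
    !Number.isNaN(Date.parse(value))
  );
}

// Decimal string with exactly two digits after the point, e.g. "100.00".
function isTwoPlaceDecimal(value) {
  return typeof value === 'string' && TWO_PLACE_DECIMAL_PATTERN.test(value);
}

console.log(isUuid('550e8400-e29b-41d4-a716-446655440000')); // true
console.log(isTimestampWithZone('2026-03-05T14:30:00'));     // false: no timezone
console.log(isTwoPlaceDecimal('100.0'));                     // false: one decimal place
```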

Define semantics clearly. What does a field really mean? If you say "status," what statuses are valid and when does each occur?

# Order Status Semantics

- **pending**: Order placed, payment not yet processed. Can transition to: processing, cancelled
- **processing**: Payment processed, preparing to ship. Can transition to: shipped, cancelled
- **shipped**: Order dispatched. Can transition to: delivered
- **delivered**: Recipient confirmed receipt. Terminal state.
- **cancelled**: Order cancelled by user or system. Terminal state.

Schema Definition Languages

Use a formal schema language. It's machine-readable, testable, and can be auto-documented.

JSON Schema:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "user_id": {
      "type": "string",
      "format": "uuid"
    },
    "email": {
      "type": "string",
      "format": "email"
    },
    "age": {
      "type": "integer",
      "minimum": 0
    },
    "created_at": {
      "type": "string",
      "format": "date-time"
    }
  },
  "required": ["user_id", "email", "created_at"]
}

Apache Avro:

{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "user_id", "type": "string"},
    {"name": "email", "type": "string"},
    {"name": "age", "type": ["null", "int"], "default": null},
    {"name": "created_at", "type": {"type": "long", "logicalType": "timestamp-millis"}}
  ]
}

Protocol Buffers:

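syntax = "proto3";

// "optional" in proto3 requires protoc 3.15+ (earlier versions need an experimental flag).
message User {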
  string user_id = 1;
  string email = 2;
  optional int32 age = 3;
  int64 created_at_millis = 4;
}

All three are machine-readable and can validate data automatically.

Consumer-Driven Contracts

In a microservices system, System A might serve ten consumers. System A doesn't know what each consumer needs. Consumer-driven contract testing lets consumers define what they expect from System A.

Pact is a tool for consumer-driven contract testing:

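// Consumer's expectation (team-b/test/pact.test.js)
// Assumes the classic pact-js DSL (setup / addInteraction / verify / finalize).
// Newer pact-js releases expose other interfaces, so check your installed version.
const path = require('path');
const { Pact } = require('@pact-foundation/pact');
// Hypothetical consumer-side HTTP client for the User Service.
const { UserService } = require('../src/userService');

const provider = new Pact({
  consumer: 'TeamBService', // hypothetical participant names
  provider: 'UserService',
  port: 1234,
  dir: path.resolve(process.cwd(), 'pacts')
});
const userService = new UserService('http://localhost:1234');

describe('User Service contract', () => {
  beforeAll(() => provider.setup());   // start the mock provider
  afterEach(() => provider.verify());  // confirm the expected requests were made
  afterAll(() => provider.finalize()); // write the pact file

  it('returns a user by ID', async () => {
    await provider.addInteraction({
      state: 'user 123 exists',
      uponReceiving: 'a request for user 123',
      withRequest: {
        method: 'GET',
        path: '/users/123'
      },
      willRespondWith: {
        status: 200,
        body: {
          id: '123',
          email: 'alice@example.com',
          name: 'Alice'
        }
      }
    });

    const user = await userService.getUser('123');
    expect(user.id).toBe('123');
  });
});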

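Running this test generates a contract file (a "pact") that is shared with System A, typically through a Pact Broker. System A verifies its real responses against it. If System A changes the response format, its verification fails before deployment. The consumer doesn't discover the change in production.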

Contract Testing

Contracts should be tested automatically.

Schema validation: Does the data match the schema?

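const Ajv = require('ajv');
const addFormats = require('ajv-formats');

const ajv = new Ajv();
// Ajv v8+ ships format validators ("uuid", "email", "date-time") in the
// separate ajv-formats package. Without it, these formats are unknown to Ajv
// and compilation fails under the default strict settings.
addFormats(ajv);
const schema = require('./schemas/user.json');
const validate = ajv.compile(schema);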

describe('User data contract', () => {
  it('validates correct user data', () => {
    const user = {
      user_id: '550e8400-e29b-41d4-a716-446655440000',
      email: 'alice@example.com',
      age: 30,
      created_at: '2026-03-05T14:30:00Z'
    };
    expect(validate(user)).toBe(true);
  });

  it('rejects invalid email', () => {
    const user = {
      user_id: '550e8400-e29b-41d4-a716-446655440000',
      email: 'not-an-email',
      created_at: '2026-03-05T14:30:00Z'
    };
    expect(validate(user)).toBe(false);
  });
});

Semantic validation: Does the data make semantic sense?

describe('Order data semantics', () => {
  it('validates valid status transition', () => {
    const order = {
      id: '123',
      status: 'processing',
      previous_status: 'pending'
    };
    expect(isValidStatusTransition(order)).toBe(true);
  });

  it('rejects invalid status transition', () => {
    const order = {
      id: '123',
      status: 'pending',
      previous_status: 'delivered'
    };
    expect(isValidStatusTransition(order)).toBe(false);
  });
});
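
The tests above call `isValidStatusTransition`, which the article does not define. One possible implementation, sketched here as an assumption, encodes the transitions from the order status semantics as a lookup table and treats any unknown previous status as invalid.

```javascript
// Allowed next statuses for each current status, taken from the
// order status semantics. Terminal states map to an empty list.
const ALLOWED_TRANSITIONS = new Map([
  ['pending', ['processing', 'cancelled']],
  ['processing', ['shipped', 'cancelled']],
  ['shipped', ['delivered']],
  ['delivered', []],
  ['cancelled', []]
]);

// True when moving from order.previous_status to order.status is allowed.
function isValidStatusTransition(order) {
  const allowedNext = ALLOWED_TRANSITIONS.get(order.previous_status) || [];
  return allowedNext.includes(order.status);
}

console.log(isValidStatusTransition({ status: 'processing', previous_status: 'pending' })); // true
console.log(isValidStatusTransition({ status: 'pending', previous_status: 'delivered' }));  // false
```

Keeping the transitions in a single table makes the semantic contract easy to compare against its written description.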

System Boundaries

Define clear boundaries between systems. At the boundary, contracts apply.

Message boundaries: When System A publishes an event, what does it look like?

OrderCreated:
  fields:
    - order_id: UUID
    - customer_id: UUID
    - total_amount: decimal with 2 places
    - created_at: ISO 8601 timestamp
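
On the producer side, one way to keep published events inside the contract is to build them in a single function. This is a rough sketch: `buildOrderCreated` is a hypothetical helper, and it assumes the producer tracks money as integer cents and formats it only at the boundary.

```javascript
// Hypothetical builder for the OrderCreated event described above.
function buildOrderCreated({ orderId, customerId, totalCents, now = new Date() }) {
  if (!Number.isInteger(totalCents) || totalCents < 0) {
    throw new TypeError('totalCents must be a non-negative integer');
  }
  // Format cents as a decimal string with exactly two places,
  // using integer arithmetic to avoid floating-point rounding.
  const whole = Math.floor(totalCents / 100);
  const fraction = String(totalCents % 100).padStart(2, '0');

  return {
    order_id: orderId,
    customer_id: customerId,
    total_amount: `${whole}.${fraction}`,
    created_at: now.toISOString() // ISO 8601, always in UTC
  };
}

const event = buildOrderCreated({
  orderId: '550e8400-e29b-41d4-a716-446655440000',
  customerId: '7c9e6679-7425-40de-944b-e07fc1f90ae7',
  totalCents: 1999
});
console.log(event.total_amount); // prints: 19.99
```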

API boundaries: When System A exposes an endpoint, what's the contract?

GET /api/orders/{id}:
  response:
    - order_id: UUID
    - status: enum (pending, processing, shipped, delivered, cancelled)
    - items: array of items
    - total: decimal with 2 places

Database boundaries: When System A exposes a database view for System B to read, what columns exist and what do they mean?

-- System B can read this view
CREATE VIEW orders_for_reporting AS
SELECT
  order_id,
  customer_id,
  total_amount,
  created_at,
  completed_at
FROM orders;

Evolving Contracts

Contracts must evolve as systems do. But evolution must be backward-compatible.

Adding fields: OK. Add new fields as optional with defaults.

# Before
User:
  required: [id, email]

# After
User:
  required: [id, email]
  optional: [phone, timezone]  # new fields

Removing fields: Not OK without a major version bump.

Changing field types: Not OK without a major version bump. If you change ID from string to integer, consumers break.

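Expanding enums: OK, as long as consumers tolerate values they don't recognize. A consumer that rejects unknown values will break when the producer adds one.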

Removing enum values: Not OK. Consumers might still send the old value.
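
These rules are mechanical enough to check automatically before a release. The sketch below compares two versions of a contract and reports the breaking changes listed above, plus one implied case (a newly added required field). The schema shape `{ fields: { name: { type, required, enum } } }` and the function name are assumptions made for illustration, not a standard format.

```javascript
// Hypothetical schema shape: { fields: { <name>: { type, required, enum? } } }
function findBreakingChanges(oldSchema, newSchema) {
  const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
  const problems = [];

  for (const [name, oldField] of Object.entries(oldSchema.fields)) {
    if (!has(newSchema.fields, name)) {
      problems.push(`removed field: ${name}`);
      continue;
    }
    const newField = newSchema.fields[name];
    if (newField.type !== oldField.type) {
      problems.push(`type changed: ${name} (${oldField.type} -> ${newField.type})`);
    }
    // Only compare enums when both versions constrain the field.
    if (oldField.enum && newField.enum) {
      for (const value of oldField.enum) {
        if (!newField.enum.includes(value)) {
          problems.push(`removed enum value: ${name}=${value}`);
        }
      }
    }
  }

  // New fields are fine only if they are optional.
  for (const [name, newField] of Object.entries(newSchema.fields)) {
    if (!has(oldSchema.fields, name) && newField.required) {
      problems.push(`new required field: ${name}`);
    }
  }
  return problems;
}

const v1 = { fields: { id: { type: 'string', required: true } } };
const v2 = { fields: { id: { type: 'integer', required: true } } };
console.log(findBreakingChanges(v1, v2));
```

An empty result suggests the change is backward-compatible under these rules; any entry signals that a major version bump is needed.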

FAQ

Should we have a schema for every message/API?

Yes. If data crosses a system boundary, define the contract. It's not just about validation—it's about documentation and communication.

How detailed should contracts be?

Detailed enough that a consumer can implement integration without asking questions. Not so detailed that it describes the producer's internals, which should stay free to change.

How do we version contracts?

Use semantic versioning, the same as for APIs. Adding an optional field is a minor version. Removing a field or changing semantics is a major version.

What if a contract is violated in production?

Alert on it. Don't silently ignore bad data. Track violations so you can debug. Eventually work with the producer to fix the root cause.

Should consumers validate incoming data?

Yes. Defensive programming. Validate against the contract. Log violations. Alert on systematic violations.
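
A rough sketch of that logging and alerting, under stated assumptions: `ViolationTracker` is a hypothetical in-memory class, and "systematic" is taken to mean a producer reaching a fixed violation count. A real system would more likely use its existing metrics and alerting stack.

```javascript
// Hypothetical in-memory tracker: logs every violation and fires one
// alert when a producer reaches the configured threshold.
class ViolationTracker {
  constructor(alertThreshold, onAlert) {
    this.alertThreshold = alertThreshold;
    this.onAlert = onAlert;
    this.counts = new Map(); // producer name -> violation count
  }

  record(producer, detail) {
    const count = (this.counts.get(producer) || 0) + 1;
    this.counts.set(producer, count);
    console.warn(`contract violation from ${producer}: ${detail}`);
    // Alert once, at the moment the threshold is reached.
    if (count === this.alertThreshold) {
      this.onAlert(producer, count);
    }
    return count;
  }
}

const tracker = new ViolationTracker(3, (producer, count) => {
  console.error(`ALERT: ${producer} has sent ${count} invalid messages`);
});
tracker.record('orders-service', 'missing field "order_id"');
```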

How do we ensure producers honor contracts?

Test against the contract before deploying. Use contract testing frameworks like Pact.
