Draft Commits Explained

Definition

A Draft Commit is a temporary, queryable checkpoint captured in real time as an AI coding agent works. A git commit becomes part of the repository's durable history and usually represents code someone has decided to keep. A Draft Commit instead represents work-in-progress: the live record of what the agent did and why, what changed and what didn't, which model was used, what reasoning the agent applied, and which alternatives it considered. Draft Commits are stored in the Memory Layer and survive session boundaries, so future sessions can understand what happened in past AI sessions without relying on git history or human memory.

Why It Matters

Imagine this scenario: Your team used an AI agent to refactor a critical module three days ago. The refactoring went well—tests passed, the code reviewed cleanly, it shipped. Today, you notice a subtle performance regression in production: queries that used to run in 50ms now run in 75ms.

You pull up the git history of the module. The commit message says: "Refactor for clarity and maintainability." That's accurate but useless. It doesn't tell you how the refactoring changed query patterns. Did the agent:

Restructure the data access layer?
Change how caching was handled?
Modify the indexes being used?
Rewrite the query logic?

Without that information, you're flying blind. You could spend an hour doing a detailed code diff and tracing the problem. Or—if Draft Commits were available—you could query the Memory Layer: "Show me what the agent changed in the data access layer during that session," and the answer would be immediately visible, along with the agent's reasoning about the changes.

That's the core value of Draft Commits: they make AI reasoning transparent and queryable from the moment the work happens, so the record already exists when something breaks days or weeks later. When the code is committed, the corresponding Draft Commit can be promoted to a Committed Checkpoint, a permanent record tied to that commit.

What a Draft Commit Captures

A Draft Commit is comprehensive. Here's what's typically included:

Full Conversation:

The complete exchange between the user and the agent
User prompts, agent responses, follow-up questions
Context provided (file snippets, requirements, constraints)

Code Changes:

What code was generated
What code was edited (and what was the before/after)
What code was considered but not applied
Which files were touched and which weren't

Model and Configuration:

Which model was used (GPT-4, Claude, etc.)
Temperature and other inference parameters
Which version of the model
When the session occurred

Agent's Reasoning:

What the agent understood about the problem
What constraints it identified
What alternatives it considered and why
Which option it chose and why it was the best fit
What trade-offs were accepted

Work-in-Progress Artifacts:

Intermediate code generations that didn't make it into the final version
Rejected approaches and the reasons for rejection
Questions the agent asked (indicating uncertainty or discovery)
Validation checks the agent performed

Outcomes and Measurements:

Did tests pass? Which tests?
What metrics changed (if any were measured)?
What side effects occurred (lint warnings, type errors, etc.)?
Did the agent iterate or get it right on the first try?

Related Context:

Links to previous Draft Commits on the same codebase
Committed Checkpoints that this Draft Commit might lead to
Related tasks or issues

All of this data is structured and queryable. You're not just storing a raw transcript; you're storing a rich record that can be searched, filtered, and analyzed.
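To make "structured and queryable" concrete, here is a minimal sketch of how such a record could be modeled. The class and field names (`DraftCommit`, `Alternative`, `example-model`) are illustrative assumptions, not the actual Memory Layer schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Alternative:
    name: str
    selected: bool = False
    reason: str = ""  # rationale if selected, rejection reason otherwise


@dataclass
class DraftCommit:
    # Hypothetical field set covering a subset of the categories listed above.
    session_id: str
    timestamp: str  # ISO 8601 string
    task: str
    model: str
    constraints: list = field(default_factory=list)
    alternatives: list = field(default_factory=list)
    files_modified: list = field(default_factory=list)
    outcomes: dict = field(default_factory=dict)

    def to_json(self) -> str:
        """Serialize the record; nested dataclasses become plain dicts."""
        return json.dumps(asdict(self), indent=2)


draft = DraftCommit(
    session_id="sess_abc123",
    timestamp="2026-03-04T14:32:00Z",
    task="Optimize customer_order queries",
    model="example-model",  # placeholder, not a real model name
    constraints=["Schema can't change (legacy constraint)"],
    alternatives=[
        Alternative("Index optimization", reason="Indices already optimal"),
        Alternative("Hash join", selected=True, reason="Backward compatible"),
    ],
    files_modified=["src/db/queries.py"],
    outcomes={"test_results": "all_pass"},
)

# Because the fields are structured, filtering is plain attribute access.
selected = [a.name for a in draft.alternatives if a.selected]
```

The point of the sketch is the contrast with a raw transcript: answering "which option was chosen?" is a one-line filter rather than a text search.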

When Draft Commits Are Created

Draft Commits are created automatically during AI sessions—there's no extra work for humans. Here's the lifecycle:

Session Start: When an AI agent begins working (prompted by a human), the session is recorded. The agent's initial understanding of the task is captured.

Continuous Checkpointing: As the agent works—iterating on code, trying different approaches, asking clarifying questions—the Memory Layer continuously captures checkpoints. These aren't final versions; they're in-progress states. If the agent generates code, then reviews it, then revises it, each stage might be a checkpoint.

Session End: When the agent finishes (the human closes the session or the agent says it's done), the final Draft Commit is recorded. This represents the complete session: everything the agent tried, learned, and produced.

Promotion to Committed Checkpoint: If the code from that Draft Commit gets merged into git, the Draft Commit can be promoted to a Committed Checkpoint—a permanent record tied to the git commit. But until then, it's just a Draft.

The beauty is that this happens automatically. There's no "save intent" step, no "document your reasoning" burden on the human. The system captures it.
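The first three lifecycle stages can be sketched as a small state holder. This is an in-memory illustration under assumed names (`SessionRecorder`, `checkpoint`, `end`); a real recorder would persist to the Memory Layer, and promotion would be a separate step after the code lands in git.

```python
class SessionRecorder:
    """Hypothetical in-memory recorder for one agent session."""

    def __init__(self, session_id, task):
        self.session_id = session_id
        self.state = "open"
        # Session start: capture the agent's initial understanding of the task.
        self.checkpoints = [{"stage": "session_start", "note": task}]

    def checkpoint(self, note):
        """Continuous checkpointing: record an in-progress state."""
        if self.state != "open":
            raise RuntimeError("cannot checkpoint a finished session")
        self.checkpoints.append({"stage": "checkpoint", "note": note})

    def end(self, summary):
        """Session end: the accumulated checkpoints become the final draft."""
        if self.state != "open":
            raise RuntimeError("session already ended")
        self.checkpoints.append({"stage": "session_end", "note": summary})
        self.state = "draft"


rec = SessionRecorder("sess_abc123", "Optimize customer_order queries")
rec.checkpoint("Generated first version of the query")
rec.checkpoint("Revised after reviewing the execution plan")
rec.end("Switched to hash join; tests pass")
stages = [c["stage"] for c in rec.checkpoints]
```

Nothing in this flow asks the human to save or document anything; every call would be triggered by agent activity.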

Draft Commits vs. Git Commits: Key Differences

It's critical to understand that Draft Commits and git commits serve different purposes and have different properties.

Property	Git Commit	Draft Commit
Permanence	Permanent (part of the repo history)	Temporary (survives session end but can be deleted)
What's Captured	Final code state	Full session (reasoning, iterations, alternatives)
When Created	When a developer commits, usually once a change is ready to keep	Continuously during work
Queryability	Queryable via git (by author, message, date)	Queryable via Memory Layer (by constraint, trade-off, artifact, reasoning)
Audience	All repo contributors, external viewers	AI agents and authorized team members
Scope	What changed in the codebase	Why the change was made and what alternatives were considered
Versioning	Cryptographic hash, immutable	Timestamped, can be superseded
Integration	Central to development workflow	Peripheral; doesn't change how you commit code

Example: A git commit might say:

commit a3f7e2c
Author: AI Agent <agent@bitloops.io>
Date: 2026-03-04

  Optimize query performance

  Changed join strategy to use hash join instead of
  nested loop for customer_order queries.

The corresponding Draft Commit would capture:

{
  "session_id": "sess_abc123",
  "timestamp": "2026-03-04T14:32:00Z",
  "task": "Optimize customer_order queries. Currently using nested loop, seeing 200ms latency.",
  "constraints": [
    "Schema can't change (legacy constraint)",
    "Must support backward compatibility",
    "No external dependencies allowed"
  ],
  "alternatives_considered": [
    {
      "name": "Index optimization",
      "rejected_because": "Indices already optimal; checked execution plans"
    },
    {
      "name": "Hash join",
      "selected": true,
      "rationale": "O(n) vs O(n²) nested loop; backward compatible"
    },
    {
      "name": "Query rewrite (move to async)",
      "rejected_because": "Doesn't meet real-time requirement"
    }
  ],
  "code_changes": {
    "files_modified": ["src/db/queries.py"],
    "changes": [...]
  },
  "outcomes": {
    "latency_before": "200ms",
    "latency_after": "45ms",
    "test_results": "all_pass",
    "compatibility_verified": true
  }
}

The git commit tells you what changed. The Draft Commit tells you why it changed and what else could have been done.
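As a rough illustration of reading the "why" out of such a record, the sketch below parses a trimmed copy of the example (elided fields are left out) and summarizes the decision. The `explain` helper is hypothetical.

```python
import json

# Trimmed copy of the example record; elided fields are simply omitted.
record = json.loads("""
{
  "session_id": "sess_abc123",
  "alternatives_considered": [
    {"name": "Index optimization",
     "rejected_because": "Indices already optimal; checked execution plans"},
    {"name": "Hash join", "selected": true},
    {"name": "Query rewrite (move to async)",
     "rejected_because": "Doesn't meet real-time requirement"}
  ],
  "outcomes": {"latency_before": "200ms", "latency_after": "45ms"}
}
""")


def explain(rec):
    """Return the chosen option, the rejected options, and the latency change."""
    alts = rec["alternatives_considered"]
    chosen = next(a["name"] for a in alts if a.get("selected"))
    rejected = {
        a["name"]: a["rejected_because"]
        for a in alts
        if "rejected_because" in a
    }
    before = int(rec["outcomes"]["latency_before"].removesuffix("ms"))
    after = int(rec["outcomes"]["latency_after"].removesuffix("ms"))
    reduction_pct = round(100 * (before - after) / before, 1)
    return chosen, rejected, reduction_pct


chosen, rejected, reduction_pct = explain(record)
# chosen is "Hash join"; latency dropped from 200ms to 45ms, a 77.5% reduction
```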

Practical Use Case: Debugging Agent Behavior

Here's a concrete scenario where Draft Commits are invaluable.

The Situation: Your agent generated code that passes all tests in CI but fails mysteriously in production. It involves a caching layer. The code review looked fine. But something's wrong.

Without Draft Commits: You're stuck. You have the final code, and you have git history, but you don't have a record of what the agent was thinking. Did it:

Understand that the caching logic needed to be invalidated on certain events?
Know about the eventual consistency window for the data store?
Consider whether the cache key strategy would collide under load?

You have to spend time analyzing the code and imagining what the agent might have considered. You might guess wrong. You might miss the actual issue.

With Draft Commits: You query the Memory Layer: "Show me the Draft Commit for the caching layer implementation."

You immediately see:

The agent understood the task: "Implement caching with 5-minute TTL"
The constraint it identified: "Data freshness requirement: eventual consistency OK up to 5 minutes"
The alternative it rejected: "Key-by-user strategy (would consume too much memory)"
The approach it chose: "Key-by-filter-combination strategy (statistically more cache hits)"
The assumption it made: "Invalidation happens via event listening on database updates"

The recorded assumption gives you something specific to check. You inspect the event pipeline and find that, in production, the event queue drops events under high concurrency, so the listener does not fire reliably. The agent had no way to know this. Its code is correct given the assumption; the assumption is what's wrong.

With this information, you can either:

Fix the event queue to be more reliable (so the assumption holds).
Add explicit invalidation as a fallback (so the code doesn't depend solely on events).
Reduce the TTL (so stale data doesn't accumulate).

Without the Draft Commit, you might have gone straight to blaming the agent's code or wasted time on deeper investigation. The Draft Commit makes the agent's reasoning transparent, so you can evaluate whether the reasoning was sound and where reality deviated from assumptions.
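The second remediation option, explicit invalidation as a fallback, could look roughly like the sketch below. It is an assumption-laden illustration: `TTLCache` and `save_order` are hypothetical names, the store is a plain dict, and the clock is injectable only so expiry can be shown without sleeping.

```python
import time


class TTLCache:
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be exercised in tests
        self._store = {}

    def put(self, key, value):
        self._store[key] = (value, self.clock())

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self.clock() - stored_at >= self.ttl:
            del self._store[key]  # expired: the TTL bounds how stale data can get
            return None
        return value

    def invalidate(self, key):
        self._store.pop(key, None)


def save_order(db, cache, key, value):
    """Write path that invalidates explicitly instead of trusting events alone."""
    db[key] = value
    cache.invalidate(key)


now = [0.0]
cache = TTLCache(ttl_seconds=300, clock=lambda: now[0])
db = {"order:1": "pending"}

cache.put("order:1", db["order:1"])
save_order(db, cache, "order:1", "shipped")
after_write = cache.get("order:1")  # None: the stale entry was removed on write

cache.put("order:1", db["order:1"])
now[0] = 301.0
after_ttl = cache.get("order:1")  # None: the entry expired after 300 seconds
```

With this shape, a dropped event can no longer leave a stale entry behind for writes that go through `save_order`; the TTL remains as a backstop for any other write path.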

A Complete Draft Commit Walkthrough

Let's trace a realistic example from start to finish.

Task: "Add a user preference for notification frequency. Users should be able to choose between 'instant', 'hourly digest', and 'daily digest'."

Session Start (2026-03-04, 09:00 AM): The agent reads the task and the codebase context. It generates an initial understanding:

New data model needed: UserNotificationPreference
New UI controls needed: preference form
Logic needed: filter notifications based on preference at send time

Initial Code Generation (09:05): The agent generates code for a UserNotificationPreference model. It creates a migration, adds the field to the database schema. The code looks good.

But then the agent pauses. It asks: "I noticed there's an existing NotificationScheduler class. Should the filtering logic be there or in a new service?" This question is captured in the Draft Commit. It indicates the agent discovered a design choice and is uncertain.

User Clarification (09:08): The user responds: "Keep the scheduler as-is for now. We can refactor later if needed. Implement filtering in a new PreferenceFilter service."

Revised Code Generation (09:10): The agent generates:

The migration
The model
A new PreferenceFilter service that wraps the scheduler
Updates to the notification send logic to apply the filter

Testing (09:15): The agent writes tests for the new service. All tests pass. But the agent also asks: "What about historical notifications? Should we apply preferences retroactively?" Another design question captured.

User Response (09:17): "No retroactive filtering. Fresh preference applies going forward. Document that assumption."

Final Version (09:20): The agent adds a docstring clarifying the assumption and generates the final code.

Session End: The complete Draft Commit now contains:

Full conversation: All 8 exchanges between user and agent
Code: 4 files modified, 320 lines added
Reasoning:
- Constraint: "Keep NotificationScheduler unchanged for now"
- Design choice: "New PreferenceFilter service wraps scheduler"
- Assumption: "Preferences apply only to future notifications, not historical"
Alternatives:
- Considered: filtering inside scheduler (rejected to preserve abstraction)
- Considered: retroactive filtering (rejected per requirements)
Outcomes:
- 14 tests written, all passed
- Code coverage: 92% (new code)
- Type checking: passed

Three months later... A new task arrives: "Users want to change their notification preferences multiple times a day. Add audit logging for preference changes."

The agent queries the Memory Layer for the original preference implementation and immediately sees:

The architecture (separate PreferenceFilter service)
The assumption (fresh preference, no retroactive changes)
The existing tests

The agent can now extend the implementation intelligently, without rediscovering the architecture or the assumptions.

Draft Commits and Team Coordination

Draft Commits enable a new kind of asynchronous collaboration.

Scenario: Alice's agent is working on a refactor. Bob's agent is working on a feature in the same area. They're not coordinating in real time, but they're working on overlapping code.

Without Draft Commits: Alice and Bob's agents might generate conflicting code, or one might undo the other's work, or they might make incompatible design decisions. The conflict only surfaces when they merge, and by then it's expensive to reconcile.

With Draft Commits: Bob's agent, before generating code, queries the Memory Layer: "Show me recent Draft Commits for the auth module." It sees Alice's Draft Commit from earlier today, including her reasoning about the architectural changes she's making. Bob's agent can either:

Coordinate with Alice's changes (building on top of them).
Flag a potential conflict for human review.
Propose an alternative approach that doesn't interfere.

This is particularly powerful when agents are working across time zones or when human coordination is slow. The Draft Commits become a shared record of intent that multiple agents can reference.
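A pre-flight overlap check of this kind might be sketched as follows. The record shape (`session_id`, `timestamp`, `files_modified`) and the 24-hour window are assumptions for illustration, not a documented query API.

```python
from datetime import datetime, timedelta, timezone


def find_overlaps(recent_drafts, planned_files, now, window=timedelta(hours=24)):
    """Return (session_id, shared_files) for recent drafts touching planned files."""
    planned = set(planned_files)
    overlaps = []
    for draft in recent_drafts:
        age = now - draft["timestamp"]
        shared = planned & set(draft["files_modified"])
        if age <= window and shared:
            overlaps.append((draft["session_id"], sorted(shared)))
    return overlaps


now = datetime(2026, 3, 4, 16, 0, tzinfo=timezone.utc)
recent = [
    {
        "session_id": "sess_alice",
        "timestamp": datetime(2026, 3, 4, 9, 0, tzinfo=timezone.utc),
        "files_modified": ["src/auth/session.py", "src/auth/tokens.py"],
    },
    {
        "session_id": "sess_old",
        "timestamp": datetime(2026, 2, 1, 9, 0, tzinfo=timezone.utc),
        "files_modified": ["src/auth/tokens.py"],
    },
]

overlaps = find_overlaps(recent, ["src/auth/tokens.py", "src/auth/login.py"], now)
```

An agent could use a non-empty result as the trigger for any of the three responses above: build on the other change, flag it for human review, or pick a different approach.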

How Draft Commits Are Stored

Draft Commits aren't stored in git (they'd bloat the repo). They're stored in the Memory Layer: a local SQLite database plus a vector index (HNSW) for semantic search.

SQLite Storage: Structured fields (session ID, timestamp, model used, constraints, alternatives, outcomes) are stored in relational tables. This allows fast filtering and joins. "Show me all Draft Commits where the constraint is 'schema can't change'" is a simple query.
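The constraint query mentioned above can be sketched with Python's built-in `sqlite3` module. The table and column names are assumptions chosen for the example, not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE draft_commits (
    session_id TEXT PRIMARY KEY,
    created_at TEXT NOT NULL,
    task       TEXT NOT NULL
);
CREATE TABLE constraints (
    session_id TEXT NOT NULL REFERENCES draft_commits(session_id),
    text       TEXT NOT NULL
);
""")

conn.executemany(
    "INSERT INTO draft_commits VALUES (?, ?, ?)",
    [
        ("sess_abc123", "2026-03-04T14:32:00Z", "Optimize customer_order queries"),
        ("sess_def456", "2026-03-05T10:00:00Z", "Add audit logging"),
    ],
)
conn.executemany(
    "INSERT INTO constraints VALUES (?, ?)",
    [
        ("sess_abc123", "Schema can't change (legacy constraint)"),
        ("sess_abc123", "No external dependencies allowed"),
        ("sess_def456", "Must not log personal data"),
    ],
)

# Filter drafts by constraint text with an ordinary join.
# SQLite's LIKE is case-insensitive for ASCII characters by default.
rows = conn.execute(
    """
    SELECT d.session_id, d.task
    FROM draft_commits AS d
    JOIN constraints AS c ON c.session_id = d.session_id
    WHERE c.text LIKE ?
    """,
    ("%schema can't change%",),
).fetchall()
```

Keeping constraints in their own table, one row per constraint, is what makes this a join rather than a scan over serialized blobs.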

Vector Index: The full text of conversations and reasoning is converted to embeddings and indexed with HNSW (Hierarchical Navigable Small World), a graph-based approximate nearest-neighbor index. This enables semantic search: "Show me past decisions about caching strategies" can retrieve relevant Draft Commits even if they use different wording.
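The retrieval step can be illustrated with a deliberately simplified stand-in. The sketch below uses a toy bag-of-words vector and exact brute-force cosine ranking; it is neither HNSW nor a learned embedding, so it only matches shared words, but the ranking interface is the same one an approximate index would serve. All names and sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts. A real system would use a learned model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query, docs, k=1):
    """Exact nearest-neighbor search; HNSW would approximate this ranking."""
    q = embed(query)
    ranked = sorted(
        docs.items(), key=lambda item: cosine(q, embed(item[1])), reverse=True
    )
    return [session_id for session_id, _ in ranked[:k]]


docs = {
    "sess_cache": "Chose key-by-filter caching strategy with 5-minute TTL",
    "sess_join": "Switched join strategy to hash join",
    "sess_audit": "Added audit logging for preference changes",
}

top = search("caching strategies", docs)
```

Brute force compares the query against every record, which is fine for a handful of sessions; an approximate index such as HNSW exists to avoid that linear scan as the collection grows.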

Attachment Storage: Code snippets, diffs, and other artifacts are stored separately and linked from the main Draft Commit record. This keeps the main record compact while preserving the full information.

Lifecycle Management: Draft Commits can be:

Active: Still relevant, frequently accessed
Archived: Relevant but old; moved to slower storage
Superseded: A later Draft Commit made this one obsolete; kept for history but not actively queried
Deleted: Temporary work that's no longer relevant; can be pruned

Team governance policies determine which Draft Commits are kept and for how long. Some organizations keep them forever (rich historical record, but large storage cost). Others keep them for a rolling 12 months (good balance). Some keep them only until the code is merged into git (minimizes storage, loses some value).
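A retention policy of this kind reduces to a small classification function. The thresholds below (archive after 30 days, prune after a rolling 12 months) are assumed values for illustration, and `lifecycle_state` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

ARCHIVE_AFTER = timedelta(days=30)   # assumed policy value
DELETE_AFTER = timedelta(days=365)   # assumed "rolling 12 months" window


def lifecycle_state(created_at, now, superseded=False):
    """Map a Draft Commit's age and status to one of the four states."""
    age = now - created_at
    if age >= DELETE_AFTER:
        return "deleted"      # past retention: eligible for pruning
    if superseded:
        return "superseded"   # kept for history, not actively queried
    if age >= ARCHIVE_AFTER:
        return "archived"     # moved to slower storage
    return "active"


now = datetime(2026, 3, 4, tzinfo=timezone.utc)
states = [
    lifecycle_state(datetime(2026, 3, 1, tzinfo=timezone.utc), now),
    lifecycle_state(datetime(2026, 1, 1, tzinfo=timezone.utc), now),
    lifecycle_state(datetime(2025, 1, 1, tzinfo=timezone.utc), now),
    lifecycle_state(datetime(2026, 3, 1, tzinfo=timezone.utc), now, superseded=True),
]
```

A team that keeps everything forever would simply set `DELETE_AFTER` to `timedelta.max`.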

When Draft Commits Become Committed Checkpoints

A Draft Commit represents work-in-progress. When that work is reviewed and merged into git, the Draft Commit can be promoted to a Committed Checkpoint—a permanent record tied to the git commit.

A Committed Checkpoint includes:

The full Draft Commit information (all reasoning, alternatives, outcomes)
A hash linking it to the git commit
Review and approval information (who approved it, any review comments)
Links to related issues or requirements

Committed Checkpoints are permanent—they're part of the codebase history, just as queryable as Draft Commits but with stronger durability guarantees. They're the meeting point between AI memory and git history. Git says what changed. The Committed Checkpoint says why. This is critical for reviewing AI-generated diffs with context.
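Promotion could be sketched as wrapping the draft in an immutable record that adds the git link and review details. `CommittedCheckpoint`, `promote`, and the reviewer and issue values are hypothetical; the hash pattern accepts abbreviated and full hexadecimal commit IDs.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class CommittedCheckpoint:
    draft: dict          # the full Draft Commit information
    git_commit: str      # hash linking the checkpoint to the git commit
    approved_by: tuple   # review and approval information
    issue_links: tuple = ()


def promote(draft, git_commit, approved_by, issue_links=()):
    """Build a permanent record once the draft's code has landed in git."""
    if not re.fullmatch(r"[0-9a-f]{7,64}", git_commit):
        raise ValueError("git_commit must be a hexadecimal commit hash")
    if not approved_by:
        raise ValueError("a Committed Checkpoint needs at least one approver")
    return CommittedCheckpoint(
        draft=dict(draft),  # copy so later edits to the draft don't leak in
        git_commit=git_commit,
        approved_by=tuple(approved_by),
        issue_links=tuple(issue_links),
    )


checkpoint = promote(
    {"session_id": "sess_abc123", "task": "Optimize customer_order queries"},
    git_commit="a3f7e2c",
    approved_by=["reviewer@example.com"],  # placeholder reviewer
    issue_links=["ISSUE-1"],               # placeholder issue ID
)
```

Freezing the dataclass prevents the checkpoint's fields from being reassigned, which mirrors the stronger durability expected of a permanent record.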

The Distinction: AI Activity Tracking vs. Structural Context

It's worth clarifying what Draft Commits are not.

Draft Commits are not structural context. They don't compute or store:

The current codebase structure (that's computed on-demand via AST)
The data model or architecture (that's derived from the code)
The current state of files (that's in the git working directory)

Draft Commits are AI activity tracking. They record what the AI did, why it did it, and what reasoning led to the decision. Structural context (like AST analysis or semantic understanding of the codebase) is computed fresh for each session to ensure accuracy. AI activity tracking is historical.

The combination of both—structural context (fresh, always current) plus AI activity tracking (historical, accumulated)—gives agents the best of both worlds: accurate understanding of what's there now, plus historical understanding of how it got there and why.
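For contrast, structural context is cheap to recompute from the source itself. The sketch below uses Python's standard `ast` module to derive an outline on demand; the `outline` helper and the sample source are illustrative assumptions, and a real tool would cover more node types.

```python
import ast


def outline(source):
    """Derive top-level classes and functions fresh from the current source."""
    tree = ast.parse(source)
    items = []
    for node in tree.body:
        if isinstance(node, ast.ClassDef):
            methods = [
                n.name
                for n in node.body
                if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))
            ]
            items.append(("class", node.name, methods))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            items.append(("function", node.name, []))
    return items


source = '''
class PreferenceFilter:
    def notify(self, user_id, message):
        pass

def send_all():
    pass
'''

structure = outline(source)
```

Nothing here needs to be stored: rerunning `outline` against the working tree always reflects the code as it is now, which is why only the activity record has to be historical.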

FAQ

Why not just save the conversation transcript?

Transcripts are useful for understanding what happened, but they're not queryable in structured ways. If your transcript is a natural language conversation, searching it requires parsing and semantic understanding, which is slow. Draft Commits extract the key information (constraints, alternatives, outcomes) into structured fields, making them queryable and indexable. You keep the transcript too (for reference), but the structured record is what enables efficient retrieval and analysis.

Do Draft Commits slow down development?

Not meaningfully. Draft Commits are captured automatically in the background; they don't require the agent to pause or the human to do extra work. The overhead is small (maybe 1-2% of session time), and it's usually invisible because it's asynchronous. The time you save in future sessions (by not having to re-explain constraints or re-discover design choices) far outweighs the small upfront cost.
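One common way to keep capture off the critical path is a queue drained by a background thread. The following is a minimal sketch of that pattern with a hypothetical `BackgroundRecorder`; it says nothing about how the actual capture pipeline is built.

```python
import queue
import threading


class BackgroundRecorder:
    """Accepts checkpoints immediately and stores them on a worker thread."""

    def __init__(self):
        self._queue = queue.Queue()
        self.stored = []
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def record(self, checkpoint):
        # Enqueue and return; the agent does not wait for storage.
        self._queue.put(checkpoint)

    def _drain(self):
        while True:
            item = self._queue.get()
            if item is None:  # sentinel: stop after everything queued so far
                break
            self.stored.append(item)

    def close(self):
        self._queue.put(None)
        self._worker.join()


recorder = BackgroundRecorder()
recorder.record({"step": 1, "note": "generated query"})
recorder.record({"step": 2, "note": "revised after review"})
recorder.close()
steps = [c["step"] for c in recorder.stored]
```

Because the queue is first-in, first-out and there is a single worker, checkpoints are stored in the order they were recorded.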

Can agents hallucinate or lie in Draft Commits?

Agents can be wrong or misunderstand constraints, yes. But that's exactly why Draft Commits are valuable. If the agent misunderstood something and it caused a bug later, the Draft Commit reveals the misunderstanding. You can see: "The agent thought X was true, but X is actually false. That's why it generated code that didn't account for Y." Having that visibility into the agent's assumptions is much better than not having it.

What's the difference between a Draft Commit and a backup or snapshot?

A backup or snapshot is the raw state of files at a point in time. A Draft Commit is a rich record of why the files are in that state. It includes reasoning, alternatives, constraints, and outcomes. Backups are useful for recovery; Draft Commits are useful for understanding and learning.

How do you prevent Draft Commits from becoming overwhelming?

Governance. Most teams set policies like "keep Draft Commits for 90 days" or "keep Draft Commits for promoted code permanently, archive others after 30 days." Some teams keep comprehensive records for critical components and lighter records for non-critical code. The vector index helps too: even with a large number of Draft Commits, semantic search keeps them discoverable without listing them all.

Can Draft Commits be shared across different codebases or teams?

Yes, and that's one of the advanced use cases. If Team A discovers a pattern or constraint while working on Component X, and Team B is working on a similar component in a different repo, Team B can query Team A's Draft Commits and benefit from that learning. This requires governance (Who can access which Draft Commits? What data is private vs. shared?) but it's technically straightforward.

What happens to Draft Commits when code is deleted or refactored significantly?

The Draft Commits remain. They're historical records. If code is deleted, the Draft Commit that created it is still there, documenting why it was created and what it did. If code is refactored, a new Draft Commit documents the refactoring, and the old Draft Commit remains as history. This is actually valuable—it preserves the decision lineage.

Can a human manually create a Draft Commit?

Not in the strict sense. Draft Commits are automatically generated by AI agent activity. But humans can add annotations to Draft Commits after the fact: "This decision was sound" or "We later learned this assumption was wrong." These annotations enrich the Draft Commit without changing the underlying record.
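The separation between the captured record and later human notes might be sketched like this. `AnnotatedDraft` is a hypothetical wrapper: the record is exposed through a read-only view, while annotations accumulate in an append-only list.

```python
from types import MappingProxyType


class AnnotatedDraft:
    def __init__(self, record):
        # Read-only view over a private copy of the captured record.
        self._record = MappingProxyType(dict(record))
        self._annotations = []

    @property
    def record(self):
        return self._record

    @property
    def annotations(self):
        return tuple(self._annotations)

    def annotate(self, author, note):
        """Add a human note without touching the underlying record."""
        self._annotations.append((author, note))


entry = AnnotatedDraft({"assumption": "Invalidation happens via event listening"})
entry.annotate("alice", "We later learned this assumption was wrong.")

try:
    entry.record["assumption"] = "edited"
    record_was_mutable = True
except TypeError:
    record_was_mutable = False  # the proxy rejects item assignment
```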

Primary Sources

Architecture Decision Records: a structured format for capturing architectural decisions and their rationale in version control.
Software Architecture in Practice: a comprehensive guide to documenting software architecture for long-term understanding.
HNSW: a hierarchical approximate nearest-neighbor algorithm for indexing semantic embeddings.
FAISS: a Facebook AI Research library for fast similarity search in high dimensions.
SQLite: a self-contained database engine for persisting commit metadata and reasoning traces.
Qdrant: a vector database with efficient indexing for retrieval of similar code changes.