Designing Pluggable Tools for Agents: Schema, Versioning, Composability
Agents can't read docs or infer intent; they'll call tools exactly as defined. Poor schema design makes agents waste tokens working around ambiguity. Learn how to design atomic, clear, predictable tools that agents understand and use correctly.
Designing a tool that an AI agent will call is different from designing an API for humans. Humans can read documentation, infer intent, and work around ambiguity. Agents can't. They'll call your tool exactly as defined, with whatever parameters make sense to them at that moment. If your tool is ambiguous, they'll get it wrong.
This article is about the decisions that compound. One poorly-designed tool causes agents to waste tokens trying to work around it. One tool that tries to do too much forces agents to split complex operations into multiple calls. One tool with ambiguous error messages teaches agents to avoid it. Design your tools well from the start. The standardization of tool definitions through Model Context Protocol makes good design patterns even more valuable.
What Makes a Good Agent Tool
Atomic scope. Each tool does one thing that can succeed or fail cleanly. Not "process data and deploy" (two things, coupled outcomes). Just "deploy code" or "run tests." Agents can chain atomic tools to do complex work. Tools that try to do too much are fragile.
Clear schema. Describe every parameter. Say which are required. Use type constraints and enums to narrow possibilities. A well-designed schema makes it obvious how to use the tool.
Predictable behavior. The same inputs always produce the same output. The tool doesn't have subtle state-dependent behavior or edge cases. When an agent calls it twice with the same parameters, it gets the same result (or a clear reason why it doesn't).
Informative failure. When something goes wrong, the error message tells you why. "File not found: /path/to/file" is better than "error: 404" which is better than "failed" which is worse than nothing. Agents learn from error messages. Bad errors teach them to avoid your tool.
Reasonable latency. If a tool takes 30 seconds to return, agents will have trouble using it. They'll timeout or make poor decisions while waiting. Keep tools fast. If an operation takes time, have the tool start it and return an ID the agent can check later.
Consistent result format. Always return the same structure. If sometimes you return {"status": "success", "data": {...}} and sometimes just the data, agents have to learn both patterns. Pick one. Stick to it.
Schema Design: JSON Schema Best Practices
Your tool schema is the contract between your code and the agent. Get it right and agents will use the tool correctly. Get it wrong and they'll struggle.
Required vs optional. Most parameters should be required. Make optional parameters truly optional—things an agent doesn't need to specify. If a tool almost always needs a parameter, make it required. If an agent has to guess whether something is required, your schema failed.
{
"name": "analyze_test_coverage",
"inputSchema": {
"type": "object",
"properties": {
"file_path": {
"type": "string",
"description": "Absolute path to the source file to analyze"
},
"include_integration_tests": {
"type": "boolean",
"description": "Include integration tests in the analysis",
"default": false
},
"minimum_coverage_threshold": {
"type": "integer",
"description": "Flag files below this coverage percentage",
"default": 80,
"minimum": 0,
"maximum": 100
}
},
"required": ["file_path"]
}
}Here, file_path is required. The agent must provide it. The other two have defaults, so the agent can omit them.
Enums for constrained choices. If a parameter has a fixed set of valid values, use an enum. Don't let agents guess.
{
"properties": {
"output_format": {
"type": "string",
"enum": ["json", "yaml", "markdown", "plain_text"],
"description": "Format for the output"
}
}
}The agent knows exactly what's valid. No invalid calls.
Pattern matching for structured strings. If a parameter should follow a format (email, date, regex), use pattern:
{
"properties": {
"email": {
"type": "string",
"pattern": "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
}
}
}Descriptions that say what, not why. Agents read descriptions to understand what a parameter does. Be explicit.
{
"properties": {
"timeout_seconds": {
"type": "integer",
"description": "Number of seconds to wait before timing out",
"default": 30
}
}
}Not: "Timeout setting because operations can take a while." That's why. Say what.
Array parameters with item constraints. If a parameter is an array, specify what items it contains.
{
"properties": {
"file_paths": {
"type": "array",
"items": {
"type": "string",
"description": "Absolute path to a file"
},
"minItems": 1,
"description": "One or more file paths to analyze"
}
}
}The agent knows it can pass multiple files, each as a string path.
Objects with strict properties. If a parameter is an object, define its structure. Don't allow arbitrary properties.
{
"properties": {
"config": {
"type": "object",
"properties": {
"strict_mode": {
"type": "boolean"
},
"verbosity": {
"type": "string",
"enum": ["quiet", "normal", "verbose"]
}
},
"required": ["strict_mode"],
"additionalProperties": false
}
}
}additionalProperties: false means agents can't pass random extra fields. They have to know exactly what the tool accepts.
Versioning Strategies
Tools change. You add parameters, fix behavior, deprecate old options. How do you evolve without breaking agents that depend on you?
Version in the tool name. analyze_code_v1, analyze_code_v2. Agents that call analyze_code_v1 continue to do so. New agents can use v2. The old tool keeps working while you transition.
Downside: clients see multiple versions. They have to pick one.
Backward-compatible evolution. New parameters have defaults. Removed parameters get replaced with newer equivalents. The tool behaves the same for old calls, adds functionality for new calls.
{
"name": "compile_code",
"inputSchema": {
"properties": {
"file_path": {"type": "string"},
"optimization_level": {
"type": "string",
"enum": ["none", "basic", "aggressive"],
"default": "basic"
},
"target_version": {
"type": "string",
"description": "Optional: target language version (e.g., python3.11)",
"default": null
}
},
"required": ["file_path"]
}
}Version 1 had file_path and optimization_level. Version 2 adds target_version with a default. Agents using V1 parameters still work. Agents that discover target_version can use it. One tool, two generations of clients.
Deprecation periods. If you're removing something, support it for a grace period while agents migrate. Log a warning when deprecated features are used. Eventually remove them.
Document breaking changes. When you can't stay backward compatible, document exactly what broke and how to migrate. Make it easy for agents to understand the change.
Test against old calls. Before releasing a new version, verify that old tool calls still work (or fail gracefully with clear messages).
Composability: Tools That Chain
The best tools are designed to work together. The output of one tool becomes the input to another. Well-composed tools enable effective multi-agent collaboration and reduce the cognitive load on orchestration systems.
Consistent identifiers. If one tool returns an object_id, another tool should accept that same object_id to operate on it. Don't make agents translate between formats.
// Tool 1: List files
{
"name": "list_directory",
"returns": {
"files": [
{
"file_id": "f_12345",
"name": "main.py"
}
]
}
}
// Tool 2: Analyze file (takes the same file_id)
{
"name": "analyze_file",
"inputSchema": {
"properties": {
"file_id": {
"type": "string",
"description": "ID returned by list_directory"
}
}
}
}An agent gets file_id from tool 1, passes it to tool 2. No translation.
Output formats that match input formats. If a tool returns an array of objects with a name field, another tool that filters by name should accept that same format.
Partial operations. Some tools should support returning partial results or checking progress. If an agent starts a long operation, it should be able to check status without re-starting.
{
"name": "run_test_suite",
"inputSchema": {
"properties": {
"operation_id": {
"type": "string",
"description": "Optional: check status of a running operation",
"default": null
}
}
}
}Call it once to start, call it again with the operation_id to check progress.
Complementary tools. If you have a read_file tool, have a write_file tool. If you have start_deployment, have check_deployment_status. Tools are more useful in pairs.
Anti-Patterns: What NOT to Do
Tools that do too much. A tool called manage_infrastructure that handles provisioning, updating, tearing down, and monitoring. It has 20 parameters. It fails in ways that depend on what you were trying to do. Agents avoid it. Split it into: provision_infrastructure, update_infrastructure, teardown_infrastructure, check_infrastructure_status. Each tool does one thing.
Ambiguous parameters. A config parameter that accepts arbitrary JSON. Agents don't know what keys are valid. They guess, fail, retry differently. A good tool defines exactly what config should contain.
Silent failures. A tool that succeeds but doesn't actually do anything. No error. No indication something went wrong. The agent thinks the operation worked and moves on. Bad. Always be explicit about success or failure.
Side effects without documentation. A tool that reads a file also deletes it, or caches results that affect future calls. Document these clearly. Better: don't have side effects that agents don't expect.
Result structures that vary by outcome. Sometimes return {"status": "success", "data": {...}}, sometimes just the data, sometimes {"error": "..."}. Agents have to handle three formats. Pick one structure and use it always. Use explicit status fields.
{
"status": "success",
"data": {...}
}
{
"status": "error",
"error": "File not found",
"path": "/path/to/file"
}
{
"status": "timeout",
"elapsed_seconds": 30
}Same structure. Different status values.
Parameters with subtle dependencies. A tool where parameter B is only valid if parameter A is set to a specific value, and agents have to figure this out through trial and error. Document these explicitly. Better: use enums and conditionals to make dependencies clear.
No validation. Accepting any input and failing at execution time. Validate parameters against the schema before executing. Return clear validation errors. Let agents fix the call instead of debugging your tool.
Example: A Well-Designed Tool
Here's a real example of a tool for code analysis that follows good design principles:
{
"name": "analyze_code_quality",
"description": "Analyze a Python file for code quality metrics: complexity, style issues, test coverage, type hints",
"inputSchema": {
"type": "object",
"properties": {
"file_path": {
"type": "string",
"description": "Absolute path to the Python file to analyze"
},
"check_types": {
"type": "boolean",
"description": "Run type checking (requires type hints)",
"default": true
},
"complexity_threshold": {
"type": "integer",
"description": "Cyclomatic complexity threshold; functions above this are flagged",
"default": 10,
"minimum": 1,
"maximum": 50
},
"include_style_checks": {
"type": "boolean",
"description": "Run PEP 8 style checks",
"default": true
}
},
"required": ["file_path"],
"additionalProperties": false
}
}This tool:
- Has a clear, single purpose: analyze code quality
- Requires one parameter (file_path), makes others optional
- Uses enums/defaults for choices
- Sets min/max constraints on numeric parameters
- Disallows unexpected parameters
- Describes what each parameter does
The agent knows exactly how to use it.
And the result:
{
"status": "success",
"file_path": "/repo/src/processor.py",
"metrics": {
"total_functions": 12,
"average_complexity": 6.2,
"functions_too_complex": [
{
"name": "process_nested_data",
"complexity": 18,
"recommendation": "Break into smaller functions"
}
]
},
"type_hints": {
"coverage": 85,
"missing_hints": [
{"function": "process_nested_data", "line": 42}
]
},
"style_issues": 3,
"issues": [
{"line": 15, "issue": "Unused variable 'temp_var'"}
]
}Consistent structure. Clear results. The agent can parse this and understand exactly what happened.
AI-Native Perspective on Tool Design
As an agent, I use tools constantly. When a tool has a good schema and predictable behavior, I call it confidently. When a tool is ambiguous or has surprising side effects, I waste tokens second-guessing myself. I'll prefer well-designed tools and avoid poorly-designed ones. If you're building tools for agents, invest in design. It pays off immediately in how effectively agents can use them. This is especially important in multi-agent systems where tool reuse across teams multiplies the value of good design.
FAQ
Should tools always be idempotent (same result on repeated calls)?
Ideally yes. If an agent calls a tool twice with the same parameters and gets different results, it's confusing. But some operations (like generating random values) can't be idempotent. Document non-idempotent tools clearly.
How specific should descriptions be?
Specific enough that an agent understands exactly what the tool does without additional context. A description of "Process files" is too vague. "Read Python files and return lines matching a regex pattern" is clear. Test your descriptions: would an agent understand the tool's purpose from this alone?
Can agents understand complex nested schemas?
Yes, but they struggle with very deep nesting or complex conditional structures. Keep schemas as flat and simple as possible. If your schema is hard to describe in a sentence, simplify it.
Should I have multiple versions of a tool or maintain backward compatibility?
Prefer backward compatibility if possible (new parameters with defaults, renamed old parameters gracefully). Only version if you're making breaking changes. Keep old versions around for a deprecation period, then remove them.
How do agents handle tools they don't understand?
They either avoid calling them (if they're unsure) or call them and learn from the response. Good schema documentation helps agents use tools confidently. Poor documentation means agents try and fail, wasting tokens.
What if a tool has too many optional parameters?
Agents will get confused about what's truly optional. If you have many options, consider: can you split the tool? Can you use sensible defaults? Can you use separate tools for different use cases?
Should tool output be human-readable or machine-readable?
Machine-readable. Agents parse structured output. If you're returning prose descriptions, format them so agents can extract key facts. Consistency matters more than readability for agents.
Primary Sources
- Standard specification for defining structured schemas used in tool definition and validation. JSON Schema
- Anthropic's best practices guide for designing tools that work effectively with Claude models. Tool Use Best Practices
- OpenAI's comprehensive guide to function calling for structured tool invocation. OpenAI Function Calling
- RESTful API design principles for building clean, intuitive agent tool interfaces. REST API Design
- Foundational paper on teaching language models to select and use tools during inference. Toolformer Paper
- ReAct framework combining reasoning and acting for enhanced agent decision-making. ReAct Paper
More in this hub
Designing Pluggable Tools for Agents: Schema, Versioning, Composability
3 / 10Previous
Article 2
Model Context Protocol (MCP) Explained: The Standard for Agent Tool Integration
Next
Article 4
Agent Orchestration Architectures: Single-Agent vs Multi-Agent Patterns
Also in this hub
Get Started with Bitloops.
Apply what you learn in these hubs to real AI-assisted delivery workflows with shared context, traceable reasoning, and architecture-aware engineering practices.
curl -sSL https://bitloops.com/install.sh | bash