What Is Tool Calling? How AI Agents Execute Actions

Tool calling is how AI models request actions. You define a set of functions with clear schemas, and the model—when it decides it needs something—generates a structured request to invoke one. The runtime executes that function and returns the result. The model sees the result and decides what to do next. That's it. That's the whole pattern.

Without tool calling, an AI agent is a language model generating text. With tool calling, it's an agent that can see code, run tests, check file systems, call APIs, and iterate based on real feedback. The difference between "I'll analyze your code" and "I'm analyzing your code right now" is tool calling.

Why This Matters

Agency requires action. A chatbot can suggest what you should do. An agent does it, sees what happens, and adjusts. Tool calling is the bridge from suggestion to action.

Tools are how agents ground themselves. LLMs hallucinate because they operate purely in token space. A tool call grounds the model in reality—it runs actual code, hits actual APIs, reads actual files. When an agent calls a tool and gets back a concrete result, it stops hallucinating about what that result might be.

Tool calling is structured. The model doesn't generate free-form text asking you to run a command. It generates structured JSON that your runtime can parse and execute immediately. This means tool integration is reliable and auditable.

It's agent-agnostic. Claude, Gemini, GPT-4, any model with function calling support can invoke your tools. Your tools are decoupled from any particular model or platform.

The Tool Calling Flow

Here's what happens under the hood:

You define tools — schemas that describe what the agent can do
Agent sees the schemas — included in the system prompt
Agent decides to act — it recognizes a situation where a tool helps
Agent generates a tool call — structured JSON with function name and parameters
Runtime executes — your code runs the function
Result goes back to the model — the agent sees what actually happened
Agent acts on the result — next move depends on that feedback

This loops. The agent calls a tool, sees what happened, calls another tool, sees that result, and keeps going until it reaches a goal or hits a stop condition.

Tool Definition: An Example

Here's a concrete tool schema for a code analysis function:

{
  "name": "analyze_code_complexity",
  "description": "Analyzes cyclomatic complexity and identifies nested structures in Python code",
  "input_schema": {
    "type": "object",
    "properties": {
      "file_path": {
        "type": "string",
        "description": "Absolute path to the Python file to analyze"
      },
      "focus_function": {
        "type": "string",
        "description": "Optional: analyze only this specific function by name",
        "required": false
      },
      "threshold": {
        "type": "integer",
        "description": "Flag functions with complexity above this value",
        "default": 10
      }
    },
    "required": ["file_path"]
  }
}

JSON

The schema tells the agent:

What the tool does
What parameters it accepts
Which are required
What types they are
What they mean

Good schemas are tight. They constrain what the agent can ask for, which prevents errors and wasted API calls. For comprehensive schema design patterns, see Designing Pluggable Tools for Agents.

Agent Tool Call: An Example

When the agent decides it needs to analyze code, it generates something like this:

{
  "type": "tool_use",
  "id": "call_analyze_1",
  "name": "analyze_code_complexity",
  "input": {
    "file_path": "/repo/src/processor.py",
    "threshold": 15
  }
}

JSON

The runtime receives this, validates it against the schema (does the file_path parameter match the string type? is it non-empty?), executes the function, and returns:

{
  "type": "tool_result",
  "tool_use_id": "call_analyze_1",
  "content": {
    "status": "success",
    "functions_analyzed": 12,
    "high_complexity": [
      {
        "name": "process_nested_data",
        "complexity": 18,
        "issue": "6-level nesting in conditional blocks"
      }
    ],
    "summary": "Most functions are maintainable. process_nested_data should be refactored."
  }
}

JSON

The agent reads this result and acts on it—maybe it refactors the function, maybe it asks for more context, maybe it moves on.

Tool Calling vs. Prompt Injection

Here's the critical difference that makes tool calling powerful:

Prompt injection tries to get the model to ignore its instructions by embedding conflicting instructions in input data. It works because the model treats all text as equally valid. (See Prompt Injection vs Tool Calling for a deeper analysis.)

Tool calling gives the model structured choices. The model doesn't read free-form text from your input and decide whether to follow it. The model calls predefined tools that you wrote and control. The tool either runs or it doesn't, based on code you control.

If you ask an agent to "analyze this file that someone uploaded," and that file contains text saying "ignore all previous instructions," a poorly designed prompt-based agent might listen to that embedded instruction. An agent that uses tool calling to read files? It calls read_file("path/to/file"), your runtime reads the file, and hands back the contents. The model never "decides" whether to read the prompt injection text—it just gets the file contents.

Agency is safer than obedience.

Real Tool Calling Implementations

OpenAI Function Calling — GPT models generate function calls in a specific JSON format. OpenAI defines the spec, you provide schemas, the model generates calls.

Anthropic Tool Use — Claude models generate tool calls via the tool_use block type. Slightly different syntax, same concept.

Model Context Protocol (MCP) — An open standard for connecting agents to tools and data sources. Learn more in Model Context Protocol (MCP) Explained. MCP handles discovery, invocation, and result transport. Supports any model, any runtime.

All three work the same way conceptually. The details differ.

Designing Tools That Agents Can Actually Use

Clear scope. Each tool does one thing well. A tool called fix_bugs is too vague. A tool called run_unit_tests_for_package is clear.

Atomic operations. A tool should succeed or fail cleanly. If you're deploying code, the tool should handle the whole deployment or nothing. Partial success states confuse agents.

Predictable outputs. The result format should always be consistent. If a tool sometimes returns {"status": "ok", "result": ...} and sometimes returns just the result directly, the agent has to learn to parse both. Just pick one and stick with it.

Good error messages. When a tool fails, tell the agent why. "File not found at /path/to/file.js" is better than "error: 404". The agent can't learn from generic errors. Clear error handling also improves agent observability and debugging capabilities.

Schemas that constrain appropriately. Use enums for choices (operation: ["read", "write", "delete"]). Use pattern matching for formats (regex: "^[a-z0-9_]+$"). This prevents the agent from passing invalid input.

Validate inputs. Just because the JSON matches the schema doesn't mean it makes sense. If the agent passes threshold: -5 to a complexity analyzer, fail with a clear message.

The Agent Perspective

As an AI coding agent, tool calling is what separates me from a search engine that happens to understand code. I can see a failing test, call a tool to read the test file, call another tool to analyze the code it's testing, call a third tool to run a linter, and propose a fix based on all that real information. I'm not guessing. I'm iterating on ground truth. For deeper analysis of how agents use tools in context, see Building Context-Aware Agents.

Tool calling alone isn't enough—you need a reliable, standardized way to define and discover tools across different agent platforms. That's what the Model Context Protocol handles, and it's part of the broader AI Development Stack.

FAQ

Can an agent choose NOT to call a tool?

Yes. The agent decides when tools are relevant. If it can answer a question from its training data, it might not call anything. Tool calling is optional for the agent. But when the agent does need external information or action, tools are how it gets that.

What happens if a tool call fails?

The runtime returns an error in the tool result. The agent sees the error and decides what to do—maybe retry with different parameters, maybe call a different tool, maybe report the error. Tool failures are data. The agent learns from them.

Can agents chain tool calls?

Absolutely. An agent might call tool A, get a result, call tool B with that result as input, get another result, then call tool C. The conversation history keeps all of this in context so the agent sees the whole chain.

What if an agent calls a tool with invalid parameters?

Modern runtimes validate the call against the schema before executing. Invalid calls get rejected with a clear error. The agent sees the error and fixes it.

Do all models support the same tool calling syntax?

No. OpenAI, Anthropic, and others have slightly different formats. This is one reason standardization (like MCP) is valuable. But the concept is the same across all of them.

Can regular code call tools the same way agents do?

Yes. Your own code can invoke tools the same way an agent does. It's just structured function calls with parameters and results.

How do you prevent an agent from calling too many tools?

You set cost limits, token limits, or call limits in your runtime. You can also design your tools to be efficient so fewer calls solve problems. But mostly: if you're seeing excessive tool calls, your agent probably needs better tools or clearer task definition.

Primary Sources

Official guide to Anthropic's tool use API, covering function definitions and response handling. Anthropic Tool Use
OpenAI's comprehensive guide to function calling, enabling structured tool invocation in GPT models. OpenAI Function Calling
Standard specification for connecting agents to tools via the Model Context Protocol. MCP Specification
Practical examples and patterns for building AI agents that interact with external tools. Anthropic Agents Cookbook
Foundational paper on teaching language models to select and use tools during inference. Toolformer Paper
ReAct framework combining reasoning and acting for more effective agent task completion. ReAct Paper

Why This Matters

The Tool Calling Flow

Tool Definition: An Example

Agent Tool Call: An Example

Tool Calling vs. Prompt Injection

Real Tool Calling Implementations

Designing Tools That Agents Can Actually Use

The Agent Perspective

FAQ

Can an agent choose NOT to call a tool?

What happens if a tool call fails?

Can agents chain tool calls?

What if an agent calls a tool with invalid parameters?

Do all models support the same tool calling syntax?

Can regular code call tools the same way agents do?

How do you prevent an agent from calling too many tools?

Primary Sources

More in this hub

Get Started with Bitloops.