
Structural Context Using AST Parsing: How Agents See Code Structure

Grep and embeddings make educated guesses. AST parsing gives real answers. It tells agents exactly where functions are called, what imports what, where scopes begin and end. That precision is what separates safe refactoring from blind guessing.

11 min read · Updated March 4, 2026 · Context Engineering for AI Coding Agents

What Structural Context Actually Is

Structural context is the precise, queryable information about how code is connected. It's not text search. It's not semantic similarity. It's the actual dependency graph, the call chain, the scope hierarchy, the type system — the real structure of your code.

When an agent has access to structural context, it doesn't guess where a function is called from. It asks the context tool: "give me all the call sites for this function". The tool parses the code, builds the actual call graph, and returns the exact list.

This is what separates reliable code changes from lucky ones.

Why ASTs Matter (and Text Search Doesn't)

An Abstract Syntax Tree (AST) is how compilers understand code. It's not the text of the code — it's the structured representation of what the code actually means. When you parse const x = doSomething() into an AST, you're not storing a string. You're storing a tree that says: "this is a variable declaration, the name is 'x', the initializer is a function call to 'doSomething'".

That structure is everything.
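The same idea in runnable form, using Python's standard-library ast module as an analogue of the JavaScript example above (do_something is an illustrative name):

```python
import ast

# Parse a tiny program into an AST. We don't store a string -- we get a
# tree that says: assignment, target name "x", value is a call to
# "do_something".
tree = ast.parse("x = do_something()")

assign = tree.body[0]             # the assignment statement node
print(type(assign).__name__)      # Assign
print(assign.targets[0].id)       # x
print(assign.value.func.id)       # do_something
```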

Here's why grep fails where AST parsing wins:

// This is the real code structure
import { validatePassword } from './password-utils';

export class UserService {
  create(user) {
    const isValid = validatePassword(user.password);
    return isValid ? user : null;
  }
}

If you grep for validatePassword, you find the import and the call. Great. But if you have this:

const validators = {
  validatePassword: require('./password-utils').validatePassword
};

export function createUser(user) {
  const isValid = validators['validatePassword'](user.password);
  return isValid ? user : null;
}

Grep finds the require statement, and it even matches the string 'validatePassword' inside the bracket lookup — but it can't tell whether that lookup is ever invoked, and if the key were built dynamically it would find nothing at all. An embedding search might surface the related code but has no idea whether the function is actually called. An AST parser sees the exact structure: there's a require, there's a property access, there's a function call — it knows the call happens through the validators object.

The difference is precision. Text search is probabilistic. AST parsing is deterministic.
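To see the gap concretely, here is a Python analogue of the snippet above (names are illustrative): a text search counts every occurrence of the name with no idea which ones are calls, while an AST walk isolates the one actual call made through the dictionary lookup.

```python
import ast
import re

source = '''
validators = {"validate_password": validate_password}

def create_user(user):
    is_valid = validators["validate_password"](user["password"])
    return user if is_valid else None
'''

# Text search: every occurrence of the name, calls and non-calls alike.
text_hits = len(re.findall(r"validate_password", source))

# AST walk: only Call nodes whose callee is the dict-style lookup.
calls = [
    node for node in ast.walk(ast.parse(source))
    if isinstance(node, ast.Call)
    and isinstance(node.func, ast.Subscript)   # validators["..."](...)
]
print(text_hits)   # 3 occurrences of the string
print(len(calls))  # 1 actual call, found structurally
```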

What Structural Context Includes

Structural context isn't just one thing. It's a collection of precise, queryable information:

Symbol Definitions and Exports

Where is each symbol defined? Is it exported? What's its scope? The agent doesn't have to guess based on naming conventions. It queries the context tool: "what does UserService export?" The tool returns the exact list of exported members, their types, their signatures.

Dependency Graphs

What files import what other files? What modules depend on what modules? This is the import/require graph, not inferred from heuristics but computed from the actual code. Change UserService and the context tool tells you exactly which files need updates.
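A minimal sketch of building that graph with Python's stdlib parser (module names are hypothetical):

```python
import ast

def imported_modules(source: str) -> set:
    """Collect the modules a file imports, read straight from its AST."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.add(node.module)
    return modules

# Hypothetical file contents for two modules in a project.
files = {
    "user_service": "from password_utils import validate_password",
    "user_controller": "import user_service",
}

# The dependency graph: file -> set of modules it imports.
graph = {name: imported_modules(src) for name, src in files.items()}
print(graph["user_controller"])  # {'user_service'}
```

Inverting this mapping answers the question in the text: which files need updates when user_service changes.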

Call Graphs

Where is each function called from? Not approximately — exactly. The context tool builds the call graph from the AST and can answer: "where is validatePassword() called?" with precision.

Scope Hierarchies

Is a variable local, is it a parameter, is it a closure variable, is it global? The agent doesn't have to guess based on how it looks. It queries the scope information: "what scope is this variable in?" The tool answers based on the actual AST structure.
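Python's standard library exposes exactly this query surface for its own language: the symtable module answers scope questions directly from the parsed source. A minimal sketch (names are illustrative):

```python
import symtable

source = '''
GLOBAL_LIMIT = 10

def outer(param):
    local_var = param + GLOBAL_LIMIT
    def inner():
        return local_var
    return inner
'''

# Build the scope table for the module, then drill into each function scope.
table = symtable.symtable(source, "<example>", "exec")
outer_scope = table.lookup("outer").get_namespace()
inner_scope = outer_scope.lookup("inner").get_namespace()

print(outer_scope.lookup("param").is_parameter())      # True
print(outer_scope.lookup("local_var").is_local())      # True
print(outer_scope.lookup("GLOBAL_LIMIT").is_global())  # True
print(inner_scope.lookup("local_var").is_free())       # True: closure variable
```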

Type Information

What types are involved? If your code uses TypeScript, the agent can access the exact type signatures. What are the parameters? What does the function return? This enables type-aware code generation, not just syntactically plausible code.

Architectural Boundaries

If you've defined module boundaries, what's exported and what's internal? The context tool knows which symbols are public and which are private, which modules can import from which other modules.

On-Demand vs. Pre-Indexed: Why Timing Matters

Here's a critical decision: do you compute structural context up front and store it, or compute it on demand?

Pre-indexed approach: You parse the entire codebase at some point (usually after a commit), build the dependency graph, and store it. When the agent needs context, it's instant.

On-demand approach: When the agent asks for context, you parse the relevant files and compute the answer right then.

Pre-indexing seems faster, but it has a staleness problem. If the agent makes a change and then needs updated context, the pre-computed graph is outdated. You either have to recompute everything (expensive) or serve stale information (wrong).

On-demand computation is slower per query but always correct. The agent makes a change, asks for context about the new state, and gets accurate answers. For reliable code generation, correctness beats speed.

Most production systems use a hybrid: pre-compute the base graph from the last commit, then recompute locally for any in-flight changes. This gives you speed and correctness together.

Embeddings vs. AST Parsing

Embedding-based code search is a popular alternative, part of the broader family of semantic context approaches: index the codebase, embed snippets, and when you need context, search for semantically similar code.

This works for "find me related functions" or "show me similar patterns". It's great for exploration and learning. But it's terrible for precise structural questions.

You ask embedding search: "what calls this function?" It returns code that's semantically related to the function. Maybe that code actually calls it, maybe it just does something similar. You get false positives. You also get false negatives — code that calls it but uses a different pattern might not show up.

An AST-based call graph gives you exactness. Here's the function. Here are all the places it's called. Period.

The difference shows up when the stakes are high. If you're refactoring and need to know every place a symbol is used, embeddings aren't reliable. You need AST precision.

The Practical Output: What the Agent Actually Receives

When an agent queries a structural context tool, here's what it gets back:

Query: "All call sites for UserService.create()"

Response:
- File: src/api/user-controller.ts, Line: 42
  Context: const user = await userService.create(userData);
  Type: Direct call

- File: src/jobs/user-sync.ts, Line: 156
  Context: const created = await UserService.create(importedUser);
  Type: Direct call

- File: tests/user-service.test.ts, Line: 89
  Context: const result = service.create({...testData});
  Type: Test call

- File: src/scripts/migrate-users.js, Line: 34
  Context: UserService.create(legacyUser).then(...)
  Type: Script call

The agent gets exact locations, exact context, exact type of call. It doesn't have to search, infer, or guess. It can now update all four call sites with confidence.

Compare that to what grep returns:

user-controller.ts:42:  const user = await userService.create(userData);
user-sync.ts:156: const created = await UserService.create(importedUser);
user-service.test.ts:89: const result = service.create({...testData});
migrate-users.js:34: UserService.create(legacyUser).then(...)
user-service.ts:5: export class UserService {
user-service.ts:12:   create(user) {

You get the same files, but you also get false positives (the class declaration, the method definition). You have to filter. You have to understand the context. The agent has to do human-level reasoning to separate signal from noise.

Structural context tools do the reasoning once and serve precise answers.
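As an illustration of doing that reasoning once, here is a minimal Python sketch (the article's example is TypeScript; the source below and the report shape are hypothetical): it returns exact call-site line numbers and skips the definitions that grep reports as noise.

```python
import ast

def call_sites(source: str, func_name: str) -> list:
    """Return every call to func_name with its exact line number,
    skipping class and method definitions."""
    sites = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            callee = node.func
            # Direct call: create(...)  /  method call: service.create(...)
            name = getattr(callee, "id", None) or getattr(callee, "attr", None)
            if name == func_name:
                sites.append({"line": node.lineno, "type": "call"})
    return sites

source = '''class UserService:
    def create(self, user):      # definition: not a call site
        return user

service = UserService()
service.create({"name": "a"})    # call site
'''
print(call_sites(source, "create"))  # [{'line': 6, 'type': 'call'}]
```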

Language Support and Tradeoffs

AST parsers are language-specific. A JavaScript parser won't work on Python code. If your codebase is polyglot (JavaScript and Python and Go all mixed together), you need parsers for each language.

This is a real constraint. Maintaining and integrating multiple language parsers is work. But it's one-time work, and the accuracy payoff is worth it. Some projects use unified AST formats (like Tree-sitter, which supports 80+ languages with a common interface). Others pick the languages that matter most for their codebase.

The alternative is to use text-based approaches for everything, which means accepting the blindness. That's usually not acceptable in production systems.

Performance Characteristics

Parsing code takes time. A full codebase parse can take seconds or minutes depending on size. A single-file parse takes milliseconds. Query time (looking up dependencies, finding call sites) is fast once the AST is built.

The strategy is usually:

  1. Pre-compute on commit: After every commit, build the AST and dependency graph for the new state. Store it. This takes 5-30 seconds depending on codebase size.
  2. Serve queries from the graph: When the agent needs context, query the pre-computed graph. Milliseconds.
  3. Recompute dirty areas: When the agent makes a change, recompute only the affected files and their dependencies. Usually much faster than full recompute.

This gives you both speed and correctness.
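The caching half of this strategy can be sketched as a minimal mtime-keyed parse cache (illustrative only; a real system would also invalidate the dependents of a changed file, not just the file itself):

```python
import ast
import os

# Parse cache: reparse a file only when its modification time changes.
# Step 1 (pre-compute) fills the cache; step 3 (recompute dirty areas)
# is simply a cache miss for the files the agent touched.
_cache = {}  # path -> (mtime, parsed AST)

def parse_cached(path: str) -> ast.AST:
    mtime = os.path.getmtime(path)
    entry = _cache.get(path)
    if entry and entry[0] == mtime:
        return entry[1]                # fresh: serve from cache
    with open(path) as f:
        tree = ast.parse(f.read())     # stale or missing: reparse
    _cache[path] = (mtime, tree)
    return tree
```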

AI-Native Infrastructure: Structural Tools at Generation Speed

Here's where Bitloops approaches this: structural context isn't an afterthought or a post-hoc analysis tool. It's built into the agent's workflow as a first-class service.

When an agent plans a change, it queries structural context to understand what will break. When it generates code, it references exact symbol signatures and dependencies. When it finishes, it checks its work against the structural model. Structural context isn't separate from generation — it's embedded in every step.

This is how you get agents that don't break code at scale.

Practical Implementation Patterns

If you're building this yourself, here are the core patterns:

Pattern 1: Query-Time Parsing

When the agent asks a question, parse the minimum necessary to answer it. Don't parse everything up front. For a dependency query, parse just the import statements. For a call graph query, parse just the function definitions and calls.

Pattern 2: Incremental Updates

When the agent makes a change, don't recompute the world. Recompute only what changed and what depends on it. This stays fast as the codebase grows.

Pattern 3: Cached Results

Cache the parse trees and dependency graphs. Parsing is the expensive part. Once you have the AST, querying it is cheap. Store the cache in memory during a session and persist it to disk for the next session.

Pattern 4: Language Abstraction

Hide language-specific details behind a common interface. Whether it's a JavaScript arrow function or a Python lambda, the query interface should be the same: "give me the parameters and return type".
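One way to sketch that abstraction, assuming Python as the host language (FunctionInfo, StructuralBackend, and PythonBackend are hypothetical names, not an established API):

```python
import ast
from dataclasses import dataclass
from typing import List, Optional, Protocol

@dataclass
class FunctionInfo:
    name: str
    parameters: List[str]
    return_type: Optional[str]

class StructuralBackend(Protocol):
    """The common query interface every language backend implements."""
    def function_info(self, source: str, name: str) -> FunctionInfo: ...

class PythonBackend:
    """One concrete backend, built on Python's stdlib parser."""
    def function_info(self, source: str, name: str) -> FunctionInfo:
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.FunctionDef) and node.name == name:
                params = [a.arg for a in node.args.args]
                returns = ast.unparse(node.returns) if node.returns else None
                return FunctionInfo(name, params, returns)
        raise LookupError(f"no function named {name}")

info = PythonBackend().function_info(
    "def add(a: int, b: int) -> int: return a + b", "add"
)
print(info.parameters, info.return_type)  # ['a', 'b'] int
```

A JavaScript or Go backend would satisfy the same Protocol, so callers never see language-specific node types.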

Tradeoffs and When to Choose Alternatives

You don't always need full AST-based structural context. Here are the tradeoffs to consider:

When structural context is mandatory:

  • Multi-file refactoring
  • Architectural changes
  • High-stakes API modifications
  • Large codebases (1000+ files)
  • Enforcing architectural boundaries

When simpler approaches might work:

  • Single-file changes within a function
  • Adding a new feature in an isolated module
  • Bug fixes with local scope
  • Small codebases (< 100 files)

When you need to combine approaches:

  • Use structural context for understanding dependencies
  • Use semantic context for understanding purpose and domain knowledge
  • Use memory systems for historical context and learning over time
  • Use embeddings for finding related concepts

The best systems use all of these. Structural context for precision, semantic context for meaning, memory for learning, embeddings for exploration.

FAQ

Doesn't this require storing parse trees for the whole codebase?

Not necessarily. You can parse files on demand and cache only the query results. Or you cache parse trees for popular files. The tradeoff is latency vs. memory. Most systems do some pre-computation (parse on commit, store the graph) and on-demand computation (parse new changes as needed).

What if the code is dynamically generated or loaded?

AST parsing sees what's in the files. If code is generated at runtime or loaded dynamically, static analysis can't see it. This is where semantic context and memory come in — you track what was actually generated and what patterns led to it. Structural context handles the static case, which is the vast majority of code in most projects, and you use other tools for the dynamic parts.

Doesn't this require the agent to understand the parse tree?

No. The agent doesn't see the AST directly. A structural context tool parses the AST and translates it into simple answers: "these are the call sites", "this is the return type", "these are the dependencies". The agent sees natural language or structured data, not abstract syntax trees.

How often does the structural context become stale?

It's stale as soon as the agent makes a change. So you either recompute immediately (expensive) or serve knowledge about the current change separately (hybrid). Most systems keep a "before" state (pre-computed) and an "in-flight" state (just this change) and merge the views.

Can you use type information instead of ASTs?

Type information is part of what you extract from ASTs. In TypeScript, you can use the TypeScript compiler API to get types. In Python, you can use type hints. But you still need to parse the code to get the types. So you're still building something similar to an AST under the hood.
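To make that concrete in Python: the type hints are themselves AST nodes, so even "just getting the types" is a parse query (the function and its names below are illustrative):

```python
import ast

# Parse a hypothetical signature and read its annotations off the AST.
source = "def fetch(user_id: int, verbose: bool = False) -> dict: ..."
fn = ast.parse(source).body[0]

params = {a.arg: ast.unparse(a.annotation) for a in fn.args.args if a.annotation}
returns = ast.unparse(fn.returns)
print(params, returns)  # {'user_id': 'int', 'verbose': 'bool'} dict
```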

What if the codebase has no tests or documentation?

Structural context doesn't depend on tests or docs. It's just the code structure. No tests needed. If anything, structural context makes it easier for agents to work with under-documented code because the agent can see the actual structure, not guess from comments.

Doesn't this break with obfuscated or minified code?

Yes. If your source is already minified or obfuscated, AST parsing is harder and less useful. But if you're running agents on source code (which you should be), it's not a problem. Work with source, not compiled artifacts.

Primary Sources

  • Tree-sitter — incremental parser that generates syntax trees for many programming languages through a common interface.
  • Attention Is All You Need — foundational transformer architecture underlying modern language model capabilities.
  • RAG Paper — combines retrieval mechanisms with generation for knowledge-intensive language tasks.
  • ReAct — framework for interleaving reasoning traces with external tool invocations in agents.
  • Tree of Thoughts — tree-based prompting approach for exploring multiple reasoning paths in language models.
  • Lost in the Middle — empirical study of attention degradation on information in the middle of long contexts.
