Tool-Calling Loops: The Core Pattern Behind Every Capable AI Agent
Reason, act, observe, repeat. This loop is the engine of every production agent — and getting the details right is what separates demos from products.
The Pattern That Runs the World
Every capable AI agent runs the same core loop. Claude Code runs it. Codex runs it. Devin runs it. Cline runs it. The specifics differ — Codex calls theirs a "turn lifecycle" with 10 steps, Cline calls it "Plan/Act modes" — but the underlying pattern is identical: the LLM reasons about what to do, calls a tool, observes the result, and repeats until the task is done.
This is the ReAct pattern (Reason + Act), introduced by Yao et al. in 2022. It sounds simple. The production details are not.
The Loop
┌─────────────────────────────────┐
│ User provides goal │
└───────────────┬─────────────────┘
▼
┌─────────────────────────────────┐
│ LLM reasons about next step │◄──────┐
└───────────────┬─────────────────┘ │
▼ │
┌───────────────┐ │
│ Tool call? │──── No ──► DONE │
└───────┬───────┘ │
│ Yes │
▼ │
┌─────────────────────────────────┐ │
│ Execute tool, get result │ │
└───────────────┬─────────────────┘ │
▼ │
┌─────────────────────────────────┐ │
│ Append result to context │───────┘
└─────────────────────────────────┘

Each iteration is a "turn." The LLM sees the full history of previous reasoning, tool calls, and results. It decides whether to call another tool or produce a final answer. The loop continues until the LLM signals completion or a termination condition fires.
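Stripped to its control flow, the loop fits in a dozen lines. The sketch below uses a stubbed `llm` function and a plain tool map (both illustrative, not any vendor's API) so the reason/act/observe cycle is visible without a provider SDK:

```typescript
// Minimal loop sketch. `llm` is a stand-in for a real chat client: it takes
// the history and returns either a tool call or final text. All names here
// are illustrative, not any specific vendor's API.
type Msg = { role: 'user' | 'assistant' | 'tool'; content: string };
type Step = { toolCall?: { name: string; args: string }; text?: string };
type ToolMap = Record<string, (args: string) => string>;

function runLoop(goal: string, llm: (history: Msg[]) => Step, tools: ToolMap, maxTurns = 10): string {
  const history: Msg[] = [{ role: 'user', content: goal }];
  for (let turn = 0; turn < maxTurns; turn++) {
    const step = llm(history);                    // reason
    if (!step.toolCall) return step.text ?? '';   // no tool call: DONE
    const result = tools[step.toolCall.name](step.toolCall.args); // act
    history.push({ role: 'assistant', content: `call:${step.toolCall.name}` });
    history.push({ role: 'tool', content: result });              // observe, repeat
  }
  return 'max turns reached';
}
```

A real loop replaces the stub with a chat-completion call and layers on the termination, budgeting, and compression concerns covered in the rest of this guide.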
Tool Contract Design
A tool is not just a function. It's a contract between the agent and an external system, and broken contracts are among the most common sources of agent failures in production.
Schema
Every tool needs a strict schema. The LLM uses it to decide which tool to call and how to format arguments. Vague schemas produce vague tool calls.
const searchTool = {
name: 'web_search',
description: 'Search the web. Returns top 5 results with title, URL, and snippet.',
parameters: z.object({
query: z.string().describe('Search query. Be specific.'),
maxResults: z.number().min(1).max(10).default(5),
}),
returns: z.object({
results: z.array(z.object({
title: z.string(),
url: z.string(),
snippet: z.string(),
})),
}),
};

Key rules:
- Descriptions tell the LLM when to use the tool. Write them for the model, not for a human reading docs.
- Parameter descriptions constrain the input. "Search query" is useless. "Search query. Be specific. Include domain terms." is actionable.
- Return type schemas let the LLM predict what it will get back, which improves reasoning about subsequent steps.
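Most chat APIs don't consume zod directly; the schema is serialized to JSON Schema before it reaches the model (helpers such as zod-to-json-schema typically handle this). As a rough sketch, the `searchTool` above would arrive at an OpenAI-style endpoint looking something like the following. Treat the exact envelope as an assumption; field names vary across providers:

```typescript
// Hypothetical wire format: roughly what an OpenAI-style endpoint receives
// for web_search after the zod schema is serialized to JSON Schema.
// The envelope fields ('type', 'function', ...) differ across providers.
const searchToolSpec = {
  type: 'function',
  function: {
    name: 'web_search',
    description: 'Search the web. Returns top 5 results with title, URL, and snippet.',
    parameters: {
      type: 'object',
      properties: {
        query: {
          type: 'string',
          description: 'Search query. Be specific. Include domain terms.',
        },
        maxResults: { type: 'number', minimum: 1, maximum: 10, default: 5 },
      },
      required: ['query'],
    },
  },
};
```

Whatever the envelope, the descriptions survive serialization verbatim, which is why they are worth writing for the model rather than for human readers.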
Error Types
Tools fail. The agent needs to know how they fail so it can respond appropriately.
type ToolResult =
| { status: 'success'; data: unknown }
| { status: 'error'; errorType: 'transient' | 'permanent' | 'auth'; message: string }
| { status: 'timeout'; elapsedMs: number };

Transient errors (rate limits, network blips): the agent should retry. Permanent errors (invalid input, resource not found): the agent should try a different approach. Auth errors: the agent should stop and escalate. If you return generic "error" for everything, the agent will retry permanent failures and waste tokens.
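One way to produce this union is a small classifier at the tool boundary. The HTTP status buckets below are illustrative assumptions, not a standard; adapt them to the APIs you actually call (the `ToolResult` type is repeated so the sketch is self-contained):

```typescript
type ToolResult =
  | { status: 'success'; data: unknown }
  | { status: 'error'; errorType: 'transient' | 'permanent' | 'auth'; message: string }
  | { status: 'timeout'; elapsedMs: number };

// Map a raw failure onto the ToolResult union so the loop can pick a response.
// The status-code buckets are illustrative; adjust for your own backends.
function classifyFailure(err: { httpStatus?: number; message: string }): ToolResult {
  const s = err.httpStatus;
  if (s === 401 || s === 403) {
    return { status: 'error', errorType: 'auth', message: err.message };      // stop and escalate
  }
  if (s === 429 || (s !== undefined && s >= 500)) {
    return { status: 'error', errorType: 'transient', message: err.message }; // safe to retry
  }
  return { status: 'error', errorType: 'permanent', message: err.message };   // try another approach
}
```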
Idempotency
If a tool call might be retried (and it will be), the tool must be safe to call twice with the same arguments. A search is naturally idempotent. A "create record" endpoint is not — the agent might create two records. Design tools with idempotency keys or check-before-write patterns.
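A minimal check-before-write sketch, using an in-memory map as a stand-in for a real store (a production version would rely on a unique constraint or an idempotency-key column in the database):

```typescript
// Check-before-write sketch: a create operation that is safe to retry.
// The store and record shape are hypothetical stand-ins for a real database.
const recordsByKey = new Map<string, { id: string; payload: string }>();

function createRecord(idempotencyKey: string, payload: string): { id: string; created: boolean } {
  const existing = recordsByKey.get(idempotencyKey);
  if (existing) {
    return { id: existing.id, created: false }; // retried call: return the prior result
  }
  const record = { id: `rec_${recordsByKey.size + 1}`, payload };
  recordsByKey.set(idempotencyKey, record);
  return { id: record.id, created: true };
}
```

The key property: calling `createRecord` twice with the same key yields one record and two identical responses, so a retried tool call cannot duplicate a write.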
Timeouts
Every tool call needs a timeout. Codex's turn lifecycle uses a TurnContext that wraps every tool execution with a deadline. Their ToolRouter uses FuturesOrdered to process tool calls concurrently while respecting per-tool timeouts. Without timeouts, a hung API call blocks the entire agent loop.
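A timeout wrapper can be as simple as `Promise.race` between the tool call and a timer. This sketch (the names are our own, not Codex's) returns the timeout as data rather than throwing, so the loop can report it to the LLM like any other result:

```typescript
// Timeout wrapper sketch using Promise.race. `run` is a thunk that performs
// the actual tool call; a timeout becomes a normal, reportable result.
type TimedResult<T> =
  | { status: 'success'; data: T }
  | { status: 'timeout'; elapsedMs: number };

async function executeWithTimeout<T>(run: () => Promise<T>, timeoutMs: number): Promise<TimedResult<T>> {
  const started = Date.now();
  let timer!: ReturnType<typeof setTimeout>;
  const timeout = new Promise<TimedResult<T>>((resolve) => {
    timer = setTimeout(
      () => resolve({ status: 'timeout', elapsedMs: Date.now() - started }),
      timeoutMs,
    );
  });
  try {
    return await Promise.race([
      run().then((data): TimedResult<T> => ({ status: 'success', data })),
      timeout,
    ]);
  } finally {
    clearTimeout(timer); // don't leave the timer holding the event loop open
  }
}
```

Note that `Promise.race` does not cancel the losing promise; a hung HTTP call keeps running in the background. For true cancellation you would thread an `AbortSignal` through to the underlying request.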
Loop Termination
An unbounded loop is a runaway agent burning your API budget. You need three termination signals.
1. Max Iterations
Hard cap. If the agent hasn't completed after N turns, stop it.
const MAX_ITERATIONS = 25;
Set this based on your task complexity. Simple Q&A: 3-5. Research tasks: 10-15. Code generation with testing: 20-30. Start low and increase only when you have evidence that more iterations improve outcomes.
2. Confidence Threshold
Some agents can estimate their own confidence. Claude Code's review system scores confidence on a 0-100 scale. If the agent's confidence in its answer exceeds a threshold (e.g., 80), it can terminate early without exhausting its iteration budget.
3. Explicit DONE Signal
The LLM itself decides the task is complete and produces a final answer instead of a tool call. This is the most common termination condition, but it's also the least reliable — the LLM might declare "done" prematurely, or it might never declare "done" and keep searching.
Production pattern: Combine all three. The loop ends when the LLM says DONE, OR confidence exceeds the threshold, OR max iterations are reached. Log which condition triggered so you can tune.
type TerminationReason = 'done_signal' | 'confidence_threshold' | 'max_iterations' | 'budget_exhausted';
function shouldTerminate(state: LoopState): TerminationReason | null {
if (state.llmSignaledDone) return 'done_signal';
if (state.confidence >= 80) return 'confidence_threshold';
if (state.iteration >= MAX_ITERATIONS) return 'max_iterations';
if (state.tokensUsed >= state.tokenBudget) return 'budget_exhausted';
return null;
}

Context Window Management
This is where naive implementations die. Each turn adds the LLM's reasoning, the tool call, and the tool result to the context. After 10 turns with verbose tool results, you've consumed 50,000+ tokens of context — and the LLM starts losing track of earlier information.
Summarization
After every N turns (or when context exceeds a threshold), summarize the conversation so far. Replace the full history with a condensed summary plus the last 2-3 turns.
async function compressContext(turns: Turn[]): Promise<Turn[]> {
if (estimateTokens(turns) < CONTEXT_THRESHOLD) return turns;
const oldTurns = turns.slice(0, -3);
const recentTurns = turns.slice(-3);
const summary = await llm.summarize(oldTurns);
return [
{ role: 'system', content: `Previous work summary: ${summary}` },
...recentTurns,
];
}

Selective Retention
Not all tool results are equally important. A search that returned useful data should stay in context. A search that returned nothing can be summarized as "searched for X, no results." Implement a relevance scorer that decides what to keep verbatim and what to compress.
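A relevance scorer doesn't have to be another LLM call. A cheap heuristic (keyword overlap with the goal, penalized by result size) is often enough to decide what stays verbatim. The weights and threshold below are illustrative, not tuned values:

```typescript
// Heuristic relevance score: keyword overlap with the goal, penalized by
// result size. The 0.3 weight and 0.05 threshold are illustrative defaults.
function relevanceScore(goal: string, toolResult: string): number {
  const goalTerms = new Set(
    goal.toLowerCase().split(/\W+/).filter((t) => t.length > 3),
  );
  const resultTerms = toolResult.toLowerCase().split(/\W+/);
  const hits = resultTerms.filter((t) => goalTerms.has(t)).length;
  const overlap = hits / Math.max(resultTerms.length, 1);
  const sizePenalty = Math.min(toolResult.length / 10_000, 1); // long results cost context
  return overlap - 0.3 * sizePenalty;
}

function retainVerbatim(goal: string, toolResult: string): boolean {
  return relevanceScore(goal, toolResult) > 0.05;
}
```

Results that fail the check get compressed to a one-line note ("searched for X, no results") instead of being kept whole.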
Tool Result Truncation
Cap tool result size before injecting into context. A 10,000-token API response should be truncated or summarized before the LLM sees it. The LLM doesn't need the full payload — it needs the information relevant to its current goal.
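A sketch of the cap, keeping the head of the payload and flagging the elision so the model knows its view is partial (the 2,000-character default is an arbitrary choice; character counts are a crude but dependency-free proxy for tokens):

```typescript
// Cap a tool result before it enters context. Keeps the head of the payload
// and notes how much was dropped, so the model knows the view is partial.
function truncateToolResult(raw: string, maxChars = 2_000): string {
  if (raw.length <= maxChars) return raw;
  const omitted = raw.length - maxChars;
  return `${raw.slice(0, maxChars)}\n[...truncated ${omitted} characters]`;
}
```

A smarter variant summarizes instead of slicing, or extracts only the fields the current goal needs, but even this blunt cap prevents a single response from swamping the context window.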
Aider's Architect/Editor split demonstrates how context management affects performance at the system level. By separating high-level reasoning (Architect) from code editing (Editor), each agent operates with a focused context window. The result: 83% on the polyglot benchmark, a state-of-the-art score. The Architect sees the codebase structure; the Editor sees only the files being modified. Neither wastes context on the other's concerns.
Production Hardening
Retry Policies
Not all failures are equal. Build a retry policy that distinguishes between error types.
const retryPolicy: Record<string, RetryConfig> = {
transient: { maxRetries: 3, backoffMs: [1000, 2000, 4000] },
permanent: { maxRetries: 0 },
auth: { maxRetries: 0, escalate: true },
timeout: { maxRetries: 2, backoffMs: [2000, 5000] },
};

Tool Call Budgets
Set a per-tool budget in addition to the global iteration limit. If the agent has called `web_search` 10 times in one run, something is wrong — it's probably stuck in a search loop. Cap individual tool usage.
Observability
Log every turn with: iteration number, tool called, arguments, result status, tokens consumed, latency. This trace is your debugging lifeline when the agent produces wrong output. Langfuse and Arize Phoenix both support turn-level agent tracing.
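A minimal per-turn trace record might look like the following. The field names are our own; tracing tools like Langfuse and Phoenix define their own schemas, but even a plain JSON line per turn makes stuck loops visible:

```typescript
// One structured log record per turn. The shape is illustrative; adapt the
// fields to whatever your tracing backend expects.
interface TurnTrace {
  runId: string;
  iteration: number;
  tool: string | null;        // null when the LLM produced a final answer
  argsSummary: string;
  resultStatus: 'success' | 'error' | 'timeout';
  tokensUsed: number;
  latencyMs: number;
}

function logTurn(trace: TurnTrace): string {
  const line = JSON.stringify(trace);
  console.log(line);          // or ship to your tracing backend
  return line;
}
```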
What to alert on:
- A run hitting max iterations (the agent couldn't solve the task)
- A single tool being called more than 5 times in one run (stuck in a loop)
- Token budget exhaustion before the agent signals completion
- Tool error rates exceeding 20% in a rolling window
Without these alerts, your first signal that the loop is misbehaving is a user complaint or an unexplained API bill.
Full Example: Research Agent
Putting it all together:
async function researchLoop(goal: string): Promise<ResearchResult> {
const state: LoopState = {
turns: [{ role: 'user', content: goal }],
iteration: 0,
tokensUsed: 0,
tokenBudget: 100_000,
confidence: 0, // updated by a self-assessment step between turns (not shown here)
toolCounts: {},
llmSignaledDone: false,
};
while (true) {
const reason = shouldTerminate(state);
if (reason) {
logger.info(`Loop terminated: ${reason} after ${state.iteration} turns`);
return extractResult(state, reason);
}
// Compress context if needed
state.turns = await compressContext(state.turns);
// Get LLM decision
const response = await llm.chat({
messages: state.turns,
tools: [searchTool, readPageTool, analyzeTool],
});
state.tokensUsed += response.usage.totalTokens;
state.iteration++;
if (response.finishReason === 'stop') {
state.llmSignaledDone = true;
state.turns.push({ role: 'assistant', content: response.text });
continue;
}
// Execute tool calls
for (const call of response.toolCalls) {
state.toolCounts[call.name] = (state.toolCounts[call.name] ?? 0) + 1;
if (state.toolCounts[call.name] > 10) {
state.turns.push({
role: 'tool',
content: `Tool ${call.name} budget exhausted. Use a different approach.`,
});
continue;
}
const result = await executeWithTimeout(call, 30_000);
state.turns.push({ role: 'tool', content: JSON.stringify(result) });
}
}
}

This loop handles termination (four conditions), context compression, tool budgets, and timeouts. It's roughly 40 lines. Production agents add error handling, observability, and retry logic — but this is the skeleton that every agent shares.
The ReAct loop is the foundation. [Sequential chaining](/guides/sequential-chaining-ai-agents) composes multiple loops into pipelines. [Reflection and critique](/guides/reflection-critique-loops-ai-agents) adds a second loop that reviews the first loop's output. But it all starts here — reason, act, observe, repeat.