Tool-Calling Loops: The Core Pattern Behind Every Capable AI Agent
Reason, act, observe, repeat. This loop is the engine of every production agent — and getting the details right is what separates demos from products.
The Pattern That Runs the World
Every capable AI agent runs the same core loop. Claude Code runs it. Codex runs it. Devin runs it. Cline runs it. The specifics differ — Codex calls theirs a "turn lifecycle" with 10 steps, Cline calls it "Plan/Act modes" — but the underlying pattern is identical: the LLM reasons about what to do, calls a tool, observes the result, and repeats until the task is done.
This is the ReAct pattern (Reason + Act), introduced by Yao et al. in 2022. It sounds simple. The production details are not.
The Loop
┌─────────────────────────────────┐
│ User provides goal │
└───────────────┬─────────────────┘
▼
┌─────────────────────────────────┐
│ LLM reasons about next step │◄──────┐
└───────────────┬─────────────────┘ │
▼ │
┌───────────────┐ │
│ Tool call? │──── No ──► DONE │
└───────┬───────┘ │
│ Yes │
▼ │
┌─────────────────────────────────┐ │
│ Execute tool, get result │ │
└───────────────┬─────────────────┘ │
▼ │
┌─────────────────────────────────┐ │
│ Append result to context │───────┘
└─────────────────────────────────┘

Each iteration is a "turn." The LLM sees the full history of previous reasoning, tool calls, and results. It decides whether to call another tool or produce a final answer. The loop continues until the LLM signals completion or a termination condition fires.
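Stripped to its control flow, the loop fits in a dozen lines. The sketch below uses a stubbed `llm` function and a plain tool map (both illustrative, not any vendor's API) so the reason/act/observe cycle is visible without a provider SDK:

```typescript
// Minimal loop sketch. `llm` is a stand-in for a real chat client: it takes
// the history and returns either a tool call or final text. All names here
// are illustrative, not any specific vendor's API.
type Msg = { role: 'user' | 'assistant' | 'tool'; content: string };
type Step = { toolCall?: { name: string; args: string }; text?: string };
type ToolMap = Record<string, (args: string) => string>;

function runLoop(goal: string, llm: (history: Msg[]) => Step, tools: ToolMap, maxTurns = 10): string {
  const history: Msg[] = [{ role: 'user', content: goal }];
  for (let turn = 0; turn < maxTurns; turn++) {
    const step = llm(history);                    // reason
    if (!step.toolCall) return step.text ?? '';   // no tool call: DONE
    const result = tools[step.toolCall.name](step.toolCall.args); // act
    history.push({ role: 'assistant', content: `call:${step.toolCall.name}` });
    history.push({ role: 'tool', content: result });              // observe, repeat
  }
  return 'max turns reached';
}
```

A real loop replaces the stub with a chat-completion call and layers on the termination, budgeting, and compression concerns covered in the rest of this guide.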
Tool Contract Design
A tool is not just a function. It's a contract between the agent and an external system, and broken contracts are among the most common sources of agent failures in production.
Schema
Every tool needs a strict schema. The LLM uses it to decide which tool to call and how to format arguments. Vague schemas produce vague tool calls.
const searchTool = {
name: 'web_search',
description: 'Search the web. Returns top 5 results with title, URL, and snippet.',
parameters: z.object({
query: z.string().describe('Search query. Be specific.'),
maxResults: z.number().min(1).max(10).default(5),
}),
returns: z.object({
results: z.array(z.object({
title: z.string(),
url: z.string(),
snippet: z.string(),
})),
}),
};

Key rules:
- Descriptions tell the LLM when to use the tool. Write them for the model, not for a human reading docs.
- Parameter descriptions constrain the input. "Search query" is useless. "Search query. Be specific. Include domain terms." is actionable.
- Return type schemas let the LLM predict what it will get back, which improves reasoning about subsequent steps.
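Most chat APIs don't consume zod directly; the schema is serialized to JSON Schema before it reaches the model (helpers such as zod-to-json-schema typically handle this). As a rough sketch, the `searchTool` above would arrive at an OpenAI-style endpoint looking something like the following. Treat the exact envelope as an assumption; field names vary across providers:

```typescript
// Hypothetical wire format: roughly what an OpenAI-style endpoint receives
// for web_search after the zod schema is serialized to JSON Schema.
// The envelope fields ('type', 'function', ...) differ across providers.
const searchToolSpec = {
  type: 'function',
  function: {
    name: 'web_search',
    description: 'Search the web. Returns top 5 results with title, URL, and snippet.',
    parameters: {
      type: 'object',
      properties: {
        query: {
          type: 'string',
          description: 'Search query. Be specific. Include domain terms.',
        },
        maxResults: { type: 'number', minimum: 1, maximum: 10, default: 5 },
      },
      required: ['query'],
    },
  },
};
```

Whatever the envelope, the descriptions survive serialization verbatim, which is why they are worth writing for the model rather than for human readers.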
Error Types
Tools fail. The agent needs to know how they fail so it can respond appropriately.
type ToolResult =
| { status: 'success'; data: unknown }
| { status: 'error'; errorType: 'transient' | 'permanent' | 'auth'; message: string }
| { status: 'timeout'; elapsedMs: number };

Transient errors (rate limits, network blips): the agent should retry. Permanent errors (invalid input, resource not found): the agent should try a different approach. Auth errors: the agent should stop and escalate. If you return generic "error" for everything, the agent will retry permanent failures and waste tokens.
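One way to produce this union is a small classifier at the tool boundary. The HTTP status buckets below are illustrative assumptions, not a standard; adapt them to the APIs you actually call (the `ToolResult` type is repeated so the sketch is self-contained):

```typescript
type ToolResult =
  | { status: 'success'; data: unknown }
  | { status: 'error'; errorType: 'transient' | 'permanent' | 'auth'; message: string }
  | { status: 'timeout'; elapsedMs: number };

// Map a raw failure onto the ToolResult union so the loop can pick a response.
// The status-code buckets are illustrative; adjust for your own backends.
function classifyFailure(err: { httpStatus?: number; message: string }): ToolResult {
  const s = err.httpStatus;
  if (s === 401 || s === 403) {
    return { status: 'error', errorType: 'auth', message: err.message };      // stop and escalate
  }
  if (s === 429 || (s !== undefined && s >= 500)) {
    return { status: 'error', errorType: 'transient', message: err.message }; // safe to retry
  }
  return { status: 'error', errorType: 'permanent', message: err.message };   // try another approach
}
```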
Idempotency
If a tool call might be retried (and it will be), the tool must be safe to call twice with the same arguments. A search is naturally idempotent. A "create record" endpoint is not — the agent might create two records. Design tools with idempotency keys or check-before-write patterns.
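A minimal check-before-write sketch, using an in-memory map as a stand-in for a real store (a production version would rely on a unique constraint or an idempotency-key column in the database):

```typescript
// Check-before-write sketch: a create operation that is safe to retry.
// The store and record shape are hypothetical stand-ins for a real database.
const recordsByKey = new Map<string, { id: string; payload: string }>();

function createRecord(idempotencyKey: string, payload: string): { id: string; created: boolean } {
  const existing = recordsByKey.get(idempotencyKey);
  if (existing) {
    return { id: existing.id, created: false }; // retried call: return the prior result
  }
  const record = { id: `rec_${recordsByKey.size + 1}`, payload };
  recordsByKey.set(idempotencyKey, record);
  return { id: record.id, created: true };
}
```

The key property: calling `createRecord` twice with the same key yields one record and two identical responses, so a retried tool call cannot duplicate a write.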
Timeouts
Every tool call needs a timeout. Codex's turn lifecycle uses a TurnContext that wraps every tool execution with a deadline. Their ToolRouter uses FuturesOrdered to process tool calls concurrently while respecting per-tool timeouts. Without timeouts, a hung API call blocks the entire agent loop.
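A timeout wrapper can be as simple as `Promise.race` between the tool call and a timer. This sketch (the names are our own, not Codex's) returns the timeout as data rather than throwing, so the loop can report it to the LLM like any other result:

```typescript
// Timeout wrapper sketch using Promise.race. `run` is a thunk that performs
// the actual tool call; a timeout becomes a normal, reportable result.
type TimedResult<T> =
  | { status: 'success'; data: T }
  | { status: 'timeout'; elapsedMs: number };

async function executeWithTimeout<T>(run: () => Promise<T>, timeoutMs: number): Promise<TimedResult<T>> {
  const started = Date.now();
  let timer!: ReturnType<typeof setTimeout>;
  const timeout = new Promise<TimedResult<T>>((resolve) => {
    timer = setTimeout(
      () => resolve({ status: 'timeout', elapsedMs: Date.now() - started }),
      timeoutMs,
    );
  });
  try {
    return await Promise.race([
      run().then((data): TimedResult<T> => ({ status: 'success', data })),
      timeout,
    ]);
  } finally {
    clearTimeout(timer); // don't leave the timer holding the event loop open
  }
}
```

Note that `Promise.race` does not cancel the losing promise; a hung HTTP call keeps running in the background. For true cancellation you would thread an `AbortSignal` through to the underlying request.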
Loop Termination
An unbounded loop is a runaway agent burning your API budget. You need three termination signals.
1. Max Iterations
Hard cap. If the agent hasn't completed after N turns, stop it.
const MAX_ITERATIONS = 25;
Set this based on your task complexity. Simple Q&A: 3-5. Research tasks: 10-15. Code generation with testing: 20-30. Start low and increase only when you have evidence that more iterations improve outcomes.
2. Confidence Threshold
Some agents can estimate their own confidence. Claude Code's review system scores confidence on a 0-100 scale. If the agent's confidence in its answer exceeds a threshold (e.g., 80), it can terminate early without exhausting its iteration budget.
3. Explicit DONE Signal
The LLM itself decides the task is complete and produces a final answer instead of a tool call. This is the most common termination condition, but it's also the least reliable — the LLM might declare "done" prematurely, or it might never declare "done" and keep searching.
Production pattern: Combine all three. The loop ends when the LLM says DONE, OR confidence exceeds the threshold, OR max iterations are reached. Log which condition triggered so you can tune.
type TerminationReason = 'done_signal' | 'confidence_threshold' | 'max_iterations' | 'budget_exhausted';
function shouldTerminate(state: LoopState): TerminationReason | null {
if (state.llmSignaledDone) return 'done_signal';
if (state.confidence >= 80) return 'confidence_threshold';
if (state.iteration >= MAX_ITERATIONS) return 'max_iterations';
if (state.tokensUsed >= state.tokenBudget) return 'budget_exhausted';
return null;
}

Context Window Management
This is where naive implementations die. Each turn adds the LLM's reasoning, the tool call, and the tool result to the context. After 10 turns with verbose tool results, you've consumed 50,000+ tokens of context — and the LLM starts losing track of earlier information.
Summarization
After every N turns (or when context exceeds a threshold), summarize the conversation so far. Replace the full history with a condensed summary plus the last 2-3 turns.
async function compressContext(turns: Turn[]): Promise<Turn[]> {
if (estimateTokens(turns) < CONTEXT_THRESHOLD) return turns;
const oldTurns = turns.slice(0, -3);
const recentTurns = turns.slice(-3);
const summary = await llm.summarize(oldTurns);
return [
{ role: 'system', content: `Previous work summary: ${summary}` },
...recentTurns,
];
}

Selective Retention
Not all tool results are equally important. A search that returned useful data should stay in context. A search that returned nothing can be summarized as "searched for X, no results." Implement a relevance scorer that decides what to keep verbatim and what to compress.
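A relevance scorer doesn't have to be another LLM call. A cheap heuristic (keyword overlap with the goal, penalized by result size) is often enough to decide what stays verbatim. The weights and threshold below are illustrative, not tuned values:

```typescript
// Heuristic relevance score: keyword overlap with the goal, penalized by
// result size. The 0.3 weight and 0.05 threshold are illustrative defaults.
function relevanceScore(goal: string, toolResult: string): number {
  const goalTerms = new Set(
    goal.toLowerCase().split(/\W+/).filter((t) => t.length > 3),
  );
  const resultTerms = toolResult.toLowerCase().split(/\W+/);
  const hits = resultTerms.filter((t) => goalTerms.has(t)).length;
  const overlap = hits / Math.max(resultTerms.length, 1);
  const sizePenalty = Math.min(toolResult.length / 10_000, 1); // long results cost context
  return overlap - 0.3 * sizePenalty;
}

function retainVerbatim(goal: string, toolResult: string): boolean {
  return relevanceScore(goal, toolResult) > 0.05;
}
```

Results that fail the check get compressed to a one-line note ("searched for X, no results") instead of being kept whole.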
Tool Result Truncation
Cap tool result size before injecting into context. A 10,000-token API response should be truncated or summarized before the LLM sees it. The LLM doesn't need the full payload — it needs the information relevant to its current goal.
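A sketch of the cap, keeping the head of the payload and flagging the elision so the model knows its view is partial (the 2,000-character default is an arbitrary choice; character counts are a crude but dependency-free proxy for tokens):

```typescript
// Cap a tool result before it enters context. Keeps the head of the payload
// and notes how much was dropped, so the model knows the view is partial.
function truncateToolResult(raw: string, maxChars = 2_000): string {
  if (raw.length <= maxChars) return raw;
  const omitted = raw.length - maxChars;
  return `${raw.slice(0, maxChars)}\n[...truncated ${omitted} characters]`;
}
```

A smarter variant summarizes instead of slicing, or extracts only the fields the current goal needs, but even this blunt cap prevents a single response from swamping the context window.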
Aider's Architect/Editor split demonstrates how context management affects performance at the system level. By separating high-level reasoning (Architect) from code editing (Editor), each agent operates with a focused context window. The result: 83% on the polyglot benchmark, a state-of-the-art score. The Architect sees the codebase structure; the Editor sees only the files being modified. Neither wastes context on the other's concerns.
Production Hardening
Retry Policies
Not all failures are equal. Build a retry policy that distinguishes between error types.
const retryPolicy: Record<string, RetryConfig> = {
transient: { maxRetries: 3, backoffMs: [1000, 2000, 4000] },
permanent: { maxRetries: 0 },
auth: { maxRetries: 0, escalate: true },
timeout: { maxRetries: 2, backoffMs: [2000, 5000] },
};

Tool Call Budgets
Set a per-tool budget in addition to the global iteration limit. If the agent has called `web_search` 10 times in one run, something is wrong — it's probably stuck in a search loop. Cap individual tool usage.
Observability
Log every turn with: iteration number, tool called, arguments, result status, tokens consumed, latency. This trace is your debugging lifeline when the agent produces wrong output. Langfuse and Arize Phoenix both support turn-level agent tracing.
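A minimal per-turn trace record might look like the following. The field names are our own; tracing tools like Langfuse and Phoenix define their own schemas, but even a plain JSON line per turn makes stuck loops visible:

```typescript
// One structured log record per turn. The shape is illustrative; adapt the
// fields to whatever your tracing backend expects.
interface TurnTrace {
  runId: string;
  iteration: number;
  tool: string | null;        // null when the LLM produced a final answer
  argsSummary: string;
  resultStatus: 'success' | 'error' | 'timeout';
  tokensUsed: number;
  latencyMs: number;
}

function logTurn(trace: TurnTrace): string {
  const line = JSON.stringify(trace);
  console.log(line);          // or ship to your tracing backend
  return line;
}
```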
What to alert on:
- A run hitting max iterations (the agent couldn't solve the task)
- A single tool being called more than 5 times in one run (stuck in a loop)
- Token budget exhaustion before the agent signals completion
- Tool error rates exceeding 20% in a rolling window
Without these alerts, your first signal that the loop is misbehaving is a user complaint or an unexplained API bill.
Full Example: Research Agent
Putting it all together:
async function researchLoop(goal: string): Promise<ResearchResult> {
const state: LoopState = {
turns: [{ role: 'user', content: goal }],
iteration: 0,
tokensUsed: 0,
tokenBudget: 100_000,
confidence: 0, // updated by a self-assessment step between turns (not shown here)
toolCounts: {},
llmSignaledDone: false,
};
while (true) {
const reason = shouldTerminate(state);
if (reason) {
logger.info(`Loop terminated: ${reason} after ${state.iteration} turns`);
return extractResult(state, reason);
}
// Compress context if needed
state.turns = await compressContext(state.turns);
// Get LLM decision
const response = await llm.chat({
messages: state.turns,
tools: [searchTool, readPageTool, analyzeTool],
});
state.tokensUsed += response.usage.totalTokens;
state.iteration++;
if (response.finishReason === 'stop') {
state.llmSignaledDone = true;
state.turns.push({ role: 'assistant', content: response.text });
continue;
}
// Execute tool calls
for (const call of response.toolCalls) {
state.toolCounts[call.name] = (state.toolCounts[call.name] ?? 0) + 1;
if (state.toolCounts[call.name] > 10) {
state.turns.push({
role: 'tool',
content: `Tool ${call.name} budget exhausted. Use a different approach.`,
});
continue;
}
const result = await executeWithTimeout(call, 30_000);
state.turns.push({ role: 'tool', content: JSON.stringify(result) });
}
}
}

This loop handles termination (four conditions), context compression, tool budgets, and timeouts. It's roughly 40 lines. Production agents add error handling, observability, and retry logic — but this is the skeleton that every agent shares.
The ReAct loop is the foundation. [Sequential chaining](/guides/sequential-chaining-ai-agents) composes multiple loops into pipelines. [Reflection and critique](/guides/reflection-critique-loops-ai-agents) adds a second loop that reviews the first loop's output. But it all starts here — reason, act, observe, repeat.