Context Window
Definition
The maximum number of tokens (input + output) that a language model can process in a single interaction. The context window determines how much information an agent can 'see' at once, including the system prompt, conversation history, retrieved documents, tool call results, and the response being generated. Context windows range from roughly 4K tokens (early GPT-3.5) to 1M+ tokens (Gemini 1.5 Pro, recent Claude models). Longer context windows let agents handle more complex tasks without losing information, but attention quality can degrade over very long contexts (the 'lost in the middle' problem).
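A minimal sketch of per-turn token accounting, assuming an OpenAI-style tokenizer via tiktoken; the 128K limit and the 20% reply reserve are illustrative numbers, not properties of any particular model.

```python
import tiktoken

CONTEXT_LIMIT = 128_000                    # assumed model limit, in tokens
REPLY_RESERVE = int(CONTEXT_LIMIT * 0.2)   # headroom kept for the response

enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(messages: list[dict]) -> int:
    """Rough token count for a list of {'role', 'content'} messages."""
    return sum(len(enc.encode(m["content"])) for m in messages)

def fits_in_window(messages: list[dict]) -> bool:
    """True if the prompt still leaves room for the model's reply."""
    return count_tokens(messages) <= CONTEXT_LIMIT - REPLY_RESERVE
```

Counts from tiktoken are exact only for OpenAI models; for other providers treat them as an estimate or use that provider's own token-counting endpoint.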
Builder Context
Context window management is a core agent engineering skill. Monitor token usage per turn and implement a truncation strategy before you hit the limit. Priority order for what to keep: system prompt (always), recent conversation turns (high), tool results from the current task (high), older conversation history (medium; summarize it), retrieved documents (low; re-retrieve as needed). For agents that run many tool calls, the context fills fast: 200 tool calls at 500 tokens each is already 100K tokens of history. Implement a sliding window or summarization strategy, as in the sketch below.
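A minimal sketch of that priority order, assuming each message carries a 'kind' tag ('system', 'turn', 'tool_result', 'retrieved_doc'); the summarizer, the 4-chars-per-token estimate, and all thresholds are illustrative placeholders, not a specific library's API.

```python
def approx_tokens(messages: list[dict]) -> int:
    """Crude estimate: ~4 characters per token; swap in a real tokenizer."""
    return sum(len(m["content"]) for m in messages) // 4

def summarize(messages: list[dict]) -> dict:
    """Placeholder: in practice, call a cheap model to compress old history."""
    text = " ".join(m["content"] for m in messages)
    return {"kind": "summary", "role": "system",
            "content": f"[Summary of {len(messages)} earlier messages] {text[:500]}"}

def prune_context(messages: list[dict], budget: int,
                  keep_recent: int = 10) -> list[dict]:
    """Drop or compress low-priority messages until the prompt fits `budget`."""
    system = [m for m in messages if m["kind"] == "system"]
    rest = [m for m in messages if m["kind"] != "system"]

    # 1. Drop retrieved documents first; they can be re-retrieved on demand.
    rest = [m for m in rest if m["kind"] != "retrieved_doc"]

    # 2. Keep the most recent turns and tool results verbatim and compress
    #    everything older into a single summary message.
    if approx_tokens(system + rest) > budget and len(rest) > keep_recent:
        older, recent = rest[:-keep_recent], rest[-keep_recent:]
        rest = [summarize(older)] + recent

    # 3. Last resort, slide the window: drop the oldest messages until it fits.
    while approx_tokens(system + rest) > budget and len(rest) > 1:
        rest.pop(0)

    return system + rest
```

Running prune_context before each model call keeps the system prompt intact while trading older history for a summary, which matches the priority order above; the main design choice is when to pay for a summarization call versus simply dropping messages.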