Chain of Thought (CoT)
Definition
A prompting technique where the model is encouraged to break down complex reasoning into intermediate steps before producing a final answer. In the agent context, CoT is the 'reason' phase of the ReAct loop — the model explicitly articulates its plan before taking action. Extended thinking or 'thinking tokens' are the model-native implementation of CoT, where the model uses dedicated tokens to reason internally before generating visible output.
Builder Context
CoT is free performance. For any agent decision that involves choosing between tools, evaluating conditions, or planning multi-step operations, prompt the model to think through its approach before acting. In production, you can hide CoT tokens from the user while still benefiting from them. The main trade-off is latency and token cost — CoT increases both. For latency-sensitive applications, consider pre-computing common reasoning chains or using structured decision trees instead.