Retrieval-Augmented Generation (RAG)
Definition
An architecture that combines information retrieval with language model generation. Before generating a response, the system retrieves relevant documents from a knowledge base (using semantic search, keyword search, or a hybrid of the two) and places them in the model's context. RAG addresses a fundamental limitation of parametric knowledge: a model on its own knows only what was in its training data. Retrieval gives it access, at query time, to current, domain-specific, or private information. A RAG pipeline typically has five stages: query formulation, document retrieval, re-ranking, context assembly, and grounded generation.
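The five stages above can be sketched end to end with the standard library alone. This is a toy illustration, not a production design: the knowledge base, the stop-word list, the bag-of-words cosine scorer, and every function name are assumptions made for the example. A real system would typically use an embedding model or a search index for retrieval and would send the final prompt to a language model.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; real systems use a fuller one or none at all.
STOP_WORDS = {"the", "a", "an", "is", "are", "what", "how", "do", "of", "to", "for"}

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def formulate_query(user_message: str) -> list[str]:
    """Stage 1, query formulation: keep only content-bearing terms."""
    return [t for t in tokenize(user_message) if t not in STOP_WORDS]

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query_terms: list[str], docs: list[str], k: int = 3) -> list[str]:
    """Stage 2, retrieval: top-k documents by bag-of-words cosine similarity."""
    q = Counter(query_terms)
    scored = [(cosine(q, Counter(tokenize(d))), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for score, d in scored[:k] if score > 0]

def rerank(query_terms: list[str], candidates: list[str], keep: int = 2) -> list[str]:
    """Stage 3, re-ranking: prefer candidates covering more distinct query terms."""
    wanted = set(query_terms)
    return sorted(candidates, key=lambda d: len(wanted & set(tokenize(d))), reverse=True)[:keep]

def assemble_context(passages: list[str], max_chars: int = 500) -> str:
    """Stage 4, context assembly: number the passages and respect a size budget."""
    lines, used = [], 0
    for i, passage in enumerate(passages, start=1):
        entry = f"[{i}] {passage}"
        if used + len(entry) > max_chars:
            break
        lines.append(entry)
        used += len(entry)
    return "\n".join(lines)

def build_prompt(question: str, context: str) -> str:
    """Stage 5, grounded generation: the prompt a language model would receive."""
    return (
        "Answer using only the sources below. If they do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )

# Made-up knowledge base for illustration.
KNOWLEDGE_BASE = [
    "Refunds are issued within 14 days of a return request.",
    "The office is closed on public holidays.",
    "Return requests must include the original order number.",
]

question = "How are refunds issued for a return?"
terms = formulate_query(question)
passages = rerank(terms, retrieve(terms, KNOWLEDGE_BASE))
prompt = build_prompt(question, assemble_context(passages))
print(prompt)
```

Retrieval here is deliberately generous (any document with a non-zero score is a candidate), and the re-ranker trims the list afterwards. The unrelated passage about office hours never reaches the prompt.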
Builder Context
RAG is usually the most reliable way to give your agent access to private or current data. A rough priority order for quality: (1) chunking strategy matters more than the choice of embedding model; (2) retrieval recall matters more than precision, because a re-ranking step can filter out false positives but cannot recover documents that were never retrieved; (3) query formulation matters more than the retrieval algorithm. Common failures: chunks that are too large (relevance is diluted), chunks that are too small (surrounding context is lost), and naive similarity search on the raw query (misses passages that are conceptually relevant but worded differently). For production agents, implement RAG as a Model Context Protocol (MCP) tool that the agent calls when it needs information, not as a fixed pre-processing step that runs on every request.
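Two of the points above lend themselves to a short sketch: overlapping chunks as one way to balance chunks that are too large against chunks that are too small, and retrieval exposed as a tool the agent calls on demand. Everything named here is hypothetical (chunk_sentences, search_knowledge_base, the TOOLS registry, the sample text). The registry is a plain dictionary loosely modeled on the name, description, and input-schema shape of an MCP tool definition; it does not use an MCP SDK, and the word-overlap scorer stands in for a real retriever.

```python
import re

def chunk_sentences(text: str, max_sentences: int = 3, overlap: int = 1) -> list[str]:
    """Split text into sentence windows; neighbours share `overlap` sentences."""
    if not 0 <= overlap < max_sentences:
        raise ValueError("overlap must be >= 0 and smaller than max_sentences")
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    step = max_sentences - overlap
    chunks = []
    for start in range(0, len(sentences), step):
        chunks.append(" ".join(sentences[start:start + max_sentences]))
        if start + max_sentences >= len(sentences):
            break  # the last window already reaches the end of the text
    return chunks

# Made-up source document for illustration.
DOCUMENT = (
    "Invoices are sent on the first of each month. Payment is due within 30 days. "
    "Late payments incur a 2 percent fee. Refunds are processed within 14 days. "
    "Contact billing for disputes."
)
CHUNKS = chunk_sentences(DOCUMENT, max_sentences=3, overlap=1)

def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def search_knowledge_base(query: str, top_k: int = 2) -> list[str]:
    """Tool body: rank chunks by how many query words they share (toy scorer)."""
    q = words(query)
    scored = sorted(((len(q & words(c)), c) for c in CHUNKS), key=lambda p: p[0], reverse=True)
    return [chunk for score, chunk in scored[:top_k] if score > 0]

# Hypothetical tool registry: the agent sees the description and schema,
# and decides per turn whether retrieval is needed at all.
TOOLS = {
    "search_knowledge_base": {
        "description": "Search the billing knowledge base for relevant passages.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}, "top_k": {"type": "integer"}},
            "required": ["query"],
        },
        "handler": search_knowledge_base,
    }
}

def call_tool(name: str, arguments: dict) -> list[str]:
    """Dispatch a tool call the way an agent runtime might."""
    return TOOLS[name]["handler"](**arguments)

results = call_tool("search_knowledge_base", {"query": "When are refunds processed?"})
print(results[0])
```

Because the windows overlap by one sentence, the late-fee sentence appears in both chunks, so a question that spans the boundary can still be answered from a single chunk. Exposing the search as a tool lets the agent skip retrieval for requests that do not need it and issue a second, reformulated query when the first returns nothing useful.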