LLM Observability
Definition
The broader practice of monitoring, measuring, and understanding the behavior of systems that use language models — encompassing agent tracing, performance monitoring (latency, throughput, error rates), cost tracking, quality measurement (accuracy, relevance, safety), and drift detection (performance degradation over time). LLM observability platforms provide dashboards, alerting, and analytics specifically designed for AI systems, complementing traditional APM tools.
Builder Context
Set up three layers of observability:
(1) Operational: latency, error rate, cost per request (standard APM).
(2) Quality: accuracy, relevance, user satisfaction (requires evaluation pipelines).
(3) Safety: hallucination rate, policy violations, adversarial probe responses (requires safety test suites).
Alert on cost spikes (runaway agent loops), latency increases (model provider issues), and quality drops (model updates, data drift). The cheapest investment with the highest return is to log every model call with enough metadata to reproduce it; sketches of a logging wrapper and a cost-spike alert follow.
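The per-call log is the highest-leverage piece. A minimal sketch, assuming an OpenAI-style chat client; the `log_model_call` name, the JSONL sink, and the record fields are illustrative, not a standard schema:

```python
import json
import time
import uuid
from datetime import datetime, timezone

def log_model_call(client, *, model, messages, **params):
    """Call a chat model and log everything needed to reproduce the call.

    Assumes an OpenAI-style SDK client (openai>=1.0 interface); the
    record fields are a starting point, not a standard schema.
    """
    record = {
        "id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "messages": messages,  # the exact prompt, verbatim
        "params": params,      # temperature, max_tokens, seed, tools, ...
    }
    start = time.monotonic()
    try:
        response = client.chat.completions.create(
            model=model, messages=messages, **params
        )
        record["latency_s"] = time.monotonic() - start
        record["output"] = response.choices[0].message.content
        record["usage"] = {
            "prompt_tokens": response.usage.prompt_tokens,
            "completion_tokens": response.usage.completion_tokens,
        }
        return response
    except Exception as exc:
        record["latency_s"] = time.monotonic() - start
        record["error"] = repr(exc)
        raise
    finally:
        # Append-only JSONL is enough to start; swap in a real sink later.
        with open("model_calls.jsonl", "a") as f:
            f.write(json.dumps(record, default=str) + "\n")
```

Because the record holds the exact prompt, model name, and sampling parameters, any later quality or safety regression can be replayed against a new model version instead of reconstructed from memory.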
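For the cost-spike alert, a sliding-window budget check is usually enough to catch runaway agent loops, which burn spend far faster than normal traffic. A sketch with placeholder thresholds; the `CostSpikeAlert` name and the $2-per-5-minutes budget are assumptions to tune, not recommendations:

```python
from collections import deque
import time

class CostSpikeAlert:
    """Fire when spend inside a sliding window exceeds a budget."""

    def __init__(self, window_s=300, budget_usd=2.0):
        self.window_s = window_s
        self.budget_usd = budget_usd
        self.events = deque()  # (timestamp, cost_usd)

    def record(self, cost_usd):
        now = time.monotonic()
        self.events.append((now, cost_usd))
        # Drop events that have aged out of the window.
        while self.events and now - self.events[0][0] > self.window_s:
            self.events.popleft()
        total = sum(cost for _, cost in self.events)
        if total > self.budget_usd:
            self.alert(total)

    def alert(self, total):
        # Placeholder: wire this to your paging or chat tooling.
        print(f"ALERT: ${total:.2f} spent in the last {self.window_s}s")
```

Feed it the estimated cost of each call as it completes, for example `alerter.record(cost_usd)` inside the logging wrapper above; the same windowing pattern extends to latency and quality-score alerts.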