Monitoring

18 tools


Arize Phoenix

Trust score: 80

Arize Phoenix excels as an open-source LLM observability platform with strong docs and interoperability via OpenTelemetry (OTel) and OpenAPI, backed by the well-funded Arize AI, but it lacks agent execution capabilities and load performance data.

Percy (BrowserStack)

NEEDS APPROVAL

Trust score: 78

AI-powered visual regression testing platform. Its Visual Review Agent (launched late 2025) reduces review time threefold and automatically filters 40% of false positives by classifying diffs as 'likely false positive' or 'likely real change.' Percy tests full pages, user flows, and application states across Playwright, Cypress, Selenium, and Storybook, and CI/CD integration takes a single line of code. The free tier includes 5,000 screenshots per month. A production-grade option once you outgrow Playwright's built-in visual regression testing (VRT).

Portkey

NEEDS APPROVAL

Trust score: 78

An enterprise-grade AI gateway that excels in interoperability and security, with strong MCP support, backed by funding and compliance certifications; suitable for production agent workflows.

Bill shock without budget caps on the platform-fee model; separate billing complexity between platform and provider charges

Stale · Mar 2026 · MCP · REST

W&B Weave

NEEDS APPROVAL

Trust score: 77

Mature observability platform from the well-funded W&B, with strong agent tracing via its SDK and Service API plus MCP support; excellent for production LLM and agent monitoring, but it lacks deep execution capabilities.

Galileo AI

NEEDS APPROVAL

Trust score: 77

Enterprise-grade AI evaluation platform with strong docs and integrations but limited public API details and no visible status page.

Stale · Mar 2026 · MCP · REST

AgentOps

Trust score: 77

AgentOps delivers strong agent observability with excellent framework integrations and docs, but it lacks performance data and a clear opt-out from model training.

Applitools Eyes

NEEDS APPROVAL

Trust score: 74

AI-powered semantic visual comparison: rather than diffing pixel by pixel, it understands what UI elements mean. It recognizes dynamic content (ads, personalized dashboards, dates, transaction IDs) that would trigger false positives in pixel-diff tools. Named a Strong Performer in the Forrester Wave: Autonomous Testing Platforms, Q4 2025. Eyes 10.22 enables visual AI testing directly in Storybook and Figma. Its multi-agent test lifecycle uses one agent to map workflows, another to generate Playwright/Appium code, and a maintenance agent to diagnose failures. Self-healing selectors survive redesigns.
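To make the pixel-diff false-positive problem concrete, here is a minimal sketch of a naive pixel-by-pixel comparison, the baseline that semantic tools aim to improve on. It assumes screenshots are equal-sized grids of grayscale values; the `changed_pixels` function and the ignore-region format are hypothetical illustrations, not part of Applitools' or any other product's API, and Applitools' own semantic approach is not shown here.

```python
from typing import List, Sequence, Tuple

# Assumption: a screenshot is modeled as a 2D grid of grayscale values (0-255).
Image = List[List[int]]
# Hypothetical ignore region: (top, left, bottom, right), bottom/right exclusive.
Region = Tuple[int, int, int, int]


def changed_pixels(baseline: Image, current: Image,
                   ignore: Sequence[Region] = (), tolerance: int = 0) -> int:
    """Count pixels that differ by more than `tolerance`, skipping ignored regions."""
    if len(baseline) != len(current) or any(
        len(a) != len(b) for a, b in zip(baseline, current)
    ):
        raise ValueError("screenshots must have identical dimensions")
    count = 0
    for y, (row_a, row_b) in enumerate(zip(baseline, current)):
        for x, (a, b) in enumerate(zip(row_a, row_b)):
            # Skip pixels inside any manually masked (dynamic) region.
            if any(top <= y < bottom and left <= x < right
                   for top, left, bottom, right in ignore):
                continue
            if abs(a - b) > tolerance:
                count += 1
    return count


if __name__ == "__main__":
    # 4x4 white page; a "date" widget occupies the top-right two pixels.
    baseline = [[255] * 4 for _ in range(4)]
    current = [row[:] for row in baseline]
    current[0][2] = 0  # the rendered date changed between runs
    current[0][3] = 0
    print(changed_pixels(baseline, current))                         # 2: flagged as a change
    print(changed_pixels(baseline, current, ignore=[(0, 2, 1, 4)]))  # 0: region masked
```

A changing date trips the naive diff even though nothing meaningful changed. Hand-maintained ignore regions are the classic pixel-diff workaround; recognizing such content automatically is what the description above credits Applitools with.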

Opik (Comet)

FULL AUTO

Trust score: 74

Mature open-source LLM observability platform with strong integrations and self-hosting, ideal for agent tracing but lacks public performance and reliability metrics.

SSE transport is experimental and untested for production

Stale · Mar 2026 · REST

LangWatch

FULL AUTO

Trust score: 74

Strong LLMOps observability platform with excellent docs, interoperability, and enterprise compliance, tempered by a lack of load performance data.

Datadog LLM Observability

Trust score: 74

Mature enterprise observability platform with strong tracing/integrations and company trust, but limited direct API evidence lowers agent readiness.

automatic premium activation without user confirmation

Stale · Mar 2026 · REST

MLflow

FULL AUTO

Trust score: 73

Robust open-source ML tracking platform with excellent docs and interoperability, tempered by a recent security incident and limited agent-specific readiness.

Verified May 2026 · REST

Laminar

FULL AUTO

Trust score: 72

Solid open-source AI agent observability platform with strong docs and integrations, though limited as a standalone agent executor because it focuses on tracing rather than execution.

Maxim AI

Trust score: 66

Enterprise-grade AI agent observability platform with strong compliance and integrations but limited public API depth and performance metrics.

Verified May 2026 · MCP · REST

Chromatic

NEEDS APPROVAL

Trust score: 66

Visual testing platform built by the Storybook team. The obvious choice if Storybook is your component catalog: it captures snapshots of every story, diffs them against baselines, and surfaces visual changes for review. Design token support enables automatic styling consistency checks. It now also works with Playwright for targeted page snapshots beyond components, and it hosts Storybook MCP servers for team access. It catches UI bugs that unit tests miss by testing the actual rendered output.

Free plan pauses on snapshot exhaustion; monthly billing cycle reset timing

Stale · Mar 2026

Traceloop OpenLLMetry

Trust score: 66

Mature open-source OpenTelemetry extension for LLM observability with strong interoperability, active development under ServiceNow, and excellent docs and community, but it lacks performance benchmarks.

Lunary

FULL AUTO

Trust score: 60

Lunary offers solid LLM observability with strong integrations and security certifications but lacks performance data and has past security/stability concerns.

account limited after exceeding limit for 2 consecutive days

Stale · May 2026 · REST

OpenLIT

FULL AUTO

Trust score: 53

OpenLIT excels as an open-source OpenTelemetry observability platform for LLM apps with strong integrations but lacks agent execution capabilities and dedicated API tooling.

Agenta

Trust score: 51

Robust open-source LLMOps platform with strong docs and API but limited evidence on load performance and granular security controls.
