Bug Fix Library

Known bugs in the agent ecosystem

Real bugs, real root causes, real fixes. Searchable index of production failures discovered across MCP servers, HITL providers, and agent frameworks.

200 bugs documented

high

Feed blocked with error message instructing to add resources (disk/memory). Operations rejected when resource limits reached.

Feed operations are blocked when content nodes approach resource safeguards (e.g., high disk/memory usage). Adding nodes requires time for data migration, during which limits are still hit, potentially causing inconsistent buckets across nodes.

vespa feed client, document API
View fix →
low

Unexpected results in vector range queries (e.g., missing/excess vectors at exact radius boundary).

No root cause identified; likely user confusion over boundary semantics (the standard behavior is <= radius; there is no inclusive/exclusive option as with numeric ranges).

Redis Stack vector search (FT.SEARCH VECTOR_RANGE)
View fix →
medium

Highlight extraction returns empty _highlights: [] or incomplete snippets for long documents (>30-50 pages or exceeding size limits). Indexing may fail with request size errors; search returns documents but no highlights despite matches.

Marqo has request size limits during indexing (configurable via MARQO_MAX_DOC_BYTES); large documents exceed this or cause embedding/highlight generation failures due to token limits in models. Highlights rely on precise matches in searchable fields, which degrade with oversized texts.

Marqo search highlights
View fix →
medium

Vector similarity search returned scores slightly outside the expected [0,1] range (e.g. >1 or <0), leading to unexpected ordering or invalid results with high-dimensional (e.g. 1024+) embeddings used in RAG/AI agents. [Neo4j 5 Changelog](https://github.com/neo4j/neo4j/wiki/Neo4j-5-changelog)

Aggregated floating-point uncertainty in cosine similarity calculations for large-dimensional vectors caused scores to slightly exceed the documented [0,1] range in pre-5.20.0 versions. [Neo4j 5 Changelog](https://github.com/neo4j/neo4j/wiki/Neo4j-5-changelog)

Neo4j vector indexes, db.index.vector.queryNodes
View fix →
high

Console/logs spam "429 hit on route /gateway/bot" errors on startup; shards fail to connect; bot won't go online. May persist across restarts.

Bot makes excessive calls to GET /gateway/bot endpoint (to get WSS URL, shards, session_start_limit) during rapid restarts or improper sharding (spawning too many shards without delays), exceeding ~1000 session starts per day or concurrency limit (max_concurrency=1), triggering 429 rate limit. Free hosts exacerbate via shared IP Cloudflare bans. Separate from per-connection gateway event limit (120/60s).

Discord Gateway (/gateway/bot endpoint), sharding managers
View fix →
high

During long ops (>1hr, e.g. large uploads/syncs): token refresh fails mid-transfer, aborting with 'token expired and refresh token is not set', or reReadToken() loops with 'newToken.Valid() false'. Logs show a 'successful' refresh but the op fails.

Box JWT auth tokens (no refresh token; regenerated via assertion) expire in 1hr. rclone refreshes at expiry, but the oauth2 library marks tokens invalid 10s early (clock-skew allowance). Race: reReadToken() detects the invalid token first and fails before renewOnExpiry() runs. Box may also expire tokens slightly early.

rclone Box backend
View fix →
medium

Indexing fails with mapper_parsing_exception: "The [dense_vector] field [embedding] in doc [id] has more/fewer dimensions than defined in the mapping [expected]". Document rejected, bulk operations partially fail. [LangChain JS GitHub #6041](https://github.com/langchain-ai/langchainjs/issues/6041)

Elasticsearch strictly enforces that all vectors indexed into a dense_vector field have exactly the number of dimensions ('dims') defined in the index mapping (or inferred from the first vector if the optional 'dims' is omitted). Mismatch between embedding model output and mapping causes parse failure during indexing. [Elasticsearch Docs](https://www.elastic.co/guide/en/elasticsearch/reference/current/dense-vector.html)
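A minimal sketch of a pre-flight check before the bulk call, validating every embedding against the mapping's `dims` (the helper name and error format are illustrative, not part of any library API):

```python
def check_dims(vectors, expected_dims):
    """Reject embeddings whose length differs from the mapping's 'dims'."""
    bad = [i for i, v in enumerate(vectors) if len(v) != expected_dims]
    if bad:
        raise ValueError(
            f"{len(bad)} vector(s) do not match dims={expected_dims}: "
            f"first offending index {bad[0]} (len {len(vectors[bad[0]])})"
        )
    return vectors

# Example: a 3-dim mapping accepts a 3-dim embedding.
ok = check_dims([[0.1, 0.2, 0.3]], expected_dims=3)
```

Failing fast client-side surfaces the model/mapping mismatch directly instead of as a partial bulk failure.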

Elasticsearch indexing (dense_vector fields), LangChain ElasticsearchStore integrations
View fix →
high

HTTP 409 error with message: "There is currently another in-progress request using this Idempotent Key (that probably means you submitted twice, and the other request is still going through): [key]. Please try again later." or code `idempotency_key_in_use`. Duplicate charges prevented, but requests fail unexpectedly in concurrent/multi-retry agent scenarios.

Multiple concurrent or parallel requests (common in AI agents with retry loops, multi-threading, or distributed execution) reuse the same idempotency key while a prior request with that key is still in-progress on Stripe's servers. Stripe detects this as a collision and returns `idempotency_key_in_use` to prevent processing overlaps. Root issue in agents: stateless tools regenerating identical keys for retries or parallel invocations without sufficient entropy/sessioning.
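A sketch of the fix pattern: derive one key per *logical* operation (not per attempt), and on a key-in-use collision back off and retry with the same key so Stripe dedupes instead of double-charging. The exception class and function names here are hypothetical stand-ins, not Stripe SDK types:

```python
import time

class IdempotencyKeyInUse(Exception):
    """Stand-in for Stripe's idempotency_key_in_use error."""

def charge_with_stable_key(do_charge, op_id, retries=5, base_delay=0.01):
    # One key per logical operation: retries and parallel re-invocations
    # of the same op reuse the same key.
    key = f"charge:{op_id}"
    for attempt in range(retries):
        try:
            return do_charge(idempotency_key=key)
        except IdempotencyKeyInUse:
            # Prior request with this key still in flight: wait, then
            # retry with the SAME key rather than minting a new one.
            time.sleep(base_delay * 2 ** attempt)
    raise RuntimeError("gave up waiting for in-flight request")
```

Distinct logical operations (e.g., two different bookings) must each get their own `op_id`, which is where agent frameworks typically need per-invocation entropy or session state.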

Stripe API integrations in AI agent tools (LangChain, CrewAI, custom agents)
View fix →
medium

Unexpected ranking order changes in hybrid search results using WeightedRanker despite balanced weights (e.g., 0.5,0.5). Items with high variance scores (e.g., 0.1 and 0.9) rank lower than low-variance items (0.39, 0.6) post-normalization, reversing intuitive weighted sum expectations. Raw scores look correct individually, but fused ranking mismatches manual calculations.

Milvus WeightedRanker/Reranker automatically normalizes raw similarity scores from different search paths (e.g., IP metric) to [0,1] using `0.5 + arctan(score)/π` before applying weights. This compresses score ranges nonlinearly, inverting relative rankings for certain distributions (e.g., low variance vs. high variance scores), as raw weighted sum would preserve original order but normalized does not. No disable option pre-2.6.x.
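The inversion can be reproduced numerically with the arctan formula from the docs; using the example scores above, the raw weighted sum and the normalized weighted sum rank the two items in opposite orders:

```python
import math

def normalize(score):
    # WeightedRanker's squashing of raw scores into [0, 1]
    return 0.5 + math.atan(score) / math.pi

def fused(scores, weights):
    return sum(w * s for w, s in zip(weights, scores))

weights = [0.5, 0.5]
high_var, low_var = [0.1, 0.9], [0.39, 0.6]

raw_a, raw_b = fused(high_var, weights), fused(low_var, weights)    # 0.500 vs 0.495
norm_a = fused([normalize(s) for s in high_var], weights)           # ~0.633
norm_b = fused([normalize(s) for s in low_var], weights)            # ~0.645

# Raw weighted sum ranks the high-variance item first; after arctan
# normalization the low-variance item wins, inverting the order.
```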

Milvus hybrid_search WeightedRanker
View fix →
low

Unexpected API errors (e.g., rate limit, timeout, context exceed) when using large batches, misinterpreted as undocumented batch size limit.

No bug exists; the claim of an undocumented batch size limit is incorrect. Official docs explicitly state "no batch size limit", but limits are enforced by TPM and context length per model (e.g., v3: 131K tokens).

Jina Reranker API
View fix →
high

High RAM usage or OOM errors after creating payload indexes on large collections with sizable payloads/high-cardinality fields; memory spikes during index build. [Qdrant Indexing Docs](https://qdrant.tech/documentation/concepts/indexing/)

Payload indexes are stored fully in RAM by default for fast filter evaluation during vector search, consuming memory proportional to field cardinality and payload size. [Qdrant Indexing Docs](https://qdrant.tech/documentation/concepts/indexing/)

Qdrant vector database
View fix →
low

Batch upsert request partially succeeds (some docs skipped due to conditions/duplicates) or entire request fails (e.g., HTTP 400 duplicate IDs, size limits), appearing as "partial failure".

No documented bug; writes batch into WAL entries (1/sec per namespace), ensuring durability on success. Partial skips occur only in conditional/filter ops (expected behavior), not failures. Misunderstanding of atomicity vs. conditional skipping.

Turbopuffer namespace.write upsert_rows upsert_columns
View fix →
low

Pods pending with "node(s) didn't match pod anti-affinity rules". Pinecone queries with high initial latency (300-400ms) labeled as cold.

No specific root cause found for combined issue. Kubernetes affinity conflicts occur when no nodes satisfy hard podAntiAffinity/nodeAffinity rules during scheduling. Pinecone serverless warm-up latency is due to cold namespace caching (~300ms first queries), unrelated to Kubernetes pods as it's managed service.

Kubernetes scheduler, Pinecone serverless
View fix →
low

Cluster fails to auto-scale (no scale-out during high load or scale-in during low), stuck in 'Modifying' status, write failures despite available reads, or support notifications for limit violations. No error codes specific to 'threshold configuration'.

No documented bug; potential issues from violating scaling prerequisites/limits (e.g., insufficient CUs, data >80% capacity, exceeding CU x replica max) causing resource check failures during auto-scale attempts. Thresholds are fixed/system-managed, not user-configurable, eliminating misconfiguration.

Zilliz Cloud (vector DB service)
View fix →
high

`docker compose up` with `deploy.resources.limits.memory: 10M` results in `docker stats` showing full host memory limit (e.g., 9MiB / 7.774GiB) instead of container limit (e.g., 9MiB / 10MiB). Warning may appear about ignored deploy key.

The new `docker compose` (v2 CLI, Docker 20.10+) did not implement translation of `deploy.resources.limits` (intended for Swarm) to underlying Docker run flags like `--memory` for standalone mode, unlike the legacy `docker-compose` (v1) Python tool. Fixed in docker/compose-cli PR [#1528](https://github.com/docker/compose-cli/pull/1528).

docker compose (v2 CLI plugin)
View fix →
medium

Full-text searches fail to match documents containing long words, URLs, base64 strings, or IDs longer than 40 characters, even when exact text matches exist. No results returned for queries with long tokens; indexed data appears missing those terms.

LanceDB FTS default tokenizer ("simple") filters out and omits any tokens longer than max_token_length=40 characters during indexing. Long tokens like base64 strings, URLs, or technical IDs >40 chars are dropped entirely, making them unsearchable. [LanceDB FTS Docs](https://docs.lancedb.com/indexing/fts-index)

LanceDB create_fts_index (Tantivy & native FTS)
View fix →
medium

Approximate nearest neighbor (ANN) queries using IVFFlat index return lower recall after data inserts/updates/deletes: true top-k neighbors missed, while exact search (high probes or sequential scan) finds them. Degradation worsens over time with more changes.

IVFFlat uses fixed centroids from k-means clustering at index build time. Inserts/updates add vectors to existing clusters but do not update centroids, causing cluster imbalance and recall degradation as data distribution shifts. [TigerData blog](https://www.tigerdata.com/blog/nearest-neighbor-indexes-what-are-ivfflat-indexes-in-pgvector-and-how-do-they-work) (2023).
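Since rebuilding the index re-runs clustering against the current data distribution, a common mitigation is to reindex once the table has grown or churned substantially since build time. A sketch of that heuristic (the index name and the 50% threshold are illustrative, workload-dependent choices):

```python
# Rebuilding re-fits the IVFFlat centroids to the current data.
REBUILD_SQL = "REINDEX INDEX CONCURRENTLY items_embedding_ivfflat_idx;"

def needs_rebuild(rows_at_build, rows_now, growth_threshold=0.5):
    """Heuristic: rebuild once the table has grown ~50% since the
    centroids were fit. Tune the threshold to your recall targets."""
    if rows_at_build == 0:
        return True
    return (rows_now - rows_at_build) / rows_at_build >= growth_threshold
```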

pgvector (PostgreSQL extension)
View fix →
medium

After insert/update/delete, immediate $search queries on the collection miss/reflect old data (e.g., new doc not found, updated fields show prior values). Temporary fix: add 1s sleep. Alerts may fire for 'Index Replication Lag' or 'Mongot stopped replication'.

Atlas Search (`mongot` process) tails MongoDB oplog/change streams asynchronously for eventual consistency, introducing lag (typically seconds). Factors: replication across shards/replicas, disk/CPU contention (e.g., pauses at 90% disk use), index size/complexity, query load. Unlike transactional B-tree indexes, search updates don't block writes.
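Instead of a fixed 1s sleep, a bounded poll handles variable lag. A minimal sketch, where `search_fn` stands in for whatever runs the $search aggregation and returns matching documents:

```python
import time

def wait_for_searchable(search_fn, doc_id, timeout=5.0, interval=0.2):
    """Poll the search query until the write becomes visible, or give up
    after `timeout` seconds. Avoids both blind sleeps and infinite waits."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if any(d.get("_id") == doc_id for d in search_fn()):
            return True
        time.sleep(interval)
    return False
```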

MongoDB Atlas Search
View fix →
medium

AI agent receives GitLab MR webhook with action='update' but empty 'changes': {} payload, breaking logic expecting changes details (e.g., label/assignee/reviewer/state diffs). Triggers unexpectedly without useful delta info, causing skipped automations or false processing.

GitLab intentionally triggers MR 'update' webhooks for certain activities (approvals, rule changes, thread resolutions, re-reviews) where no tracked attributes change, resulting in empty 'changes' object despite documentation warning receivers to always inspect it. Some changes (e.g., certain assignee/reviewer updates in group hooks) inconsistently populate 'changes' due to backend logic or version-specific behavior around Oct 2024 (GitLab ~17.5).

GitLab webhook integration/parsers
View fix →
high

Deploy shows as successful and healthy in Render dashboard (internal health checks pass), but public access to the service (onrender.com URL or custom domain) returns 502 Bad Gateway errors. Happens especially after initial failed deploys or around deployment times; users can shell in and confirm app runs, but external requests fail.

When the first build/deploy of a new service fails (even if subsequent deploys succeed), Render fails to create necessary networking components like the load balancer, leading to misconfigured routing even for healthy instances. Internal health checks pass, but public traffic gets 502 errors.

Render.com deployments
View fix →
low

Error 1102 "Worker exceeded resource limits" (exceededCpu), despite low aggregate dashboard CPU time (e.g., 56ms avg but errors at 3000ms limit) [Forum post](https://www.answeroverflow.com/m/1460135885432688641). Local tests pass, prod fails.

No documented accounting bug; errors from exceeding strict per-invocation CPU limits (10ms free/30s default paid), excluding I/O waits. Apparent discrepancies often aggregate metrics vs per-request; hardware/dev env diffs affect local testing [Miniflare #161](https://github.com/cloudflare/miniflare/issues/161).

Cloudflare Workers, Miniflare (dev tool)
View fix →
medium

Formula and rollup fields visible in Airtable UI and API test responses but missing from third-party tool (e.g., Adalo) schema/collections despite data present.

Airtable API omits empty fields from record responses; third-party tools derive schema from sample records, missing computed formula fields if no populated rows. Undocumented meta API required for full schema.

Airtable API integrations (Adalo, Budibase, etc.)
View fix →
high

collection.count() raises StopIteration or segment-related errors (e.g., missing segments, empty iterator from filter). Seen after adding data in Docker; count fails despite data present.

Missing or incorrect HNSW index segments due to incomplete metadata configuration (e.g., missing 'hnsw:search_ef'), causing count() to fail when iterating over empty/missing segments in Docker/persistent environments. OpenAI embedding or concurrency may exacerbate.

chromadb Collection.count()
View fix →
high

"remaining connection slots are reserved for non-replication superuser connections" error. pg_stat_activity shows accumulating 'idle' connections after Edge Function calls, quickly exhausting DB limits (e.g., 10 idle after few calls on 60-conn limit).

In serverless Edge Functions, each invocation is isolated; creating a new Pool per call keeps idle connections open for reuse within that pool, but since functions scale independently, connections accumulate across invocations until the DB connection limit (e.g., 60 on Micro) is exhausted. pool.release() returns a connection to the pool but does not close it.

Supabase Edge Functions postgres Pool
View fix →
medium

Deployment fails with logs: "Path: /health", "Retry window: 5m0s", repeated "Attempt #X failed with service unavailable" or "status 404. Continuing to retry...", finally "1/1 replicas never became healthy! Healthcheck failed!"

Railway deployment healthcheck polls configured path (e.g. /health) expecting HTTP 200 within timeout (default 300s). Fails if endpoint missing (404), app not ready (service unavailable), wrong port binding, or non-200 response. Expected behavior, not bug—ensures zero-downtime deploys. [Railway Docs](https://docs.railway.com/deployments/healthchecks)

Railway deployment healthchecks
View fix →
high

504 Gateway Timeout error with code INTERNAL_EDGE_FUNCTION_INVOCATION_TIMEOUT or FUNCTION_INVOCATION_TIMEOUT in Vercel logs; function hangs or fails after ~25s without response; works locally but fails in production.

Vercel Edge Runtime enforces a 25-second limit to begin sending responses to enable low-latency edge computing and prevent resource abuse; total execution (streaming + waitUntil) capped at 300s since March 2025 rollout to ensure predictability (previously unbounded).

Vercel Edge Functions
View fix →
high

In paid WCS with multi-tenancy enabled: Object counts reflect only the last uploaded tenant across all tenants (e.g., 905 instead of per-tenant totals summing to 6402); queries return data only from last tenant; dimensions correct but counts dynamic per last upload. Works in sandbox.

Suspected mismatch between v3 client batch operations and paid WCS server handling of tenant contexts, causing shared state overwrite instead of per-tenant sharding. Sandbox server differed in behavior.

Weaviate (multi-tenancy in WCS paid clusters)
View fix →
medium

Excessive webhook retries shown as 'Pending' in PayPal Developer Dashboard; repeated deliveries of same event over hours/days; potential event loss after 3 days if unresolved. [PayPal Docs](https://developer.paypal.com/api/rest/webhooks/)

Webhook listener returns non-2xx HTTP status (4xx/5xx), times out, or is unreachable, triggering PayPal's retry mechanism (up to 25 attempts over 3 days with exponential backoff). Developers often verify the payload via a synchronous API call, delaying/blocking the 2xx response. [Stack Overflow](https://stackoverflow.com/questions/65974110/what-needs-to-happen-for-paypal-webhooks-status-to-change-from-pending-to-succ)
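The fix pattern is ack-fast, process-later: return 2xx immediately and hand verification off to a background worker. A framework-agnostic sketch (the `(status, body)` return shape is a placeholder for whatever your web framework expects):

```python
import queue
import threading

events = queue.Queue()

def handle_webhook(payload):
    """Acknowledge immediately; verify/process off the request thread so
    PayPal sees a fast 2xx and stops retrying."""
    events.put(payload)        # hand off -- do NOT call the verify API here
    return 200, ""

def worker():
    while True:
        payload = events.get()
        # verify signature / call PayPal's verification API, then process
        events.task_done()

threading.Thread(target=worker, daemon=True).start()
```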

PayPal API webhook integration
View fix →
medium

Lambda invocations after idle periods (>5-15 min) or during scale-up take significantly longer (historically 10-20s; now 1-5s+) than warm invocations (<500ms). CloudWatch Logs show high Init Duration in REPORT lines, e.g., "Init Duration: 10000.00 ms". The first invocation after a deployment or config change is especially slow.

When a Lambda function is configured to access VPC resources, AWS must provision and attach an Elastic Network Interface (ENI) to the execution environment during cold starts. Prior to 2019 Hyperplane improvements, this added 10+ seconds; post-improvement, ENIs are pre-provisioned per subnet/security group combo and tunneled, but new concurrent executions beyond warmed capacity still incur ENI attachment/setup overhead (now ~1s vs 10s+), plus standard init (code download/runtime).

AWS Lambda
View fix →
high

API error on POST /webhooks: {"errors":[{"message":"ETIMEOUT: Asana was unable to connect to your webhook within the timeout of 10000 ms.","help":"..."}]}. No handshake request received on server logs, despite local ngrok working.

Asana's webhook creation triggers a synchronous "handshake" POST to the target URL (with X-Hook-Secret header, empty body) that must be acknowledged immediately (<10s timeout) with 200/204 + echoed X-Hook-Secret header while the create API call waits. Server connectivity issues (firewall, TLS), slow responses, or incorrect handshake handling cause connection timeout (ETIMEOUT).
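The handshake logic itself is small; the key is echoing X-Hook-Secret back before doing anything else. A framework-agnostic sketch returning a `(status, headers, body)` tuple (a placeholder shape for your web framework):

```python
def handshake_response(headers):
    """Asana webhook handshake: echo X-Hook-Secret back with a 2xx
    immediately. Store the secret for later HMAC signature checks."""
    secret = headers.get("X-Hook-Secret")
    if secret is not None:
        return 200, {"X-Hook-Secret": secret}, ""
    # Normal event delivery (verify X-Hook-Signature elsewhere).
    return 200, {}, ""
```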

Asana webhook creation API
View fix →
medium

OutOfMemoryException during entity operations; function timeouts; slow performance from large serialization; ReadEntityStateAsync returns default(T) if outdated docs are followed literally. Seen on the Consumption plan with states >50-100MB.

Azure Durable Functions entities store state in Azure Table Storage (max 1MB/entity, 64KB/property). Larger states auto-move to blobs. However, every operation requires full state deserialization/serialization in function memory, causing OOM on Consumption plan (1.5GB limit) for large states (~100MB+). Outdated docs mention 16KB ReadEntityStateAsync limit (no longer enforced, per GitHub).

Azure Functions DurableTask extension (entities)
View fix →
high

Logs show repeated reconnection INFO: "[INFO] socket-mode:SocketModeClient:0 Reconnecting to Slack ..." followed by crash/error: "Unhandled event 'server explicit disconnect' in state 'connecting'." Missed events during disconnects; app unresponsive until restart. Seen in Docker/K8s.

@slack/socket-mode 1.x state machine design flaw causes unhandled 'server explicit disconnect' events during race conditions in reconnects after expected Slack disconnects (refresh every few hours, reasons like 'refresh_requested', 'too_many_websockets'). Hanging connections from unclean shutdowns exceed 10-connection limit, triggering loops.

Slack Bolt JS Socket Mode (@slack/bolt, @slack/socket-mode)
View fix →
medium

Deployment fails with: `ERROR: ... Could not create Cloud Run service. spec.template.spec.containers.resources.limits.memory: Invalid value specified for memory. For the specified value, maxScale may not exceed 83.`

In GCP Cloud Functions Gen2 (built on Cloud Run), instance scaling (maxScale/max-instances) is constrained by allocated memory/CPU per instance. Low/default memory (~256MiB) limits max-instances to ~83; higher max-instances from prior Gen1 config or defaults violates this during deployment/migration, causing validation failure before runtime scaling issues occur.

gcloud functions deploy, Cloud Functions Gen2
View fix →
medium

HTTP 429 "Too Many Requests" or "Rate limit for this resource has been exceeded" when calling Bitbucket API to trigger pipelines (POST /2.0/repositories/{workspace}/{repo_slug}/pipelines/). Requests rejected after ~1000/hour per token.

Bitbucket Cloud enforces API rate limits on a rolling 1-hour window to ensure stability: base 1000 requests/hour per token/user for repository endpoints (including pipelines trigger POST), scaled to 10k max based on workspace size. Exceeding triggers 429 "Too Many Requests" or "Rate limit exceeded". Limits apply per token for access tokens, per user for app passwords/OAuth.

Bitbucket API (pipelines trigger endpoint)
View fix →
low

API call to /institutions/get_by_id with include_status=true returns null status or no status data in Sandbox/low-traffic institutions; unexpected absence of connectivity health info leads to failed assumptions in agent logic.

Institution status is intentionally unavailable in Sandbox environment and returns null for low-traffic institutions to avoid inaccurate data; developers may mistake this expected behavior for a bug when expecting status in non-Production or low-volume scenarios.

Plaid API institutions tool
View fix →
medium

API returns 400 Bad Request with "validation_error": "body failed validation. Fix one: body.filter.and[X].or[Y].[property_type] should be defined, instead was `undefined`." (repeating for multiple types). Happens with complex/nested filters; simple filters work.

Notion API enforces undocumented size/complexity limits on the database query request body, particularly the 'filter' object. Exceeding these (e.g., too many nested 'and'/'or' conditions, large compound filters) triggers JSON schema validation failure. Official docs specify a 2-level nesting max for compounds but no explicit array size; general request limits (500KB payload) may apply. Deep nesting >2 levels fails with validation errors as seen in community reports.
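A small pre-flight check of compound nesting depth can catch over-deep filters before the request is sent. A sketch (the depth definition here counts nested 'and'/'or' levels; the 2-level cap is per the docs):

```python
def compound_depth(f):
    """Depth of nested 'and'/'or' compounds in a Notion filter object."""
    if not isinstance(f, dict):
        return 0
    depths = [
        1 + max((compound_depth(sub) for sub in f[key]), default=0)
        for key in ("and", "or") if key in f
    ]
    return max(depths, default=0)

flat = {"and": [{"property": "Status", "select": {"equals": "Done"}}]}
deep = {"and": [{"or": [{"and": [{"property": "X"}]}]}]}
# compound_depth(flat) == 1 (OK); compound_depth(deep) == 3 (> 2, likely rejected)
```

Filters deeper than 2 levels generally need to be flattened, or split into multiple queries merged client-side.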

Notion API database query endpoint (/v1/databases/{id}/query)
View fix →
medium

Type errors or runtime failures when deserializing webhook payloads using Linear TypeScript SDK or GraphQL schema types, e.g., expecting Team object but receiving teamId string in IssueLabel updates.

Linear webhook payloads use ID strings (teamId) instead of full nested GraphQL objects (team), contradicting documentation that states payloads reflect GraphQL schema exactly.

Linear webhook integration
View fix →
medium

Task fails to retry after many attempts with a scheduling error like "backoff overflow", infinite delay, or NaN timeout in logs; scheduler crashes or skips retries.

Exponential backoff calculation `minTimeout * factor ^ (attempt-1)` can produce extremely large numbers (NaN/infinity) for high `maxAttempts` without proper capping in SDK math, causing retry scheduling failure when delay exceeds JavaScript Number.MAX_SAFE_INTEGER (~9e15 ms).
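The defensive fix is to cap the computed delay before scheduling. A minimal sketch of the formula with an explicit ceiling (parameter names mirror the formula above; the 1-hour cap is an illustrative choice):

```python
def backoff_ms(attempt, min_timeout=1000, factor=2, max_timeout=3_600_000):
    """Exponential backoff with an explicit cap so large attempt counts
    cannot blow up into effectively-infinite delays."""
    raw = min_timeout * factor ** (attempt - 1)
    return min(raw, max_timeout)

# attempt 1 -> 1000 ms; attempt 11 -> 1,024,000 ms; attempt 100 -> capped at 1h
```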

Trigger.dev SDK task retry
View fix →
high

API calls fail with HTTP 429 error: "Resource has been exhausted (e.g. check quota)" or "Quota exceeded for quota metric 'Write requests' and limit 'Write requests per minute per user' of service 'sheets.googleapis.com'". Requests succeed sporadically but fail under load.

Exceeding per-minute rate quotas: e.g., 300 read requests/minute/project, 60 write requests/minute/user. Quotas refill every minute; concurrent or looped requests in AI agents/scripts often hit these without backoff, triggering 429 errors even if daily totals are low. [Google Sheets API Limits](https://developers.google.com/sheets/api/limits)
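Since quotas refill each minute, the standard remedy is exponential backoff with jitter around the API call. A client-agnostic sketch (the exception class is a stand-in for the 429 error your Sheets client raises):

```python
import random
import time

class RateLimited(Exception):
    """Stand-in for the HTTP 429 error raised by the Sheets client."""

def with_backoff(call, retries=5, base=0.01):
    for attempt in range(retries):
        try:
            return call()
        except RateLimited:
            # Sleep exponentially longer each time, plus jitter so
            # concurrent workers don't retry in lockstep.
            time.sleep(base * 2 ** attempt + random.uniform(0, base))
    raise RateLimited("still throttled after retries")
```

In production `base` would be on the order of seconds, not the small value used here for illustration.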

Google Sheets API
View fix →
high

Emails are automatically dropped with 'Invalid Template' reason in activity feed, despite API returning 202 success and recipients not suppressed. No email delivery occurs, leading to silent failures in production.

SendGrid validation rejects requests using non-existent or inactive template versions, marking emails as 'Invalid Template' and dropping them silently (API returns 202 Accepted, but no delivery occurs).

SendGrid Email API (Dynamic Templates)
View fix →
low

API request fails with a Twilio error lacking a documentation page or 'more_info' link. Builder sees generic HTTP 400/5xx or a code without explanation (e.g., "Message body is required" not found in docs at the time). SDK throws RestException with message/code but no resolution path.

Twilio REST API returns structured errors with HTTP status, message, optional code (e.g., 21602), and a more_info URL. While most (~500+) codes have dedicated docs, new features, edge cases, deprecations, or rare carrier errors may lack pages or have incomplete info. The community fills gaps via Stack Overflow/GitHub, but SDKs/responses sometimes surface generic HTTP errors without codes. Docs claim a 'full list', but real-world usage shows occasional gaps.

Twilio Messaging/Voice APIs (any REST endpoint)
View fix →
medium

HTTP 400 Bad Request from GET /rest/api/3/search/jql?nextPageToken=null with error: {"errorMessages": ["The provided next page token is invalid or expired."], "errors": {}}. Builder expects first page of search results. [JRACLOUD-94632](https://jira.atlassian.com/browse/JRACLOUD-94632)

Bug in the GET /rest/api/3/search/jql endpoint where passing nextPageToken=null as a query parameter is not handled correctly, erroneously triggering invalid/expired token validation for initial requests. [JRACLOUD-94632](https://jira.atlassian.com/browse/JRACLOUD-94632)

Jira Cloud REST API /rest/api/3/search/jql (GET)
View fix →
high

Workflow replay fails with `java.lang.IllegalStateException: Version is already set to 1. The most probable cause is retroactive addition of a getVersion call with an existing 'changeId'`. High volume of InternalWorkflowTaskException during replay of EVENT_TYPE_MARKER_RECORDED events. Workflows may self-recover but generate errors.

Retroactively adding `Workflow.getVersion(changeId, ...)` to a code path after existing workflow histories have implicitly treated that changeId as version 1 (default). During replay, the new code conflicts with the history's implicit marker in the VersionStateMachine, causing IllegalStateException.

Temporal Workflow SDK (e.g., Java)
View fix →
medium

Query rejected with schema validation error (missing first/last or >500k nodes), HTTP 200 + GraphQL errors for rate limits/points exceeded (e.g. remaining:0), partial results + resource limit exceeded error for heavy compute, or timeout >10s. [GitHub Docs](https://docs.github.com/en/graphql/overview/rate-limits-and-query-limits-for-the-graphql-api)

GitHub GraphQL API validates queries against a node limit (500k total projected nodes via connection pagination multipliers) and point-based rate limits (calculated from estimated backend requests: nested connections sum divided by 100, min 1 pt) to prevent DoS/abuse; it also terminates excessive-compute queries. [GitHub Docs](https://docs.github.com/en/graphql/overview/rate-limits-and-query-limits-for-the-graphql-api)
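The projected-node arithmetic for a chain of nested connections can be estimated client-side before sending the query. A sketch, assuming a simple linear nesting like `repositories(first: 100) { issues(first: 50) { ... } }`:

```python
def projected_nodes(page_sizes):
    """Projected node count for a chain of nested connections, e.g.
    repositories(first: 100) { issues(first: 50) } -> [100, 50]."""
    total, multiplier = 0, 1
    for first in page_sizes:
        total += multiplier * first   # nodes at this level
        multiplier *= first           # fan-out carried into the next level
    return total

# 100 repos + 100*50 issues = 5,100 nodes: well under the 500k ceiling.
# Adding a third level of first:100 pushes it past the limit.
```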

GitHub GraphQL API
View fix →
high

Throttled functions (concurrency=1 per tenant/environment) execute later events (e.g., booking) before earlier ones (e.g., customer upsert) complete, even while the earlier is retrying on failure, leading to out-of-order processing.

Independent functions triggered by separate events process concurrently despite per-function concurrency limits; retries do not block subsequent functions from dequeuing and executing out of order.

Inngest function execution/replay
View fix →
high

Large file upload via upload session fails mid-process with errors like: POST https://content.dropboxapi.com/2/files/upload_session/append_v2 failed with 502 {"error": "Internal Server Error"} or 503 "upstream connect error or disconnect/reset before headers" on specific chunks, potentially aborting the entire upload if not handled.

Transient server-side failures (internal errors, upstream connectivity issues) during individual chunk append operations to upload sessions, returning HTTP 502/503 on /2/files/upload_session/append_v2.

Dropbox API (files/upload_session/append_v2 endpoint)
View fix →
medium

ERROR asyncio:base_events.py:1879 Unclosed client session and Unclosed connector with ResponseHandler objects lingering in connections deque during ADK streaming (/run_sse) or eval runs, breaking evaluations.

Dependency version mismatch after bumping google-genai to 1.37.0+ caused aiohttp ClientSession and TCPConnector (with ResponseHandler) not to close properly in concurrent eval/streaming scenarios, due to incomplete resource cleanup in google-genai before fix in ADK.

Google ADK (adk-python), aiohttp via google-genai
View fix →
low

Digest notifications sent too early/late or not batching events within expected window; studio shows errors during digest testing; self-hosted docker fails to merge events timely.

Digest windows use discrete time units (sec/min/hr/day) without sub-second precision; potential job scheduling drift in self-hosted or studio environments, as seen in studio errors and docker digest issues.

Novu digest step
View fix →
low

Dynamic variables containing HTML tags (e.g., {{body}} where body = "Hi...<br><br>Welcome") display raw tags like "<br><br>" as plain text in the email instead of rendering line breaks or HTML formatting.

Postmark's Mustachio templating engine automatically HTML-escapes variables using {{variable}} syntax by default to prevent XSS attacks from untrusted input containing HTML/JS. This escapes < to &lt;, causing HTML tags in dynamic variables to display as literal text instead of rendering.
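Python's stdlib `html.escape` illustrates the same transformation that double-mustache interpolation applies; a triple-mustache ({{{body}}}) inserts the value unescaped, which is only safe for trusted input:

```python
import html

body = "Hi...<br><br>Welcome"

# What {{body}} effectively does: HTML-escape, so tags render as text.
escaped = html.escape(body)   # 'Hi...&lt;br&gt;&lt;br&gt;Welcome'

# What {{{body}}} effectively does: insert the raw value unchanged.
raw = body
```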

Postmark Templates API (sendEmailWithTemplate)
View fix →
medium

After updating app config via API, SDK still uses old values (e.g. safariWebId undefined); segments/targeting may not reflect changes immediately in dashboard or sends.

OneSignal app config fetched by SDK cached at CDN/server layer; API updates (Update App) do not immediately invalidate cache. Segments may have similar client-side or dashboard caching delays.

OneSignal Website SDK, Segments
View fix →
medium

Push notifications fail to deliver with error like "Notification payload is too large. Actual length XXXX and MAX ALLOWED 4096" in Courier logs or provider feedback. No notification received by device.

Push providers integrated with Courier (APNs, FCM) enforce strict payload limits (typically 4KB) to ensure efficient delivery. Courier passes through these limits; exceeding causes rejection by the provider.

Courier push notifications
View fix →
medium

AI Action blocks fail with HTTP 429 error and rate limit message. In apps: triggers error handler. In workflows (pre-fix): triggers success handler with populated `data.error`. Limits hit quickly (cumulative across Vectors, queries, etc.).

Retool imposes strict hourly token limits on organization-wide usage of Retool-managed AI keys (e.g., 250k tokens/hour for most AI actions) to prevent abuse and restrict to non-production. Additional bug (pre-3.183.0): workflows treated 429 as success (populating data.error but not triggering error handlers).

Retool AI Action blocks, Retool Agents, Vectors, LLM Chat (using Retool-managed keys)
View fix →
high

Bot sends Adaptive Card but renders blank, partial, shows "Error encountered while rendering this message", or actions fail in Teams clients (desktop/web/mobile).

Teams clients have incomplete support for newer Adaptive Cards schema versions (>1.4 general, >1.2 mobile), causing parsing/rendering failures. v1.5+ features are unsupported post-client updates ([Teams Q&A 2025](https://learn.microsoft.com/en-us/answers/questions/5570683)).

Microsoft Bot Framework SDKs (C#, JS, Python), Teams channel
View fix →
medium

Generic user message: "There is an error with the message you just sent, but feel free to ask me something else". API returns errors such as 422 (e.g., when the webhook responds with invalid JSON or a 500 status) or timeout-related failures; the conversation halts or skips the webhook result.

The assistant enforces a strict, configurable 1-30 second timeout on webhook responses. If the external service does not return a valid JSON response within that window, the call times out, because webhooks block the conversation flow synchronously. Possible triggers: slow backend processing, network delays, invalid responses, or server errors (e.g., 500).

IBM watsonx.ai Assistant webhook (pre/post-message)
View fix →
low

Agents may ignore role instructions and follow malicious injected prompts in untrusted inputs during role-playing sessions, potentially leaking data or executing unintended actions.

No specific root cause identified. CAMEL-AI role-playing relies on LLM prompt engineering which is inherently susceptible to prompt injection like all LLMs, where untrusted inputs can override role instructions.

CAMEL-AI role-playing agents
View fix →
high

Bot status stuck at "NOT Ready - This bot has not completed the knowledge synchronization" or "Waiting Sync" / indexing hangs indefinitely after adding PDF/knowledge files.

EventBridge Pipe loads the entire DynamoDB record (prompt + file metadata) into the ECS task's Container Overrides parameter, exceeding the 8192-character limit when prompts are >8000 chars or many docs are attached (50-300 files). Causes a 400 error: "Container Overrides length must be at most 8192".

AWS Bedrock Agents (with Knowledge Base), aws-samples/bedrock-chat sample app
View fix →
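A cheap guard is to measure the serialized record before handing it to the pipe and pass oversized payloads by reference instead of inline. A Python sketch (the payload shape and `s3_key` field are illustrative; the 8192-character ECS Container Overrides limit is from the error above):

```python
import json

MAX_OVERRIDES_CHARS = 8192  # ECS Container Overrides hard limit

def fits_in_overrides(payload: dict) -> bool:
    """Check whether a record fits in the Container Overrides parameter;
    oversized payloads should be passed by reference instead."""
    return len(json.dumps(payload)) <= MAX_OVERRIDES_CHARS

big = {"prompt": "x" * 9000, "files": ["doc.pdf"]}    # inlined data: too large
small = {"s3_key": "staging/run-123/payload.json"}    # pointer, not the data
```

Passing a pointer (e.g., an S3 key the task fetches at startup) keeps the override size constant regardless of prompt length or file count.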
medium

AI generates SQL/API queries that may not match database schema perfectly or require iteration for complex prompts, as noted in user feedback questioning schema awareness.

Inherent LLM limitations in zero-shot query generation without full schema awareness, leading to potential inaccuracies in complex or undocumented databases.

Superblocks AI (Clark)
View fix →
high

`InvalidUpdateError` with `INVALID_CONCURRENT_GRAPH_UPDATE` when parallel branches/nodes write to shared state key (e.g., `messages`) without reducer. Message history duplicates or corrupts; tool calls mismatch in parallel tool execution.

Parallel nodes or branches (including in subgraphs invoked concurrently) attempt concurrent updates to the same non-reducer-annotated state key (default 'last write wins'), triggering `InvalidUpdateError: INVALID_CONCURRENT_GRAPH_UPDATE`. Exacerbated by `operator.add` on messages (corrupts history) or nodes returning full state (duplicates).

LangGraph StateGraph
View fix →
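LangGraph's documented fix is to annotate shared keys with a reducer (e.g., `Annotated[list, operator.add]` or `add_messages`) in the state schema. The merge semantics can be sketched without the library (`apply_updates` and its error message are illustrative stand-ins, not LangGraph internals):

```python
import operator

def apply_updates(state, updates, reducers):
    """Merge parallel node updates; a key without a reducer cannot accept
    more than one concurrent write (mirrors INVALID_CONCURRENT_GRAPH_UPDATE)."""
    new_state = dict(state)
    keys = {k for u in updates for k in u}
    for key in keys:
        writes = [u[key] for u in updates if key in u]
        reducer = reducers.get(key)
        if reducer is None:
            if len(writes) > 1:
                raise ValueError(f"InvalidUpdateError: concurrent writes to '{key}'")
            new_state[key] = writes[0]
        else:
            for write in writes:
                new_state[key] = reducer(new_state[key], write)
    return new_state

updates = [{"messages": ["from node A"]}, {"messages": ["from node B"]}]
# With a reducer, parallel branches append instead of clashing:
merged = apply_updates({"messages": []}, updates, {"messages": operator.add})
```

The same dict with an empty `reducers` mapping raises, which is the failure mode parallel branches hit when the state key lacks an annotation.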
medium

Reports contain fabricated web sources and content that appear real when no relevant context was actually found, misleading users (especially with custom retrievers returning empty results).

The write_report method passes empty context to the LLM when no relevant sources are retrieved (e.g., custom RAG returns nothing), causing the LLM to hallucinate realistic-looking sources and content instead of handling the no-data case explicitly.

GPT Researcher (write_report method)
View fix →
high

Agent executes malicious LLM-generated code locally (e.g., rm files, network calls, RCE) when `use_docker=False` or pre-0.2.8. Errors if Docker unavailable but falls back insecurely. Output shows local file changes/host access unintended by user.

LocalCommandLineCodeExecutor runs LLM-generated code via subprocess on host (no containerization). Relies on basic command sanitization (prevents some destructive cmds) but vulnerable to arbitrary host access/RCE via malicious code (e.g., os.system). Docker executor isolates via containers [AutoGen Blog](https://microsoft.github.io/autogen/0.2/blog/2024/01/23/Code-execution-in-docker/) [AG2 Docs](https://docs.ag2.ai/latest/docs/api-reference/autogen/coding/LocalCommandLineCodeExecutor/).

LocalCommandLineCodeExecutor
View fix →
low

Users potentially receive duplicate Knock notifications due to deduplication failure during delivery.

No specific root cause documented; extensive searches found no matching bug reports for this exact issue in Knock tools.

Knock notification system
View fix →
low

No documented symptom matching query.

No matching bug documented across GitHub, Stack Overflow, blogs, or docs for Airplane.dev or similar tools.

Unknown / Airplane task executor
View fix →
medium

letta-evals commands (e.g., letta-evals run) fail with: cannot import name 'AgentState' from 'letta_client' (reproduced on Python 3.11, 3.12, and 3.13 on macOS).

Breaking change in letta-client SDK where AgentState class was renamed, moved, or removed between versions, causing import failure in letta-evals after pip install without matching versions.

letta-evals CLI, letta-client Python SDK
View fix →
high

Tool usage fails during hierarchical delegation: "Arguments validation failed: 2 validation errors for DelegateWorkToolSchema\ntask\n Input should be a valid string [type=string_type, input_value={'description': '...', 'type': 'str'}, input_type=dict]\ncontext\n Input should be a valid string [type=string_type, input_value={'description': '...', 'type': 'str'}, input_type=dict]"

CrewAI's DelegateWorkToolSchema expects string inputs for 'task' and 'context', but LLMs generate dicts like {'description': '...', 'type': 'str'}, influenced by tool schema descriptions in prompts. Pydantic validation then rejects the dict values.

DelegateWorkTool (crewAI)
View fix →
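Until the schema tolerates it, one workaround is flattening dict-wrapped arguments before validation. A sketch (the `coerce_str`/`normalize_delegation_args` helpers are hypothetical, not CrewAI API):

```python
def coerce_str(value):
    """Flatten LLM-emitted {'description': ..., 'type': 'str'} wrappers to strings."""
    if isinstance(value, dict) and "description" in value:
        return str(value["description"])
    return value

def normalize_delegation_args(args: dict) -> dict:
    return {k: coerce_str(v) for k, v in args.items()}

raw = {
    "task": {"description": "Summarize Q3 report", "type": "str"},
    "context": "Use the finance data",
    "coworker": "Analyst",
}
clean = normalize_delegation_args(raw)
```

Running the normalizer on tool arguments before they reach the Pydantic schema keeps well-formed string inputs untouched while rescuing the wrapped variant.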
medium

Error toast when clicking Preview/Integration tabs: "There was an error while loading /gen-app-builder/locations/global/engines/[ID]/configurations?project=[ID]. Please try again. It may be a browser or network issue." Configurations fail to load/persist visibility in UI.

Likely backend API failure loading configurations (/gen-app-builder/.../configurations endpoint), possibly transient service issue, permissions, or regional rollout problem. No confirmed technical root cause documented.

Vertex AI Agent Builder (GCP Console UI)
View fix →
low

Intermittent failures or inconsistent plugin availability during concurrent application startup or dynamic plugin loading.

No documented root cause; potentially a custom implementation issue or unreported bug not present in public records.

Semantic Kernel plugin system
View fix →
low

Agents fail to schedule or run due to resource conflicts; Docker containers crash or loop (e.g., Redis warnings, GUI stuck initializing); high memory usage observed.

No specific root cause documented. Possible misconfiguration of Docker resources (memory/CPU limits) or high load from concurrent agents exhausting system resources, as noted in setup docs and general GitHub issues.

SuperAGI agent scheduling, Docker compose
View fix →
low

App fails to load or login redirects to localhost:3000 causing 404 errors (primarily self-hosted Docker deployments, not browser-specific).

No specific root cause identified; AgentGPT (Next.js + Tailwind) designed for modern browsers only. Potential deployment misconfigs (localhost redirects) mistaken for browser issues.

AgentGPT web UI
View fix →
medium

Flood of excessive, duplicate, or low-priority alerts from AI model monitoring system (e.g., Arthur Engine), overwhelming teams and causing alert fatigue where critical issues are missed.

Misconfigured alert thresholds/conditions leading to false positives, duplicates, and cascading alerts from dependent services (common in microservices/ML monitoring setups like Arthur Engine).

Arthur AI model monitoring (arthur-engine)
View fix →
high

Multiple /stream calls for same workflow runId create duplicate Inngest events, triggering multiple full workflow executions (e.g., parallel steps create 3x Zendesk tickets instead of 1). Duplicate side-effects despite same input/runId.

InngestRun._start() calls this.inngest.send() without event-level `id` (idempotency key), sending new Inngest events even for existing runId. Stream() guards fail across multiple /stream requests (new InngestRun instances). Idempotency option not forwarded to inngest.createFunction().

Mastra InngestWorkflow, InngestRun._start(), workflow triggers
View fix →
medium

The agent runs continuously without stopping: it generates and reprioritizes tasks endlessly, consumes API tokens/costs rapidly, often repeats similar tasks or gets stuck in repetitive cycles (e.g., endless research/testing loops), and never terminates naturally even after achieving the objective.

BabyAGI's core design uses an infinite `while True` loop that pulls a task, executes it, creates new tasks via LLM, reprioritizes the list via LLM, and repeats without an exit condition other than an empty task list—which rarely occurs due to LLM's tendency to generate similar or overlapping new tasks indefinitely.

BabyAGI (original and LangChain implementations)
View fix →
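The common mitigation is an explicit iteration budget around the task loop. A toy sketch of the pattern (the `execute`/`create_tasks` callables stand in for the LLM calls in the real loop):

```python
from collections import deque

def run_agent(initial_task, execute, create_tasks, max_iterations=10):
    """Task loop with an explicit iteration budget so it cannot run forever."""
    tasks = deque([initial_task])
    results = []
    for _ in range(max_iterations):
        if not tasks:
            break  # natural exit: task list drained
        task = tasks.popleft()
        result = execute(task)
        results.append(result)
        tasks.extend(create_tasks(task, result))
    return results

# A toy executor that always spawns a follow-up task; the budget stops it.
results = run_agent(
    "research topic",
    execute=lambda t: f"done: {t}",
    create_tasks=lambda t, r: [f"follow-up of {t}"],
    max_iterations=5,
)
```

A cost ceiling (total tokens or dollars spent) makes an equally good budget; the point is that the loop needs some exit condition other than an empty task list.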
medium

Rebuff fails to detect certain sophisticated prompt injection attacks, allowing malicious inputs to reach and manipulate the LLM, potentially leading to data exfiltration or unauthorized actions.

Rebuff is a prototype/alpha tool using heuristics, LLM detection, vector DB similarity, and canary tokens, which are probabilistic and can produce false negatives. No complete solution to prompt injection exists; skilled attackers can craft novel payloads to evade layers.

rebuff
View fix →
high

Sudden spike in LLM token consumption/costs during agent runs using Tavily search; context window overflows; agents fail with token limit errors or degrade in quality due to excessive tool output.

LangChain's TavilySearchResults tool defaults to search_depth='advanced' (deeper results, more content per result) and max_results=5, producing verbose JSON responses (~1000s tokens per call) that overwhelm LLM context windows in agent loops, especially with multiple tool calls.

TavilySearchResults (langchain_community.tools.tavily_search)
View fix →
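Mitigations include constructing the tool with smaller `max_results` and `search_depth='basic'`, and capping tool output before it reaches the model. A sketch of the latter (the 4-characters-per-token heuristic is a rough approximation, not an exact tokenizer):

```python
def truncate_tool_output(text: str, max_chars: int = 2000) -> str:
    """Cap tool output before it enters the agent's context window.
    max_chars ~= token_budget * 4 is a rough heuristic for English text."""
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + "\n...[truncated]"
```

Wrapping each tool call's result this way bounds the per-call context cost, which matters most in agent loops that make several searches per turn.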
medium

Rails.generate() or server returns "I'm sorry, an internal error has occurred." with logs showing embedding model download failures (MaxRetryError, SSLError, Permission denied: '.cache'), no LLM calls, and failure in generate_user_intent during dialog flow processing.

When using dialog rails with embeddings_only: True, the system fails to download or access the embedding model (e.g., all-MiniLM-L6-v2 from HuggingFace via fastembed) due to network issues, SSL errors, or permission denied on .cache directory in containerized/deployed environments, causing generate_user_intent action to fail and return generic internal error.

NeMo Guardrails dialog rails (embeddings_only=True)
View fix →
medium

refresh_ref_docs always inserts docs as new (duplicates on repeat calls). Docstore empty (`_kvstore.get_all()` {}), get_document_hash() null → no duplicate check. Affects PGVectorStore/Chroma/etc.

Docstore disabled by default with 3rd-party vector stores (e.g., PGVectorStore) to simplify storage (everything in vector DB). refresh_ref_docs relies on docstore to track inserted docs, detect changes/duplicates via get_document_hash(). Without it, treats all docs as new → inserts duplicates. Vector DBs lack APIs for node-parent mapping/diffing.

LlamaIndex VectorStoreIndex
View fix →
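The underlying fix is enabling a docstore so document hashes are tracked across calls; the dedup logic it provides can be sketched in plain Python (`HashTracker` is illustrative, not LlamaIndex API):

```python
import hashlib

def doc_hash(text: str, metadata: dict) -> str:
    payload = text + "".join(f"{k}={metadata[k]}" for k in sorted(metadata))
    return hashlib.sha256(payload.encode()).hexdigest()

class HashTracker:
    """Stands in for the docstore: remembers hashes of inserted docs so
    unchanged docs can be skipped on refresh instead of re-inserted."""
    def __init__(self):
        self.hashes = {}

    def needs_insert(self, doc_id: str, text: str, metadata: dict) -> bool:
        h = doc_hash(text, metadata)
        if self.hashes.get(doc_id) == h:
            return False  # unchanged -> skip, avoiding the duplicate insert
        self.hashes[doc_id] = h
        return True
```

Without this persisted hash map, `refresh_ref_docs` has no way to tell an unchanged document from a new one, which is exactly why every refresh inserts duplicates.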
high

Agent crashes with "Exception: Request exceeds maximum context length (e.g., 8465 > 8192 tokens)" during summarization after conversation grows (10-15+ mins). Retries fail in loop, agent becomes unusable. Seen especially with local LLMs like KoboldCPP.

During automatic message summarization triggered by context overflow (e.g., at 70-75% of LLM context limit), the summarization prompt exceeded the LLM's context window because: 1) context_window parameter was not passed to summarization completion calls; 2) function_call=None raised ValueError in proxy; 3) persistence_manager had empty messages list; 4) summarization evicted too few messages, causing loops.

MemGPT (now letta-ai/letta)
View fix →
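A defensive pattern is sizing the summarization input against an explicit token budget before building the prompt, evicting oldest messages until the remainder fits. A sketch (the chars/4 token estimate and `target_ratio` are illustrative choices):

```python
def evict_for_summary(messages, context_window, estimate_tokens, target_ratio=0.5):
    """Evict oldest messages until the remaining history fits an explicit
    token budget, leaving headroom for the summarization prompt itself."""
    budget = int(context_window * target_ratio)
    kept = list(messages)
    evicted = []
    while kept and sum(estimate_tokens(m) for m in kept) > budget:
        evicted.append(kept.pop(0))  # oldest first
    return evicted, kept

est = lambda m: len(m) // 4          # crude chars -> tokens heuristic
msgs = ["x" * 4000] * 10             # ~1000 estimated tokens each
evicted, kept = evict_for_summary(msgs, context_window=8192, estimate_tokens=est)
```

Evicting against a fraction of the window (rather than the full window) is what prevents the summarization call itself from overflowing and looping.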
medium

AttributeError: 'str' object has no attribute '__name__' when calling pipeline.dumps() or similar serialization on pipeline with certain components (e.g., FaithfulnessEvaluator). YAML dump fails deep in representer on improper object types.

Haystack pipeline serialization (via YAML marshal of to_dict()) fails when components don't meet default serialization requirements: no 1:1 init-param-to-self.attribute mapping, or they contain non-YAML/JSON-serializable objects (sets, callables, unhandled types) without custom to_dict()/from_dict(). The YAML representer chokes on improper objects (e.g., treating a str as a function, hence AttributeError '__name__'). Specific to components like LLMEvaluator where input types aren't serialized properly.

Haystack Pipeline serialization (dumps(), dump())
View fix →
high

When using `poetry add metagpt` or `pip install metagpt` (sometimes): "Resolving dependencies failed" or "Because qdrant-client (1.7.0) depends on numpy (>=1.26) and metagpt (0.8.0) depends on numpy (1.24.3), qdrant-client (1.7.0) is incompatible with metagpt (0.8.0). Version solving failed." Pip may show "ResolutionTooDeep" after backtracking.

MetaGPT pins exact versions in requirements.txt (e.g., numpy==1.24.3, qdrant-client==1.7.0), but qdrant-client 1.7.0 requires numpy>=1.26.0, creating an unsolvable conflict. Poetry's strict resolver fails immediately; pip may backtrack deeply or succeed by choosing compatible transitive deps. Qdrant noted as unused and planned for removal (from examples/tools).

MetaGPT (pip/poetry install)
View fix →
low

Data quality/label quality scores do not behave as expected (e.g. not summing to 1, not probability-like, absolute values off); potentially flagging wrong % of data as issues.

No root cause identified; likely a user expectation mismatch. Scores estimate relative issue likelihood (0 = likely bad, 1 = likely good) and are theoretically sound for ranking, but they are not probability-calibrated like ML predictions. Rescalings in releases improve readability; they do not fix calibration bugs.

cleanlab.Datalab, cleanlab.classification.CleanLearning
View fix →
medium

When using Guidance templates with Phi-3 models via llama.cpp backend, repeated text insertion (e.g., deterministic template parts) causes growing leading whitespaces in text/tokens, leading to performance regression from KV-cache invalidation and incorrect rendering.

Bug in llama.cpp implementation of Phi-3 tokenizer (around commit fd5ea0f) causes accumulating whitespace tokens during repeated tokenize/detokenize cycles, mismatching Hugging Face Transformers behavior. Affects Guidance library's template insertion workflow, invalidating KV-cache and degrading performance.

Guidance templates with llama.cpp Phi-3 backend
View fix →
high

Vanna.ai generates SQL with hallucinated (non-existent) tables and columns not present in the database schema, causing query execution failures. Especially occurs in complex queries involving joins. Seen with models like GPT-3.5-turbo-16k and local LLMs like gemma2:27b.

LLM hallucinates table/column names in generated SQL because prompts lack strict constraints on using only provided schema, insufficient RAG retrieval of relevant examples, or weak model (e.g., GPT-3.5-turbo) that modifies names despite context. Common in complex queries requiring joins.

Vanna.ai (vn.generate_sql, vn.ask)
View fix →
medium

LlamaGuard incorrectly classifies benign prompts as unsafe (e.g., flags adult roleplay with age mentions like 20/27 as 'S4: Child Exploitation'), leading to unnecessary blocking/over-moderation. Default setup accepts low-confidence unsafe predictions without threshold control.[PurpleLlama Issue #74](https://github.com/meta-llama/PurpleLlama/issues/74)

LlamaGuard generates text starting with 'safe' or 'unsafe' based on first-token logits. Default implicit threshold (~0.5 probability) is tuned for high precision/low false negatives but leads to over-sensitivity in certain patterns (e.g., age mentions triggering S4 falsely due to training data bias). Low-confidence predictions (e.g., 0.679 unsafe prob) are not filtered, causing overblocking without tunable sensitivity in standard generation mode.[PurpleLlama Issue #74](https://github.com/meta-llama/PurpleLlama/issues/74)[Krnel Blog](https://krnel.ai/blog/2025-10-29-kg-guardrail-example/)

LlamaGuard (all versions/sizes)
View fix →
high

After adding an updated memory (consolidation expected), MCP server queries show new content, but Web UI and SQLite show original stale content.

Mem0's memory consolidation updates Qdrant vector DB successfully, but OpenMemory's dual-storage architecture fails to propagate changes to SQLite DB used by Web UI, causing stale data display (apparent loss).

Mem0 OpenMemory MCP/Web UI
View fix →
medium

Validation chains take seconds instead of expected sub-10ms (guards) + ~100ms (validators); overall LLM app response time degrades significantly when Guardrails enabled.

Misconfiguration causing excessive local CPU computation for ML-based validators (tens of seconds vs ms on GPU/remote), suboptimal/slow LLM choices dominating latency, synchronous execution blocking concurrency, and using large LLMs for re-validation on failures.

Guardrails AI library validators and guards
View fix →
medium

Tests pass with `pytest` but fail (IOErrors, state mismatches, timeouts) with `pytest -n auto` / xdist; intermittent/random failures across workers.

Tests share mutable state (files/DB/ports/globals) not isolated for concurrent worker processes in pytest-xdist (`-n auto`). Race conditions/overwrites occur in parallel but not sequential runs. Giskard uses pytest; no specific bug found, common ML testing issue with temp models/datasets.

Giskard test suite (pytest-based)
View fix →
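pytest-xdist exposes the worker id via the `PYTEST_XDIST_WORKER` environment variable ('gw0', 'gw1', ... under `-n`; unset in a plain pytest run), which can be used to give each worker its own files, databases, or ports. A sketch:

```python
import os

def worker_scoped_path(base: str) -> str:
    """Return a path unique to the current pytest-xdist worker.
    PYTEST_XDIST_WORKER is 'gw0', 'gw1', ... under -n, and unset otherwise."""
    worker = os.environ.get("PYTEST_XDIST_WORKER", "master")
    return f"{base}.{worker}"
```

Using this inside a fixture (or preferring pytest's built-in `tmp_path`, which is already worker-isolated) removes the shared mutable state that makes parallel runs flaky.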
medium

During tester phase or manual execution of WareHouse code: Compilation/runtime errors like ModuleNotFoundError, NameError, ImportError. Code compiles but fails functional tests or crashes (e.g., missing pygame). Tester reports bugs but iterations may not fully resolve.[ChatDev Paper](https://arxiv.org/html/2307.07924v5)

LLMs overlook basic code elements like import statements during generation (ModuleNotFound 45.76%, NameError/ImportError 15.25% each). Unclear requirements lead to placeholders/unimplemented methods. Limited context causes hallucinations/incomplete code.[ChatDev Paper](https://arxiv.org/html/2307.07924v5)

ChatDev tester agent, code execution
View fix →
low

Searches return irrelevant or degraded relevance results over time in Zep memory (search_memory or graph.search).

No specific root cause identified for named bug. General RAG relevance issues from embedding quality, graph divergence over time without refresh, or suboptimal reranker/params. Scale load caused latency spikes fixed by infra changes (specialized search infra).

Zep memory search (search_memory, graph.search)
View fix →
low

Repeated identical prediction queries return stale/old results instead of recomputing after model retrain or data update, due to cached response in local/Redis cache.

No specific root cause documented. Potential general issue where local prediction cache (for identical queries) is not automatically invalidated after model retraining or data updates, serving stale results from file-based cache.

MindsDB
View fix →
medium

Optimization fails silently or errors (e.g., AttributeError on LM.copy); metric doesn't improve after trials; stuck on bootstrap phase; no demos generated despite trainset.

Optimizers fail to generate sufficient high-quality demos if metric_threshold too high or trainset lacks diversity/volume, preventing iterative improvement. Env-specific hangs (threads/permission). Bugs in older versions (LM.copy missing).

DSPy optimizers (BootstrapFewShot, MIPROv2)
View fix →
low

Unexpected or excessive "alert threshold drift" notifications in Galileo model monitor, possibly false positives during normal traffic cycles or minor distribution shifts.

No documented technical root cause; likely user misconfiguration where static alert thresholds trigger falsely on natural data variations or gradual drift, as noted in Galileo's monitoring guide.

Galileo model monitor alerts
View fix →
low

Frequent data drift alerts/false positives even for minor variations, especially on large datasets; statistical tests flag insignificant changes.

Not a bug: default statistical tests (e.g., KS p-value < 0.05) are highly sensitive on large datasets (>1000 samples), detecting minor variations as drift. Docs and blogs explicitly note the tests become "overly sensitive" vs. distance metrics.[Evidently Docs](https://docs-old.evidentlyai.com/user-guide/customization/options-for-statistical-tests)

Evidently DataDriftPreset, ColumnDriftMetric, DatasetDriftMetric
View fix →
medium

Error like "Vector dimension X does not match the dimension of the index Y" or "expected Z dimensions, not W" when upserting embeddings to vector store in AI agent workflows. [LightRAG GitHub Issue](https://github.com/HKUDS/LightRAG/issues/2119)

Voyage AI embedding models output vectors of specific dimensions (default 1024), but agent tools/vector stores are often pre-configured for different sizes (e.g. 1536 for OpenAI, 768 for some local models), causing insert/query failures when dimensions don't align. [DEV Community Post](https://dev.to/hijazi313/resolving-vector-dimension-mismatches-in-ai-workflows-47m)

embedding tools, vector stores (Pinecone, PGVector, Weaviate, Qdrant)
View fix →
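Validating dimensions before upsert turns the store's opaque error into an actionable one. A sketch (the error wording is illustrative):

```python
def check_dims(vectors, index_dim: int) -> None:
    """Fail fast before upsert instead of surfacing the store's dimension error."""
    for i, vec in enumerate(vectors):
        if len(vec) != index_dim:
            raise ValueError(
                f"vector {i} has {len(vec)} dims, index expects {index_dim}; "
                "recreate the index to match the embedding model's output size"
            )

# Voyage's default output is 1024-d; an index created for 1536-d vectors rejects it.
check_dims([[0.0] * 1024], index_dim=1024)  # passes
```

The durable fix is creating the index with the embedding model's actual output size (or configuring a model output dimension that matches the existing index), since most stores cannot change dimensionality after creation.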
high

kNN search queries with filters time out (>60s) or degrade severely on large indices, while unfiltered queries work fine. E.g., ReadTimeoutError or response timeouts in the client.

Filtered approximate kNN (HNSW) must explore more graph nodes/candidates to find enough matches that satisfy the filter, unlike regular queries, where filters speed up execution. On large indices (200M+ docs), this leads to high compute/latency exceeding timeouts, especially with multiple segments requiring sequential graph searches.

Elasticsearch kNN search API
View fix →
low

API requests to https://api.lakera.ai/v2/guard experience sudden increases in response time (e.g. >200ms P99) when sending multiple concurrent requests or long prompts, causing agent timeouts/delays.

No specific root cause documented for named issue. Latency scales with content length (#tokens) and #detectors in policy due to processing time. Potential quota/concurrency limits in Community plan (10k req/mo). Network latency from global but not local deployment ([Lakera API docs](https://docs.lakera.ai/docs/api/guard), [pricing](https://www.eesel.ai/blog/lakera-pricing)).

Lakera Guard API
View fix →
medium

Eval scores for the same inputs and model vary across runs (e.g., score 0.85 in one run, 0.78 in the next), making it hard to detect true regressions/improvements or trust aggregate metrics.

LLM-as-a-judge scorers used in Braintrust evals are probabilistic models whose outputs vary across runs even for identical inputs due to inherent non-determinism (controlled by temperature parameter). Small differences in scores often reflect noise rather than true quality changes.

Braintrust eval scorers (LLM judges)
View fix →
medium

RuntimeError: Failed to parse docstring of query function as LMQL code, or compiler assert in compiler.py on invalid type expressions when using nested queries incorrectly (e.g., nested @lmql.query defs, [VAR: func] in where clause, class methods).

LMQL compiler parsing fails on incorrect nesting syntax (inner function definitions, wrong placement of nested calls in 'where'), unsupported attribute accesses in class methods, or deep nested indirection causing runtime issues in query execution.

LMQL
View fix →
low

Error or warning: "Zyte API extraction template mismatch" when using Zyte extraction in AI agent tools, likely low-confidence or failed structured data extraction.

No specific bug documented. Likely mismatch between requested extraction pageType/schema and actual page content, or incompatible parameters (e.g., multiple extraction fields). Zyte returns data with low metadata.probability instead of error for mismatches [Zyte API error handling](https://docs.zyte.com/zyte-api/usage/errors.html).

Zyte API extraction tool
View fix →
medium

Metrics like context_precision, faithfulness return NaN/null in evaluate() result. Warnings: RuntimeWarning: Mean of empty slice (evaluation.py:130). Inconsistent across runs/LLMs (e.g., Bedrock fails, AOAI succeeds).

Ragas Executor catches exceptions (e.g., LLM JSON parse failure missing 'verdict', timeouts) and returns np.nan by default (raise_exceptions=False). np.nanmean(empty) propagates NaN to final scores. Docs: [Ragas v0.1.21](https://docs.ragas.io/en/v0.1.21/references/evaluation.html). GitHub: [Issue #528](https://github.com/explodinggradients/ragas/issues/528).

ragas.evaluate()
View fix →
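Besides passing `raise_exceptions=True` to surface the underlying failures, aggregate scores defensively rather than letting NaN propagate into the reported metric. A plain-Python sketch:

```python
import math

def safe_mean(scores):
    """Average only successfully computed scores and count the failures,
    instead of letting nanmean-style NaN propagation poison the aggregate."""
    valid = [s for s in scores if not math.isnan(s)]
    failed = len(scores) - len(valid)
    mean = sum(valid) / len(valid) if valid else float("nan")
    return mean, failed

# Two samples failed judge parsing (NaN); the aggregate still reflects the rest.
mean, failed = safe_mean([0.9, float("nan"), 0.7, float("nan")])
```

Reporting the failure count alongside the mean also makes "Bedrock fails, AOAI succeeds" style inconsistencies visible instead of silently lowering coverage.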
high

Mid-session connection drops/timeouts/502 errors in AI agents/scrapers using Bright Data rotating proxies, losing login state or context despite rotation config.[GoLogin Troubleshooting](https://support.gologin.com/en/articles/6627566-working-with-bright-data-proxies)

Residential peers expire after ~7min idle; rotation assigns new peers on unavailability. Without session control (`x-lpm-session` or `-session-xxx`), stateful sessions (e.g., logins) break on IP change. `const` sessions previously dropped on temp disconnects (fixed recently).[Bright Data Docs](https://docs.brightdata.com/release-notes)

fetch_url, search_web (proxies in agent tools)
View fix →
high

LLM Observability spans do not appear in the platform; no links/attribution between APM spans and LLM Obs spans when payload >1MB to llmobs-intake.datadoghq.eu.

The LLM Observability intake endpoint (llmobs-intake.datadoghq.eu) rejects or fails to process payloads exceeding 1MB, despite dd-trace library (5MB) and Agent (5MB) limits allowing larger payloads locally. This prevents span ingestion and trace context linking/attribution.

Datadog LLM Observability, dd-trace-py
View fix →
medium

Patronus AI Lynx model flags correct, faithful responses as hallucinations, leading to high false positive rate (48% implied by 52% precision). Outputs low accuracy (53%) despite excellent recall on biographical hallucination detection task.

Task-specific fine-tuning on hallucination datasets prioritizes high recall (95%) over precision (52%), making the model conservative and "trigger-happy" – it flags many correct statements as hallucinations to minimize misses. This is a deliberate trade-off in specialized training, not a bug.

Patronus AI Lynx hallucination detection model
View fix →
medium

Import errors when loading Phoenix-exported traces into Arize AX; check results/ folder for attribute formatting errors; traces fail to import or appear incomplete.

Some span attributes in Phoenix trace exports use formatting incompatible with Arize AX import expectations, causing parsing failures during import. The migration scripts cover common cases but miss some attribute structures. Fixes must be applied manually to traces.json before re-import.

Phoenix CLI px traces export, phoenix-to-arize migration scripts
View fix →
high

Run fails with "The Actor hit an OOM (out of memory) condition. You can resurrect it with more memory to continue where you left off." or a JS heap error. Crawler logs show a "Memory is critically overloaded" warning despite available system RAM (e.g., 90% reported used while free -h shows plenty). Concurrency drops; new requests are blocked.

Crawler (e.g. PlaywrightCrawler via Crawlee) memory autoscaler detects high JS heap usage (e.g. from high concurrency, large in-memory data like RequestList of millions of URLs, browser contexts) approaching/exceeding allocated limit (128MB-32GB per run, subscription total quota). Mismatch between JS heap and system memory reporting can trigger false positives. RequestList loads all requests into RAM; unsuitable for large crawls.

Apify Actors (esp. crawlers with Crawlee/Puppeteer/Playwright)
View fix →
medium

concurrent.futures._base.TimeoutError after 600s during feedback computation, e.g., in loops with app recording: "Run of run in <Thread(TP.submit with debug timeout_*, ...)> timed out after 600.0 second(s)." Often occurs with slow LLMs like Ollama/large models.

TruLens executes feedback functions in separate threads via ThreadPool (TP), with each having a 600s timeout (trulens.core.utils.threading.TP.DEBUG_TIMEOUT). Slow LLM responses (e.g., large models, external APIs) exceed this, raising concurrent.futures._base.TimeoutError when awaiting future.result().

TruLens feedback functions (e.g., groundedness, relevance)
View fix →
high

Garbled, incomplete, or missing text extraction from handwritten content in scanned PDFs/images; e.g., cursive notes transcribed with numerous spelling errors, wrong words, and layout issues; only printed-like fields (e.g., Customer ID) may extract correctly while others fail, as reported in GitHub [#2395](https://github.com/docling-project/docling/issues/2395).

Docling relies on general-purpose OCR engines like EasyOCR (default) and Tesseract, which have poor accuracy on cursive/messy handwriting due to lack of handwriting-specific training data and models. These engines excel on printed text but produce garbled output (e.g., 'educational' → 'eclucational') on handwritten notes, as shown in benchmarks and GitHub issues.

Docling (PDF/OCR pipeline)
View fix →
medium

In Weave UI Traces tab, nested @weave.op calls appear as separate root traces instead of hierarchical tree with proper parent-child relationships. Traces missing from UI entirely in multiprocessing setups.

Trace context (trace_id, parent_id) is stored in thread-local storage and lost when switching threads or processes without proper propagation, causing child calls to appear as root calls instead of nested under parents. Background upload threads may not complete before process exit in workers.

weave
View fix →
medium

Profile views show NaN metrics for new/evolved columns, merge failures between profiles, or conflicting resolvers warnings when logging data with schema changes. In WhyLabs, incomplete profiles lead to inaccurate drift detection or data quality alerts.

whylogs data profiles use schema-defined resolvers to compute metrics per column/data type; schema evolution (new columns, type changes) causes missing metrics (NaNs) or resolver conflicts when merging profiles with incompatible schemas, as profiles require matching column sets for merge operations.[whylogs Schema Configuration](https://whylogs.readthedocs.io/en/latest/examples/basic/Schema_Configuration.html)

WhyLabs data profile, whylogs logging
View fix →
medium

OSError when loading registered model with models:/name/version: "No such file or directory: '/custom_artifact_location/run_id/artifacts/models/'". Model registers fine but fails to load; artifacts exist under /custom_location/run_id/model but not in expected artifacts/models path.

Mismatch in artifact path expectations when using custom artifact_location in mlflow.create_experiment(). Model loads expect artifacts under a standard structure (likely relative to default tracking store), but custom locations cause MLflow to search incorrect paths like '/custom_location/run_id/artifacts/models/' where the model directory isn't found, even if registered successfully.

mlflow.pyfunc.load_model, mlflow.register_model, mlflow.create_experiment
View fix →
low

Expected webhook events (e.g., LABEL_CREATED) not received at endpoint despite Labelbox activity. No errors in Labelbox UI; silent failure.

No evidence of Labelbox-side bug. Webhooks are fire-and-forget with no documented retries (similar to GitHub); delivery fails silently if endpoint returns non-2xx, times out, or rejects. Docs confirm no ordering guarantee, no retry/monitoring mentioned.

Labelbox webhook
View fix →
medium

Embedding vectors for the same input prompt differ numerically across Ollama versions (e.g., first values: [-0.234, 1.966,...] vs [-0.091, 1.581,...]), leading to inconsistent cosine similarities, especially for longer prompts. No errors, but poor reproducibility in RAG/similarity search.

Different Ollama versions (e.g., 0.4.6 vs 0.17.0) handle the nomic-embed-text model differently, producing varying numerical embedding vectors for identical inputs, likely due to changes in model loading, quantization, or inference engine. Additionally, Nomic versions (v1/v1.5) have distinct latent spaces.

Ollama embedding API
View fix →
high

WebSocket connection opens successfully but closes abruptly (often within ~10s) with close code 1011/NET-0001; logs show "Deepgram did not receive audio data or text message within timeout"; empty transcripts; no error event before close; occurs during silence, pauses, or initial no-data periods.

Deepgram enforces a 10-second timeout requiring at least one binary audio frame or text frame (e.g., KeepAlive) after connection open; failure due to no data sent (network blocks, empty packets, MediaRecorder resume issues without headers, or silence without keepalive+audio) triggers close code 1011 with NET-0001 payload.

Deepgram live streaming API (all SDKs: JS, Python, Node.js)
View fix →
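Deepgram's documented mitigation for the 10-second no-data timeout is to send a KeepAlive text frame during silence. A minimal sketch of the pacing logic, assuming a websocket-style client whose send method accepts a text frame; the 5-second interval is an assumption chosen to stay well under the 10-second limit:

```python
import json

KEEPALIVE_INTERVAL = 5.0  # seconds; stay well under Deepgram's 10s no-data timeout

def keepalive_message() -> str:
    # Text frame Deepgram accepts in place of audio during silence
    return json.dumps({"type": "KeepAlive"})

def should_send_keepalive(last_sent_at: float, now: float,
                          interval: float = KEEPALIVE_INTERVAL) -> bool:
    # Send a KeepAlive whenever no frame (audio or text) has gone out recently
    return (now - last_sent_at) >= interval
```

In a real client, `last_sent_at` would be updated on every outgoing audio frame as well, so KeepAlives only fire during genuine gaps.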
low

Missing or incorrect token counts shown in the New Relic AI monitoring dashboard for LLM prompts/responses, disagreeing with the usage the LLM provider reports.

Potential mismatch occurs when `ai_monitoring.record_content.enabled=false`: agent skips content (and auto token counts); requires manual callback which may use inaccurate/inconsistent local counting logic [New Relic AI monitoring docs](https://docs.newrelic.com/docs/ai-monitoring/customize-agent-ai-monitoring/).

New Relic APM agent AI monitoring
View fix →
low

Slow or failed data ingestion into Fiddler AI monitoring, high ingestion latency, low throughput, increased error rates, stalled publish jobs.

No specific root cause identified for a named bottleneck. Common causes: connection timeouts (network/firewall), schema mismatches (data type/null/timestamp issues), credential problems.

Fiddler AI monitoring data ingestion/publish
View fix →
low

Custom metrics missing or drifting in experiment comparison UI, especially model-specific (e.g., gpt-4o-mini vs gpt-4o), despite logs showing values.

No specific root cause identified; general metric display issues in Opik (Comet's LLM tool) exist around model-specific logging or hidden UI columns, though none match this exact symptom.

Comet ML experiment tracking UI
View fix →
low

Dashboard with LLM plugin panels fails to refresh automatically; data remains stale despite set refresh interval. Manual browser refresh or individual panel refresh may work temporarily.

No specific root cause documented for LLM plugin. General causes: excessive query load, short refresh intervals stressing backend, network issues, or plugin service not available (llm.enabled() false).

Grafana LLM plugin (grafana-llm-app)
View fix →
critical

Langfuse Cloud API and UI unavailable (500 errors on requests); traces/observability data not ingested (permanently lost if 5xx); prompt management failed for new deploys; UI inaccessible for monitoring.

Global Cloudflare outage caused their proxy (used by Langfuse for all API/UI traffic) to return 500 errors, blocking requests from reaching Langfuse AWS ALBs. Langfuse architecture created single point of failure via overreliance on Cloudflare for proxy/WAF/DNS/registrar.

Langfuse Cloud (observability/tracing, prompt management, UI)
View fix →
medium

SerpAPI returns stale results from previous search (e.g., "Electricical" query returns "Disabillity Law" results); `search_metadata.status: "Cached"` despite param changes. Playground/live API shows outdated data.

Bug in SerpAPI cache key generation failing to distinguish queries/params properly (e.g., different `q` or `job_id` treated as identical cache hit), violating docs ('cache only if query+all params exactly same'). Fixed in specific engines like Local Services/Jobs.

SerpAPI (Google engines: Local Services, Jobs Listing)
View fix →
low

Attempt to rollback prompt version fails due to service shutdown; API calls return errors as platform is permanently inaccessible after deadline.

No specific technical root cause identified; extensive searches found no matching bug reports. Humanloop shut down in 2025, making all features including prompt versioning inaccessible post-Sept 8, 2025.

Humanloop prompt management
View fix →
medium

Citations present in raw Perplexity API response['citations'], but missing when using LangChain ChatPerplexity; only response.content with [1][2] markers shows, no source list.[LangChain GitHub #28108](https://github.com/langchain-ai/langchain/issues/28108)

LangChain's ChatPerplexity did not pass the Perplexity-specific 'model_extra' field (containing citations array) to AIMessage; only content was forwarded.[LangChain GitHub #28108](https://github.com/langchain-ai/langchain/issues/28108)

LangChain ChatPerplexity
View fix →
low

API returns incorrect entity links (wrong diffbotUri), low confidence scores, or misses coreferences/pronouns in entity resolution for Natural Language API calls.

No documented technical root cause found; entity resolution in NLP APIs like Diffbot's is probabilistic and can fail on ambiguous pronouns, rare entities, or noisy text (inherent to ML models, 91.21% entity linking accuracy per [Diffbot benchmarks](https://www.diffbot.com/products/natural-language/)).

Diffbot Natural Language API
View fix →
medium

Crawl4AI misses dynamic/lazy-loaded content (e.g., titles, load-more items), extracts partial/wrong page HTML/markdown. Docker /crawl gets parent page content. Session reuse throws "list index out of range". Python SDK often succeeds where endpoints fail.

Incomplete page rendering/scanning in Docker /crawl endpoint and session reuse: browser context.pages empty during js_only multi-step crawls, missing dynamic content until full scan/scroll triggered. Browser manager assumes non-empty pages list.

Crawl4AI (AsyncWebCrawler, /crawl endpoint)
View fix →
low

Increased end-to-end inference latency (e.g., +100-500ms per request), higher p95/p99 tail latencies, potential timeouts under load when using promptlayer_client.run() or log_request() synchronously.

Synchronous HTTP requests to PromptLayer API for logging/tracking add network round-trip latency (~50-200ms per request) during inference critical path, especially if rate-limited or under high load. Prompt fetching adds control-plane dependency.

PromptLayer SDK (Python wrapper)
View fix →
low

Increasing memory usage over time in Browserbase sessions, leading to slowdowns, hangs, or out-of-memory errors during prolonged automation tasks.

No documented root cause for Browserbase-specific issue. General causes in browser automation: unreleased DOM elements, unclosed pages/contexts, lingering timers/event listeners, or CDP network data retention in long-running sessions.

Browserbase sessions
View fix →
medium

API request fails with timeout error after 30-40 seconds when using `render_js=true`, `js_scenario`, or `wait` parameters on JavaScript-heavy sites. Response may return incomplete HTML missing dynamic content, or error like "timeout exceeded" in SDK. Builder sees: failed scrape, partial/missing data from JS-rendered elements.

JavaScript-heavy pages require time to render dynamic content, but ScrapingBee has strict limits: base API timeout ~30s (with retries), `wait` max 35s, `js_scenario` max 40s execution. Default 2s wait is insufficient; network activity or missing elements cause premature returns or timeouts. Client SDK timeouts may also trigger if not configured higher.

ScrapingBee API (fetch_url or similar web scraping tools using ScrapingBee)
View fix →
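Given the limits named above (`wait` capped at 35s), a request builder can clamp the wait client-side before the call is made. A minimal sketch; the parameter names (`render_js`, `wait`) follow ScrapingBee's query API, and the clamp value is taken directly from the root cause text:

```python
# Limit taken from the root cause above: `wait` max 35s
MAX_WAIT_MS = 35_000

def build_params(url: str, wait_ms: int) -> dict:
    # Clamp the wait so the request is not rejected or silently truncated
    return {
        "url": url,
        "render_js": "true",
        "wait": str(min(wait_ms, MAX_WAIT_MS)),
    }
```

The client SDK's own timeout should also be raised above the server-side ceiling, or it will abort before ScrapingBee finishes rendering.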
high

Real-time transcription suddenly stops/drops out after 1.5-2 minutes (~20-25 words), WebSocket closes unexpectedly (e.g., code 3005), no further transcripts despite ongoing audio input. May show partial/final transcripts then silence. Error in on_close/on_error like "Input duration violation" or no error but session ends.

WebSocket session closure due to protocol violations: audio chunks outside 50-1000ms, sending faster than real-time, invalid messages/JSON, or session limits. In SDK (e.g., Python RealtimeTranscriber), may appear as abrupt stop after ~1.5-2min speech due to unhandled closure or stream issues. Observed in [GitHub issue #107](https://github.com/AssemblyAI/assemblyai-python-sdk/issues/107) (Jan 24, 2025).

AssemblyAI real-time transcription API / Python SDK RealtimeTranscriber
View fix →
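The 50-1000ms chunk constraint above translates directly into a byte size once the audio format is fixed. A small helper assuming 16-bit (2 bytes/sample) mono PCM, which is an assumption about the stream format rather than an SDK requirement:

```python
def chunk_bytes(sample_rate_hz: int, chunk_ms: int) -> int:
    # AssemblyAI real-time expects audio chunks between 50ms and 1000ms.
    # For 16-bit mono PCM: bytes = sample_rate * 2 bytes/sample * duration_s.
    if not 50 <= chunk_ms <= 1000:
        raise ValueError("chunk duration must be 50-1000 ms")
    return sample_rate_hz * 2 * chunk_ms // 1000
```

Reading exactly this many bytes per send, at no faster than real time, avoids the chunk-size and pacing violations named in the root cause.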
low

Tasks remain in 'Queued' status for extended periods instead of moving to 'In Progress', delaying data labeling/annotation processing.

No specific technical root cause identified; likely high load or configuration issues in Scale's annotation/data labeling task queue system.

Scale AI platform task queue
View fix →
medium

TypeError when calling partition_pdf(filename, strategy='hi_res', infer_table_structure=True): 'get_model() got an unexpected keyword argument 'ocr_languages'' or 'partition_pdf() got multiple values for keyword argument 'infer_table_structure''. No Table elements extracted; falls back to text or CompositeElement.

Version mismatch between unstructured (0.11.4) and outdated unstructured-inference (0.7.8), causing TypeError in model loading ('ocr_languages' unexpected) and duplicate 'infer_table_structure' keyword argument when using hi_res strategy and infer_table_structure=True, likely due to internal function calls passing the arg twice.

unstructured.partition.pdf.partition_pdf (with infer_table_structure=True)
View fix →
high

In SolrCloud multi-shard collections, DenseVectorField data appears as a list of strings (e.g., ["0.1", "0.2"]) instead of floats ([0.1, 0.2]) when retrieved via Admin UI or queries. KNN vector queries fail or return incorrect results. Works fine in single-shard or standalone mode.

In multi-shard SolrCloud mode prior to the fix, DenseVectorField serialized stored vectors as strings instead of native float arrays during distributed response assembly, breaking KNN queries that expect numeric vectors.

Apache Solr vector search (DenseVectorField)
View fix →
medium

Scrape/crawl returns incomplete page content (e.g., missing table of contents links), despite full content visible in browser and working with other tools like pyppeteer. Logs suggest Playwright not fully utilized even with waitFor enabled.

Insufficient wait time after page load for JavaScript to execute and render dynamic content (e.g., table of contents links), causing Playwright to scrape before DOM is fully updated. Firecrawl defaults to automatic JS handling but may require explicit delays for complex sites.

Firecrawl scrape/crawl API (JS mode)
View fix →
low

Same query returns seemingly inconsistent relevance rankings or result orders across runs or configurations; results appear less relevant than expected.

Perceived inconsistency often stems from using non-default parameters that limit result pools or from comparing across different search types (Fast/Auto/Deep) with varying latency/quality tradeoffs, rather than a technical bug.

Exa search API
View fix →
low

Tavily search API allegedly returns duplicate results despite deduplication.

No specific root cause identified; no evidence of deduplication failure in Tavily API.

Tavily search API
View fix →
high

Parsing fails with ReadTimeout error (client-side) or job status ERROR: "EXPIRED" for complex/large PDFs, even if under page limits (e.g., <750 pages). Increasing client maxTimeout may not help if server-side timeout hits first.

LlamaParse jobs timeout after default 30 minutes of active parsing time (post-upload/queue) due to computational intensity of complex documents (e.g., many pages, tables, images, OCR, reconstruction). Service-side issues like pod degradation can also contribute. Client-side maxTimeout may trigger ReadTimeout before server completion.

LlamaParse (LlamaIndex Cloud)
View fix →
high

HTTP 400 Bad Request with error: "Unexpected role received for serving. Expected 'User' or 'Tool' but got 'Assistant'." or "Expected last role User or Tool (or Assistant with prefix True) for serving but got assistant" when calling Mistral /v1/chat/completions endpoint.

Mistral's chat completions API strictly enforces that the final message in the 'messages' array must have role 'user' or 'tool' (or 'assistant' only if prefix parameter is enabled/true), unlike OpenAI which allows ending with 'assistant'. This prevents invalid conversation states where the API is expected to generate after a previous assistant response without new input.

Mistral chat completion API
View fix →
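A common client-side fix is to trim trailing assistant turns before calling the API, so the final message role is always 'user' or 'tool'. A minimal sketch operating on plain OpenAI-style message dicts:

```python
def prepare_for_mistral(messages: list[dict]) -> list[dict]:
    # Mistral requires the final message role to be 'user' or 'tool'
    # (or 'assistant' only with the prefix option). Drop trailing
    # assistant turns before sending, rather than letting the API 400.
    msgs = list(messages)
    while msgs and msgs[-1]["role"] == "assistant":
        msgs.pop()
    return msgs
```

The alternative, if a trailing assistant message is intentional (completion prefixes), is to opt into Mistral's prefix behavior rather than trimming.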
low

Unexpected 429 rate limit errors blamed on supposedly undocumented limits (which are in fact documented) when calling without an API key or at high volume; long latencies reported for experimental models like v4.

No bug; this appears to be a non-issue. Rate limits for Jina Embeddings API are prominently documented with RPM/TPM tables by tier (free/paid/premium) and IP limits, as per official docs. No GitHub issues, SO questions, or community threads found reporting undocumented limits.

Jina Embeddings API integration in AI agents
View fix →
high

pymilvus.exceptions.MilvusException: (code=109, message=collection schema mismatch[collection=<name>]) during client.insert() after concurrent add_collection_field() operations. Reproduction: run a create/insert script while rapidly adding fields in a parallel script.

Insert message consumers (pipeline/flowgraph) receive schema updates before processing pending insert messages lacking data for newly added nullable fields via add_field, causing schema validation failure during concurrent operations.

Milvus (add_collection_field, insert)
View fix →
medium

Error like 'Zilliz Cloud partition key routing error' or similar (e.g., load/search fails with OOM, timeout, or routing failure) when performing searches/queries on partition key-enabled collections without partition key filter in expr.

Collections with partition keys require explicit filtering on the partition key field in search/query requests to route to correct partitions; omitting the filter causes full collection scan instead of optimized partition routing, leading to errors like timeouts, OOM, or 'routing failed' during partition resolution.

Zilliz Cloud (Milvus-based)
View fix →
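A client-side guard can enforce the partition-key filter before a search request goes out, failing fast instead of triggering a full-collection scan. A minimal sketch; the `tenant_id` field name in the test is hypothetical:

```python
def ensure_partition_filter(expr: str, partition_key: str) -> str:
    # Fail fast if a search expression on a partition-key collection
    # does not reference the key; otherwise the request fans out to
    # every partition and can time out or OOM.
    if partition_key not in (expr or ""):
        raise ValueError(
            f"search expr must filter on partition key '{partition_key}'"
        )
    return expr
```

The returned expression would then be passed as the filter/expr argument of the search call unchanged.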
medium

Vector similarity searches return no results, wrong/irrelevant documents, or "I don't know" despite matching content exists. Exact keyword search works; other vector stores fine. Mimics index corruption but queries succeed without errors.

pgvector approximate indexes (IVFFlat/HNSW) are trained on the data distribution for clustering. Building the index too early (on insufficient data), with wrong params (low lists/probes), or before major data shifts creates suboptimal clusters, causing poor recall/wrong results (not true corruption). Queries trade perfect recall for speed; results change post-indexing by design.

Supabase pgvector extension
View fix →
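pgvector's own tuning guidance (lists ≈ rows/1000 up to roughly 1M rows, sqrt(rows) beyond that; probes ≈ sqrt(lists)) can be computed up front so the IVFFlat index is rebuilt with sensible parameters after bulk loads. A sketch of those heuristics; the floor values are assumptions for very small tables:

```python
import math

def ivfflat_params(row_count: int) -> tuple[int, int]:
    # Heuristics from pgvector's docs: lists ~ rows/1000 (up to ~1M rows),
    # sqrt(rows) beyond that; probes ~ sqrt(lists). Rebuild the index
    # after bulk loads so clusters reflect the real data distribution.
    if row_count <= 1_000_000:
        lists = max(row_count // 1000, 10)
    else:
        lists = int(math.sqrt(row_count))
    probes = max(int(math.sqrt(lists)), 1)
    return lists, probes
```

Higher probes raises recall at the cost of latency, which directly addresses the "matching content exists but isn't returned" symptom above.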
medium

Logs show proxy errors: "[error][PM03] could not wake up machine due to a timeout requesting from the machines API" or "[PM01] machines API returned an error: 'machine still attempting to start'". Requests fail with timeouts/502s until the machine fully starts (10-20s delay). Machine status may show 'starting' indefinitely.

Fly Proxy times out waiting for Machines API response during wake-up from suspended/stopped state due to slow resume (unreasonably slow in some cases, fixed recently) or race where proxy retries start while machine is still starting, leading to "machine still attempting to start" or PM03 timeout. Often seen with suspend feature and low-traffic apps.

fly machines (Fly.io Machines)
View fix →
medium

Gradual memory increase in Redis process during repeated JSON path queries (e.g., JSON.GET with paths), leading to high used_memory without corresponding key growth, potential OOM, not reclaimed by GC or MEMORY PURGE.

Undocumented or unreported. Possible client-side accumulation of JSONPath query results or inefficient path parsing/caching in older RedisJSON versions not freeing allocations properly.

RedisJSON module in Redis Stack
View fix →
high

Database connection failures with errors like "FATAL: remaining connection slots are reserved for non-replication superuser connections", "too many connections", or "query_wait_timeout" after queuing. Queries hang for ~2 minutes before failing. Connection spikes visible in monitoring exhaust limits.

Neon Postgres enforces max_connections limit based on compute size (e.g., 104 for 0.25 CU free tier, 7 reserved for superuser). Direct connections quickly exhaust this. Even with PgBouncer pooling (up to 10k client conns), default_pool_size (90% of max_connections, e.g., ~97 per user/db) limits concurrent active transactions per user/database pair. Serverless/AI agents often create many short-lived connections without pooling, hitting these limits.

Neon Postgres (all AI agent tools using it, e.g., bash/psql connections, ORMs like Prisma/Drizzle)
View fix →
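Neon's pooled (PgBouncer) endpoint is the same host with a `-pooler` suffix on the endpoint ID, so a direct DSN can be rewritten mechanically. A sketch assuming a standard `postgres://` URL with a single `@` and the endpoint ID as the first hostname label:

```python
def pooled_dsn(dsn: str) -> str:
    # Rewrite a direct Neon DSN to its PgBouncer endpoint, e.g.
    #   ep-cool-name-123456.us-east-2.aws.neon.tech
    #   -> ep-cool-name-123456-pooler.us-east-2.aws.neon.tech
    if "-pooler." in dsn:
        return dsn  # already pooled
    host_start = dsn.index("@") + 1
    first_dot = dsn.index(".", host_start)
    return dsn[:first_dot] + "-pooler" + dsn[first_dot:]
```

Serverless and agent workloads that open many short-lived connections should use the pooled DSN by default; only features PgBouncer can't proxy (e.g. some session-level settings) need the direct endpoint.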
low

Vector index shows as 'building' or 'initial sync' for extended periods (hours/days); queries using $vectorSearch return no results or perform slowly (e.g., 30s for small top-k); possible Atlas alert for 'Index Replication Lag'.

Not a confirmed bug. Long index build/rebuild times are expected for large vector collections (millions of embeddings), as Atlas must scan and index the entire collection. Insufficient memory causes slow performance post-build; replication lag in search nodes (mongot behind oplog) can stale indexes, triggering rebuilds if severe ([Atlas Search Alerts](https://www.mongodb.com/docs/atlas/reference/alert-resolutions/atlas-search-alerts/)).

MongoDB Atlas Vector Search
View fix →
medium

Document addition via add_documents hangs or takes excessively long (stalls) especially for unstructured indexes; may timeout or degrade performance with large batches.

Vespa bug (versions before 8.396.18) slows distribution of configuration files for dynamic index setting updates in unstructured indexes (Marqo 2.13.0+), causing long processing times; exacerbated by large batches exhausting resources or timeouts.

Marqo (add_documents API, document processing pipeline)
View fix →
low

Chain crashes when using RedisVectorStoreRetriever asynchronously with similarity_distance_threshold; or Invalid vector dimension error (e.g., 4 vs expected 1024) in UpstashVectorStore with FakeEmbeddings.

No specific root cause identified for a "vector similarity threshold bug" in Upstash Redis Vector. Related issues exist in LangChain Redis retrievers (async similarity_distance_threshold not implemented) and UpstashVectorStore dimension mismatches.

RedisVectorStoreRetriever, UpstashVectorStore
View fix →
high

WARN lance_table::io::commit] Using unsafe commit handler. Concurrent writes may result in data loss. Consider providing a commit handler that prevents conflicting writes. (data corruption/loss on concurrent writes)

Cloud object storage (S3-compatible) lacks atomic "rename if not exists" operations required for LanceDB's default commit handler, allowing concurrent writes to conflict and corrupt data or cause loss.

LanceDB (lancedb Python/Rust/JS SDKs)
View fix →
medium

Queries using the HNSW index return different/wrong top-k results vs an exact sequential scan (e.g. missing closest matches, lower similarity scores). Fewer results with filters. Recall drops below expectations at large scale (millions of vectors).

HNSW provides approximate nearest neighbor search, trading perfect recall for speed via graph-based exploration limited by ef_search (default 40 candidates). Low build params (m, ef_construction) create sparse graphs with poor recall. At scale: dead tuples, post-filtering discards candidates, or memory spills degrade effective recall/QPS. Docs warn: "you will see different results" with ANN indexes. [pgvector docs](https://github.com/pgvector/pgvector#hnsw)

pgvector HNSW index
View fix →
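Recall can usually be recovered by raising `hnsw.ef_search` above the default 40; it must also be at least the query LIMIT for the candidate pool to cover the requested top-k. A sketch that emits the two statements; the `items` table and `embedding` column names are illustrative, and the 4x multiplier is an assumed starting point to tune:

```python
def hnsw_recall_sql(top_k: int, ef_multiplier: int = 4) -> list[str]:
    # ef_search bounds the candidate pool; it must be >= LIMIT, and
    # raising it trades speed for recall (pgvector's default is 40).
    ef_search = max(top_k * ef_multiplier, 40)
    return [
        f"SET hnsw.ef_search = {ef_search};",
        # ORDER BY distance + LIMIT is the shape that uses the index
        f"SELECT id FROM items ORDER BY embedding <-> %s LIMIT {top_k};",
    ]
```

With filters, ef_search often needs to be raised further, since candidates discarded by post-filtering shrink the effective result set.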
high

Streaming abruptly stops midway (e.g., during 5th-12th tool call at tool-input-delta event), no console/server errors, HTTP 200. Next message prompt fails with OpenAI 400 error: \"Item 'rs_...' of type 'reasoning' was provided without its required following item.\" Common in complex React apps with long tool inputs/SQL (&gt;1000 chars).

High-volume streaming events from OpenAI (hundreds/thousands, esp. char-by-char tool-input-delta for long inputs) overwhelm React's synchronous re-rendering in useChat hook. Slow re-renders (complex apps with providers, queries) block the JS main thread, causing SSE event backlog, queue timeout, and incomplete stream (leaves dangling 'reasoning' item).

Vercel AI SDK (useChat hook)
View fix →
high

Huge memory consumption by sparse tensor fields (e.g., doubling memory usage when adding similar fields); potential OOM or spikes during feeding/indexing large datasets.

Sparse tensor attribute fields are stored entirely in memory by default without pagination support, leading to high memory usage for large datasets. `attribute: paged` was not supported for non-dense tensors until later versions, forcing full in-memory storage and potential spikes when scaling documents or adding fields.

vespa-engine tensor attributes
View fix →
medium

"Context Window Exceeded" error message appears, preventing AI responses. Context usage indicator nears 100%; AI may produce incoherent outputs or fail on short prompts due to corruption.[FlowQL Guide](https://www.flowql.com/en/blog/guides/cursor-context-window-exceeded-fix)[Cursor Forum](https://forum.cursor.com/t/message-too-long-error-on-extremely-short-prompts-e-g-1-1-severe-context-corruption/144286)

Total tokens from chat history, open files, @ references (esp. @Codebase), and codebase symbols exceed the model's hard context window limit (~200k tokens default in Cursor, e.g., for Claude 3.5 Sonnet), preventing the LLM from processing further input.[Cursor Docs](https://docs.cursor.com/en/models)[FlowQL Guide](https://www.flowql.com/en/blog/guides/cursor-context-window-exceeded-fix)

Cursor AI Chat/Agent/Composer
View fix →
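Cursor's own context management is internal, but the general mitigation applies to any client: drop the oldest turns until a rough token estimate fits the window. A sketch using the common 4-characters-per-token approximation, which is an assumption and not Cursor's actual tokenizer:

```python
def trim_history(messages: list[dict], budget_tokens: int,
                 count=lambda m: len(m["content"]) // 4) -> list[dict]:
    # Walk newest-to-oldest, keeping turns while the estimate fits;
    # the newest message is always kept so the request stays valid.
    kept: list[dict] = []
    total = 0
    for msg in reversed(messages):
        cost = count(msg)
        if kept and total + cost > budget_tokens:
            break
        kept.append(msg)
        total += cost
    return list(reversed(kept))
```

In practice, large @ references and open files dominate the budget, so pruning those has more effect than trimming chat turns alone.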
medium

Queries using the new vector index fail or return no results because index state is POPULATING (not ONLINE). Other queries may slow down due to high CPU/IO from background population on large datasets. Index creation command succeeds but progress stalls (e.g. at 14%).

Neo4j schema indexes (including vector indexes) are populated asynchronously after CREATE statement succeeds. During POPULATING state, the index is unavailable for queries but database remains operational. However, for large datasets, population can be slow/resource-intensive, appearing to 'block' as it consumes CPU/IO heavily. Schema lock acquired briefly during CREATE; population uses background resources but can contend if under-resourced. Hangs may occur due to resource exhaustion or bugs fixed in later versions.

Neo4j Cypher CREATE VECTOR INDEX
View fix →
high

ThrottlingException (HTTP 429) on InvokeModel/Converse calls after initial success: 'An error occurred (ThrottlingException) when calling the InvokeModel operation: Too many requests/tokens, please wait before trying again.' Low token usage but hits RPM limit (e.g. 1/min). [AWS Docs](https://docs.aws.amazon.com/bedrock/latest/userguide/troubleshooting-api-error-codes.html) [Stack Overflow](https://stackoverflow.com/questions/79420215/aws-bedrock-throttling-exception-when-using-sonnet-3-5-sonnet)

Exceeding Amazon Bedrock's model-level service quotas: RPM (requests per minute), TPM (tokens per minute with upfront deduction of input + max_tokens, adjusted post-response; output × burndown rate e.g. 5x for Claude), TPD. Quotas shared across InvokeModel/Converse/Stream APIs; low defaults on new accounts (e.g. 1 RPM for Claude 3.5 Sonnet). [AWS Token Burndown](https://docs.aws.amazon.com/bedrock/latest/userguide/quotas-token-burndown.html)

Amazon Bedrock Runtime (invoke_model, converse APIs)
View fix →
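The standard mitigation is retrying ThrottlingException with full-jitter exponential backoff (alongside requesting quota increases). A minimal, provider-agnostic sketch; in real code `fn` would wrap a boto3 invoke_model/converse call and `is_throttle` would inspect the botocore error code, both of which are assumptions here:

```python
import random
import time

def call_with_backoff(fn, is_throttle, max_attempts=6,
                      base=1.0, cap=30.0, sleep=time.sleep):
    # Retry `fn` on throttling errors with full-jitter exponential
    # backoff: sleep a random amount in [0, min(cap, base * 2**attempt)].
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            if not is_throttle(exc) or attempt == max_attempts - 1:
                raise
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```

Because TPM quotas deduct input + max_tokens up front, lowering an oversized max_tokens also reduces throttling independently of retries.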
medium

502 Bad Gateway or 503 errors on requests to idle (scale-to-zero) endpoints; requests fail or timeout during cold start (minutes).[Hugging Face Autoscaling Docs](https://huggingface.co/docs/inference-endpoints/en/guides/autoscaling)

Scale-to-zero feature shuts down replicas after inactivity to save costs. Requests trigger auto-scale-up, but model loading/container init takes minutes; no queuing, proxy errors (502/503) during cold start.[Hugging Face Autoscaling Docs](https://huggingface.co/docs/inference-endpoints/en/guides/autoscaling)

Hugging Face Inference Endpoints
View fix →
medium

No traces appear in LangSmith dashboard despite setting tracing env vars and running LangChain code (e.g., LLM invokes). May see no errors or intermittent 'failed to batch ingest' logs. Works locally but not in cloud/deployed, or vice versa.

Environment variables not correctly set, loaded too late (cached after first LangChain import), wrong casing ('True' vs 'true'), incorrect variable names (LANGSMITH_* vs LANGCHAIN_*), or missing restart after changes. No connection/batch ingest errors often indicate silent misconfig.[GitHub #216](https://github.com/langchain-ai/langsmith-sdk/issues/216)

LangSmith tracing (langsmith-sdk)
View fix →
medium

HTTP 429 rate limit errors on writes; low throughput (<10k writes/s or <32MB/s per namespace); backpressure errors when unindexed_bytes exceeds 2GB; high write latency despite batching.

Turbopuffer uses object storage (high write latency ~200ms per operation) with WAL batching up to 1s and group commit to achieve high throughput. Per-namespace limits enforced: 1 batch/second, 10k writes/s or 32MB/s (observed higher in some cases). Exceeding triggers HTTP 429 rate limits or backpressure when unindexed data >2GB.

Turbopuffer write API
View fix →
medium

APITimeoutError raised after ~60 seconds on long prompts (more than a few thousand tokens); the request hangs or times out during pre-fill before the first token, with no response despite the model's usual speed. Works for short prompts.

Long prompts increase pre-fill (prompt processing) time linearly, leading to Time to First Token (TTFT) exceeding the default 60-second client timeout during the KV cache computation phase. Flex tier adds rapid server-side timeouts under resource constraints. [Groq Python SDK](https://github.com/groq/groq-python) [Groq Latency Docs](https://console.groq.com/docs/production-readiness/optimizing-latency)

Groq API, langchain-groq ChatGroq
View fix →
medium

No webhook received after a prediction completes; the prediction succeeds in the dashboard but the endpoint is never called (or is called but fails silently). Logs may show ConnectionRefusedError/MaxRetryError in Cog. Endpoint logs are empty/missing requests. Duplicate deliveries are possible when retries partially succeed.[Replicate Docs](https://replicate.com/docs/topics/webhooks/receive-webhook) [Cog Issue](https://github.com/replicate/cog/issues/2229)

Webhook delivery fails if the endpoint returns non-2xx (4xx/5xx), times out (more than a few seconds), or is unreachable. Replicate retries terminal events for ~1min max with exponential backoff, then stops. No retries for intermediate events. In self-hosted Cog, the internal webhook sender blocks if the service is down.[Replicate Docs](https://replicate.com/docs/topics/webhooks/receive-webhook) [Cog Issue #2229](https://github.com/replicate/cog/issues/2229)

Replicate predictions.create webhook
View fix →
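Because Replicate stops retrying terminal events after about a minute, the safest endpoint acknowledges immediately and defers any slow work. A framework-free sketch of that pattern; `handle_webhook` stands in for whatever your HTTP framework's handler body looks like:

```python
import queue

jobs: "queue.Queue[dict]" = queue.Queue()

def handle_webhook(payload: dict) -> int:
    # Enqueue and return 200 right away; never do slow work
    # (DB writes, downstream calls) before acknowledging.
    jobs.put(payload)
    return 200

def drain(process):
    # Background worker: process deferred payloads off the queue.
    while not jobs.empty():
        process(jobs.get())
```

Webhook IDs should also be deduplicated in the worker, since partial retries can deliver the same event twice.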
medium

Cody provides incorrect code suggestions, explanations, or answers that don't match the codebase, despite retrieval succeeding; e.g., hallucinates APIs or misses definitions. Retry buttons appear after failed first attempts.

Automatic retrieval via Sourcegraph search on query tokens + local files may rank irrelevant snippets due to keyword/semantic mismatches in complex codebases, leading to poor RAG prompts and inaccurate LLM outputs. Blogs note ongoing improvements needed for relevance.

Cody (Sourcegraph AI coding assistant)
View fix →
medium

Conversation history/context appears lost or invalid after switching LiteLLM model/provider mid-conversation (e.g. Gemini 2.5-flash to 3-pro); model ignores prior messages, asks to repeat info, or errors on missing fields like thought_signature.[LiteLLM Gemini 3 Blog](https://docs.litellm.ai/blog/gemini_3)

When switching providers/models in LiteLLM (proxy/router), conversation history loses critical provider-specific fields (e.g. Gemini thought_signature) if not appending full response.message, causing new provider to reject/misinterpret context or apply incompatible defaults/formats.[LiteLLM Gemini 3 Blog](https://docs.litellm.ai/blog/gemini_3)

LiteLLM Router, Proxy (model/provider switching)
View fix →
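The fix named above is to append the provider's full response message, provider-specific fields included, rather than rebuilding it from content alone. A sketch contrasting the two on plain dicts; `thought_signature` mirrors the Gemini field named in the root cause:

```python
def append_assistant_turn(history: list[dict], response_message: dict) -> list[dict]:
    # Keep the whole assistant message (including provider-specific
    # fields like Gemini's thought_signature) so a later provider
    # switch still sees a valid, complete turn.
    history.append(dict(response_message))
    return history

def lossy_append(history: list[dict], response_message: dict) -> list[dict]:
    # The failure mode: copying only role/content drops the extra fields.
    history.append({"role": response_message["role"],
                    "content": response_message["content"]})
    return history
```

With LiteLLM specifically, this means appending the response's message object as-is instead of constructing a fresh `{"role": ..., "content": ...}` dict.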
medium

API request fails with: "2 request validation errors: Input should be 'auto', 'none', 'any' or 'required', field: 'tool_choice.literal[...]', value: None; Input should be a valid dictionary... field: 'tool_choice.FunctionSelection', value: None" when using Fireworks as OpenAI-compatible endpoint for tool/function calling.

Client libraries serialize tool_choice as null/None or use camelCase, but Fireworks API (OpenAI-compatible) expects lowercase string literals ("auto", "none", "any", "required") and rejects invalid values during request validation.[Zed GitHub issue #35434](https://github.com/zed-industries/zed/issues/35434)

Fireworks AI chat completions API (OpenAI compatible)
View fix →
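A client-side shim can normalize `tool_choice` before the request reaches Fireworks: drop null values entirely and lowercase string literals. A sketch using the literal set from the validation error above; dict-valued selections are passed through untouched:

```python
VALID_TOOL_CHOICE = {"auto", "none", "any", "required"}

def sanitize_tool_choice(payload: dict) -> dict:
    # Fireworks rejects tool_choice: null where OpenAI clients
    # happily send it, and expects lowercase string literals.
    body = dict(payload)
    choice = body.get("tool_choice")
    if choice is None:
        body.pop("tool_choice", None)
    elif isinstance(choice, str):
        body["tool_choice"] = choice.lower()
        if body["tool_choice"] not in VALID_TOOL_CHOICE:
            raise ValueError(f"invalid tool_choice: {choice!r}")
    return body
```

This is the kind of shim an OpenAI-compatible client wrapper would apply to every chat-completions request body.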
high

API POST /prediction to chatflow returns 504 Gateway Time-out after 30-60s for long workflows. Flow completes successfully (visible in Executions tab/Langsmith), but no response received by client (e.g., Postman, backend code). Client sees timeout despite long client-side timeout set.

Load balancer/gateway timeout on synchronous HTTP API responses when workflows exceed ~30-60s (e.g., chained agents + LLM taking 5+min). Flow executes fully server-side but client connection drops before result returns. Affects Flowise Cloud primarily.

Flowise Cloud API prediction endpoint
View fix →
high

Streaming response (text/audio) cuts off mid-sentence mid-generation; the stream ends with turnComplete (Live API) or finish_reason STOP/MAX_TOKENS; response incomplete (e.g., malformed JSON field, no final period); occurs randomly or consistently in previews, worse with long prompts/tools/JSON mode.

Model or API sends premature turnComplete / finish_reason STOP / MAX_TOKENS in streaming (generate_content_stream, Live API) despite room in limits. Often due to the default 'auto' thinking_budget exhausting the allocation on internal reasoning, especially in gemini-2.5-flash previews with structured/JSON outputs, tools, or long inputs. Client libraries (e.g., LangChain) may omit these configs; preview-model bugs also cause audio/text cutoffs.

Google Gemini API (generate_content_stream, Live API)
View fix →
medium

First requests or after idle: high tail latency/p99 (seconds-minutes). Dashboard shows long 'pending' (queueing) or uneven invocation times (first per container slow).

Cold starts occur when no warm container available: queueing delay waiting for boot (~1s Modal base + user code like imports/model loads) + first-invocation init (e.g. model to memory, sequential IO). Subsequent calls reuse warm containers.

Modal functions/classes
View fix →
low

Rerank scores appear inconsistent, e.g., do not sum to 1 across documents or vary unexpectedly across queries, leading to unreliable thresholding or ranking fusion.

Likely misunderstanding: Cohere Rerank outputs independent relevance scores normalized to [0,1] per document-query pair, but query-dependent and not summing to 1 across documents, leading to perceived inconsistency when assuming probabilistic softmax normalization.

Cohere Rerank API, LangChain CohereRerank
View fix →
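The practical takeaway is to treat each rerank score as an independent [0, 1] relevance value, not a distribution. A sketch of thresholding and ordering without renormalization; the 0.3 default threshold is an arbitrary illustration to be tuned per corpus:

```python
def filter_and_rank(results: list[dict], threshold: float = 0.3) -> list[dict]:
    # Scores are independent per document-query pair in [0, 1] — not
    # a softmax distribution — so compare each to a fixed threshold
    # and sort descending; never renormalize them to sum to 1.
    kept = [r for r in results if r["relevance_score"] >= threshold]
    return sorted(kept, key=lambda r: r["relevance_score"], reverse=True)
```

Because scores are query-dependent, a threshold calibrated on one query type may not transfer to another; calibrate on held-out pairs per use case.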
high

Agent crashes with "Agent failed before reply: Unknown model: openrouter/{model}" when primary model is unrecognized/deprecated. Configured fallback models (e.g., openrouter/auto) are ignored, no automatic switch.

Model resolution throws plain Error for unknown model IDs instead of FailoverError. Fallback handler runWithModelFallback() only catches FailoverError via coerceToFailoverError(), skipping plain Errors and crashing without trying fallbacks.

OpenClaw OpenRouter integration (model fallback routing)
View fix →
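The described fix — coercing any thrown error into the failover type before the fallback loop inspects it — can be sketched as follows. Names mirror the entry, but the real project is TypeScript, so this is an illustrative Python analog, not OpenClaw's code:

```python
class FailoverError(Exception):
    """Error type the fallback loop knows how to handle."""

def coerce_to_failover_error(exc: Exception) -> FailoverError:
    # The fix: wrap *any* model-resolution failure, not just known types,
    # so an unknown-model plain Error still triggers fallback.
    if isinstance(exc, FailoverError):
        return exc
    return FailoverError(str(exc))

def run_with_model_fallback(models, call):
    last = None
    for model in models:
        try:
            return call(model)
        except Exception as exc:          # previously: only FailoverError was caught
            last = coerce_to_failover_error(exc)
    raise last

def call(model):
    if model == "openrouter/bad-model":
        raise ValueError("Unknown model: openrouter/bad-model")  # plain error
    return f"ok:{model}"
```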
medium

Requests succeed despite exceeding configured rate limits when rate limiter subsystem fails (e.g., Cloudflare Durable Objects error, KV failure), leading to unexpected overuse/costs instead of expected 429 errors.

Helicone's rate limiter uses a fail-open strategy: errors in rate limit checks (network timeout, Durable Object crash, KV read failure) silently allow requests to pass through via try/catch blocks and the failureMode: "fail-open" config, prioritizing availability over strict enforcement. Code in worker/src/lib/HeliconeProxyRequest/ProxyForwarder.ts implements this with an empty catch block.

Helicone proxy/gateway rate limiter
View fix →
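The fail-open pattern, and the fail-closed alternative, come down to what the catch block returns when the limiter cannot answer. Illustrative Python analog (Helicone's actual implementation is TypeScript):

```python
def check_rate_limit(user: str) -> bool:
    """Stand-in for a Durable Object / KV lookup that may throw."""
    raise TimeoutError("rate limiter backend unavailable")

def allow_request(user: str, failure_mode: str = "fail-open") -> bool:
    try:
        return check_rate_limit(user)
    except Exception:
        # fail-open: availability over enforcement (the behavior described above);
        # fail-closed: reject (429/503) when enforcement cannot be verified.
        return failure_mode == "fail-open"
```

If strict cost control matters more than availability, fail-closed (or a local fallback counter) is the safer default.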
medium

Request timeout triggered and aborts streaming even after gateway has started receiving response chunks from provider.

Gateway retry handler used AbortSignal.timeout() which continues timing out even after streaming starts and chunks are received from the provider, located in retryHandler.ts.

Portkey AI Gateway
View fix →
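A per-chunk idle deadline avoids killing a healthy stream the way a fixed whole-request timeout does. Minimal sketch — a production version would enforce the deadline with a timer/cancellation rather than checking after each chunk arrives:

```python
import time

def stream_with_idle_timeout(chunks, idle_timeout: float):
    """Abort only when the provider goes silent *between* chunks, instead of
    a fixed whole-request deadline that fires while chunks still flow."""
    last = time.monotonic()
    for chunk in chunks:
        now = time.monotonic()
        if now - last > idle_timeout:
            raise TimeoutError("stream idle for %.2fs" % (now - last))
        last = now
        yield chunk

def slow_provider():
    # Steady cadence; total duration could exceed any fixed request timeout.
    for part in ["Hel", "lo"]:
        time.sleep(0.01)
        yield part
```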
medium

Batch job status remains 'IN_PROGRESS' for extended periods (beyond 24 hours), no progress updates, job does not complete or fail, appearing stalled in the queue despite valid input.

Batch jobs are processed asynchronously using spare capacity during off-peak times on a best-effort basis within ~24 hours. Delays/stalls occur with large/complex batches, popular models with high queue loads, or temporary capacity constraints, causing IN_PROGRESS to exceed expected times.

Together AI Batch API
View fix →
medium

Claude ignores screenshot read requests or claims "I can't access the local file path /var/folders/.../Screenshot.png" and suggests uploading externally or describing the image, despite having the capability.

Claude model fails to automatically recognize and invoke its built-in Read() tool for local screenshot file paths (especially drag-and-drop or temp paths). It hallucinates lack of filesystem access instead of using the tool it has.

Claude Code CLI Read tool (Computer Use related)
View fix →
high

Agents perform unauthorized actions (data exfil, malicious code exec) after processing malicious webpages/emails, despite safe system prompts.

LLMs in AutoGen agents treat untrusted external content (e.g., scraped webpages) as instructions, allowing override of system prompts via hidden malicious text, especially in web agents like Magentic-One WebSurfer.

AutoGen (multi-agent web agents)
View fix →
high

Unauthenticated requests succeed (e.g., curl http://weaviate:8080/v1/meta returns data instead of 401) despite setting AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'false' in config.

Weaviate enables anonymous access by default when no other authentication methods are explicitly configured. Setting AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'false' alone does not suffice if API key and OIDC auth are also disabled or unset; the system falls back to anonymous access.

Weaviate vector database
View fix →
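A docker-compose sketch of the corrected configuration: disable anonymous access and enable an explicit method (API key) in the same step. Variable names follow Weaviate's documented API-key auth settings; key and user values are placeholders:

```yaml
# docker-compose.yml (excerpt) — disabling anonymous access alone is not
# enough; an explicit auth method must also be configured.
services:
  weaviate:
    environment:
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'false'
      AUTHENTICATION_APIKEY_ENABLED: 'true'
      AUTHENTICATION_APIKEY_ALLOWED_KEYS: 'replace-with-strong-random-key'
      AUTHENTICATION_APIKEY_USERS: 'admin@example.com'
```

Verify with an unauthenticated curl to /v1/meta, which should now return 401.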
high

language_server_linux_x64 process memory grows unbounded from ~3.5 GB to 60+ GB RSS in hours during normal Cascade usage over Remote SSH, exhausting all RAM/swap (e.g., 64 GB + 16 GB), causing system hard freeze (OOM without killer logs, requires power cycle). Growth accelerates non-linearly; CPU pegged at 100% while otherwise idle, load avg >1000.

Language server fails to garbage collect index/embedding cache files in implicit/ directory (.tmp files accumulating to 36 GB) and Cascade conversation history in cascade/ directory (8.4 GB of .pb/.tmp files). On startup, server loads entire bloated cache into Go heap without eviction or consolidation, causing unbounded growth during normal Cascade usage over hours.

Windsurf Cascade agent (via language server)
View fix →
medium

Agent fails file ops with "PermissionError: [Errno 13] Permission denied", "EACCES: permission denied", or similar. May loop retrying (e.g., rm → sudo rm → find -delete → Python shutil.rmtree). Console/logs show denied access to /nix/store/*, /data/cache, /root. Build/deploy halts [GitHub #571](https://github.com/ultrafunkamsterdam/undetected-chromedriver/issues/571), [Rajesh Dhiman blog](https://www.rajeshdhiman.in/blog/fix-replit-errors).

Replit runs Agent-generated code as non-root user in sandboxed container. Code attempting system paths (/nix/store, /root, /data) or privileged ops (sudo, rm -rf protected dirs) triggers permission denied/EACCES. Agent interprets as solvable, retries creatively (sudo rm, shutil.rmtree) instead of respecting boundaries [Hacker News analysis](https://news.ycombinator.com/item?id=47161209).

Replit Agent
View fix →
medium

HTTP 400 error with code 'content_filter' and inner_error 'ResponsibleAIPolicyViolation'. Message: "The response was filtered due to the prompt triggering Azure OpenAI's content management policy. Please modify your prompt and retry." content_filter_results show filtered=true in a category (e.g., sexual/high) despite safe content. Response body empty or finish_reason='content_filter' in chat completions.

Azure OpenAI's built-in content safety filters (powered by Azure AI Content Safety classifiers) use severity-based detection across categories like sexual/violence/hate/self-harm. False positives occur when models misclassify safe content (e.g., family photos as 'sexual/high', neutral French phrases as 'sexual/medium', technical terms triggering self-harm) due to conservative thresholds, language nuances, or image processing sensitivities.

Azure OpenAI chat completions, vision models (e.g., PDF/image processing)
View fix →
high

GitHub Copilot autocomplete, chat responses, and agent mode become extremely slow (10+ minutes for simple suggestions, slow token streaming/"painting" back responses); persists across restarts/projects; makes tool unusable. [microsoft/vscode-copilot-release #6489](https://github.com/microsoft/vscode-copilot-release/issues/6489)

Accumulation of bloated or corrupted temporary cache files in VS Code, leading to performance degradation over time as suspected memory leaks or cache buildup interfere with Copilot's response processing and chat rendering. [microsoft/vscode-copilot-release #6489](https://github.com/microsoft/vscode-copilot-release/issues/6489)

GitHub Copilot VS Code extension
View fix →
high

Runtime ValueError during plan execution: "Parameter input is expected to be parsed to <class 'int'> but is not." Traceback shows invalid literal for int() on an unexpected string like a date ('Monday, 17 June, 2024'). Plan executes the wrong prior function, mismatching parameters for the target function.

In Python SequentialPlanner, LLM generates XML plan with sequential function calls sharing a global context. Hallucinated/unnecessary functions (e.g., TimePlugin-date) execute first, polluting context variables (e.g., setting 'input' to date string). Later functions (e.g., MathPlugin-Subtract expecting int) fail type parsing on mismatched values.

SequentialPlanner (Python), kernel_function_from_method parameter gathering
View fix →
high

After restarting kernel/script/process, queries on persisted Chroma DB return empty results (`[]`), despite `collection.count()` or `peek()` showing data initially. Parquet files (`chroma-*.parquet`) exist but appear empty or outdated on query.

Using ephemeral `chromadb.Client()` (in-memory only) instead of `PersistentClient(path=...)` (disk-persisting). Data vanishes on process/script/kernel restart as it's never written to disk. Mismatching client types/paths on load exacerbates.

chromadb PersistentClient, langchain_chroma.Chroma
View fix →
high

Under high webhook load: 90s+ delays with executions stuck in 'new' status; workflows unresponsive (no logs/errors, needs restart); production URLs return Internal Server Error (tests work); trigger nodes fail to register; queue length >50, latency >300ms.

By design, n8n processes incoming webhooks in parallel with no default limit on concurrent production executions or rate limiting on receipt, causing Node.js event loop thrashing, unresponsiveness, and failures under traffic spikes. Concurrency controls only queue post-receipt executions, not preventing initial overload.

n8n webhook node, production executions
View fix →
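Until webhooks are fronted by an external rate limiter or a queue-on-receipt, a bounded-concurrency gate at the application edge is the usual mitigation. Generic Python sketch (not n8n code):

```python
import threading

class WebhookGate:
    """Shed load once too many webhook executions are in flight, instead of
    accepting unbounded parallel work and thrashing the event loop."""
    def __init__(self, max_concurrent: int):
        self._sem = threading.BoundedSemaphore(max_concurrent)

    def try_handle(self, work) -> bool:
        if not self._sem.acquire(blocking=False):
            return False          # caller should answer HTTP 503 + Retry-After
        try:
            work()
            return True
        finally:
            self._sem.release()
```

Rejecting excess traffic early (so senders retry with backoff) is what the built-in concurrency controls, which only queue post-receipt executions, do not provide.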
high

Operator freezes during browser navigation tasks (e.g., form filling), shows "Something went wrong. Please try again." after seconds; retry fails; persists across browsers/OS/modes.

ComputerToolCallFailed error in Operator's internal tool calls during typing/input actions in browser navigation, due to a backend issue affecting ~0.2% of interactions (per OpenAI).

OpenAI Operator (ChatGPT agent browser tool)
View fix →
high

Matrix connectivity stops (no messages received); `openclaw plugins list` shows 'matrix: error'; `openclaw gateway status` warns 'duplicate plugin id detected; later plugin may be overridden'; Control UI unstable.

OpenClaw 2026.2.26 update bundled Matrix plugin as stock, missing `@vector-im/matrix-bot-sdk` dependency + conflicting with users' pre-existing local `~/.openclaw/extensions/matrix/` (duplicate plugin ID, no auto-migration).

OpenClaw plugins, Matrix plugin, gateway tools
View fix →
high

Arbitrary code execution (e.g., os.system('echo pwnd > /tmp/pwnd.txt')) during graph.invoke() or checkpoint load when processing a malicious checkpoint containing a surrogate-triggered JSON payload with a constructor like {"lc":2,"type":"constructor","id":["os","system"],"kwargs":{"command":"..."}}. No explicit error; silent RCE. Affects apps persisting untrusted data with the default/explicit JsonPlusSerializer. [GHSA-wwqv-p2pp-99h5](https://github.com/langchain-ai/langgraph/security/advisories/GHSA-wwqv-p2pp-99h5)

JsonPlusSerializer (default for checkpointing) falls back to the unsafe 'json' mode (lc=2, type='constructor') on msgpack failure (e.g., illegal Unicode surrogates like \ud800). This mode allows arbitrary Python object reconstruction via the constructor format, enabling execution of functions like os.system during deserialization of untrusted payloads. [GHSA-wwqv-p2pp-99h5](https://github.com/langchain-ai/langgraph/security/advisories/GHSA-wwqv-p2pp-99h5)

LangGraph checkpoint (langgraph-checkpoint < 3.0)
View fix →
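The real fix is upgrading langgraph-checkpoint to >= 3.0. As defense in depth when loading checkpoints from untrusted storage, payloads using the serialized-constructor format can be refused before deserialization. Hypothetical guard, keyed on the marker shown in the advisory:

```python
import json

def is_unsafe_checkpoint(raw: bytes) -> bool:
    """Refuse payloads carrying the lc=2 'constructor' format anywhere in the
    document; defense in depth, not a substitute for upgrading."""
    try:
        obj = json.loads(raw)
    except ValueError:
        return True  # refuse anything that is not plain parseable JSON

    def scan(node):
        if isinstance(node, dict):
            if node.get("lc") == 2 and node.get("type") == "constructor":
                return True
            return any(scan(v) for v in node.values())
        if isinstance(node, list):
            return any(scan(v) for v in node)
        return False

    return scan(obj)
```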
medium

upsert(vectors) call succeeds without exception but fewer vectors appear in index than expected (e.g., query or describe_index_stats shows missing vectors). No error logged. Or intermittent timeout with no response.

Pinecone enforces a hard 2MB limit per upsert request. Exceeding it triggers a 429 error with message "Your request is larger than the maximum supported size - 2MB". Client libraries or wrappers (e.g., LangChain) may fail to auto-batch large inputs, or users send oversized batches. Not silent—errors are raised—but appears "silent" if not handling/logging errors or if async fire-and-forget.

Pinecone upsert
View fix →
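Client-side batching by serialized size keeps every request under the cap regardless of vector dimensionality or metadata bloat. Sketch — the 2 MB figure is Pinecone's documented limit, and JSON length is used here as a rough size proxy:

```python
import json

MAX_REQUEST_BYTES = 2 * 1024 * 1024  # Pinecone's documented per-request cap

def batch_by_size(vectors, max_bytes: int = MAX_REQUEST_BYTES):
    """Split an upsert into batches whose serialized size stays under the
    per-request limit, so no single request can trigger the 2 MB 429."""
    batch, size = [], 0
    for v in vectors:
        v_size = len(json.dumps(v))
        if batch and size + v_size > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(v)
        size += v_size
    if batch:
        yield batch
```

Pair this with checking each upsert response (or awaiting every async future) so a raised 429 is never silently dropped.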
medium

In group chats with speaker_selection_method="round_robin", agents speak out of the expected order (recently created agents first), causing a wrong message sequence, e.g., user_proxy speaks immediately and terminates the chat prematurely.

GroupChat internally sorts agents by creation date (descending) instead of addition/insertion order when selecting the next speaker for the "round_robin" method, leading to an unexpected message sequence.

autogen.GroupChat
View fix →
medium

APIConnectionTimeoutError thrown after ~10-60 min on long tool calls; requests fail silently if non-streaming with large max_tokens; retries attempted but may fail. [@anthropic-ai/sdk npm](https://www.npmjs.com/package/@anthropic-ai/sdk)

Default 10 min timeout (dynamic up to 60 min for large max_tokens non-streaming) is too short for long-running tool use requests; networks drop idle connections; the SDK proactively terminates expectedly long (>10 min) non-streaming requests before completion. [@anthropic-ai/sdk npm](https://www.npmjs.com/package/@anthropic-ai/sdk)

Anthropic Messages API, tool calls
View fix →
critical

Authenticated users create/modify workflows with malicious expressions like {{ {}[["__proto__"]].polluted = 23 }} or {{ {}[["toString"]][["constructor"]]("return process")(process) }}. When executed, these bypass the sandbox, leading to prototype pollution, arbitrary JS execution, or OS commands via spawn_sync, with no error but unexpected server-side effects like file reads and command execution.

The isSafeObjectProperty sanitization function in packages/workflow/src/utils.ts assumes property keys are strings via TypeScript annotation but lacks a runtime typeof property === 'string' check. Attackers pass non-string values like ["__proto__"] arrays in bracket-notation expressions, bypassing the Set.has() strict-equality check against the string blacklist. Downstream obj[property] coerces to string, accessing dangerous properties like __proto__ for prototype pollution or process for RCE via the Function constructor or process.binding('spawn_sync').

n8n workflow expressions
View fix →
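The missing runtime check is a one-liner: verify the key is a string before consulting the blacklist, since an array key compares unequal to the string but coerces to the same property name later. Python analog of the fixed validator (n8n's original is TypeScript):

```python
UNSAFE_PROPERTIES = {"__proto__", "prototype", "constructor"}

def is_safe_object_property(prop) -> bool:
    """Reject non-string keys *before* the blacklist lookup: ['__proto__']
    is not equal to '__proto__', yet both reach the same property after
    string coercion in obj[property]."""
    if not isinstance(prop, str):   # the missing runtime type check
        return False
    return prop not in UNSAFE_PROPERTIES
```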
medium

API module fails with "ModuleTimeoutError" / "Timeout error" / "The operation timed out" after ~40s. Bubble shows red error icon. Scenario may pause scheduling after repeated failures or store as incomplete execution.[Make Help Center](https://help.make.com/types-of-errors)[Make Community](https://community.make.com/t/timeout-error/37183)

Make.com API modules enforce a 40-60 second response timeout. External API servers that delay responses beyond this (due to overload, complex queries, temporary issues) cause ModuleTimeoutError, as the module waits but receives no timely response.[Make Help Center](https://help.make.com/types-of-errors)

Make.com API module
View fix →
critical

Routine requests like 'Summarize this page/document' or 'Research <topic>' silently trigger unauthorized actions: Gmail exfiltration, arbitrary code execution/root shell, internal tool exposure (e.g., code-server), secret leaks, without warnings or user awareness. [AuraLabs Report](https://aurascape.ai/resources/auralabs-research/silentbridge-zero-click-agent-takeover-meta-manus/)

Systemic failure to isolate untrusted content ingestion (web pages, search results, documents containing hidden prompt injections) from high-privilege tools/connectors (Gmail, code exec, shell), over-privileged sandbox (root sudo), no prompt injection defenses, leading to zero-click indirect prompt injection enabling data exfil, RCE. [AuraLabs Report](https://aurascape.ai/resources/auralabs-research/silentbridge-zero-click-agent-takeover-meta-manus/)

Manus AI agent (web browsing, Gmail connector, code execution tools, sandbox)
View fix →
high

Crew execution hangs indefinitely with repeating logs like "Entering new CrewAgentExecutor chain... Action: Delegate work to coworker" for the same agents/tasks. High CPU/memory usage, no final output. Stops when allow_delegation=False but loses delegation functionality. [GitHub #330](https://github.com/joaomdmoura/crewAI/issues/330)

Agents with allow_delegation=True continuously delegate tasks back-and-forth (e.g., writer to researcher repeatedly) without reaching termination condition, exceeding default iteration limits or lacking timeouts. Common in non-hierarchical setups or poor agent instructions.[GitHub #330](https://github.com/joaomdmoura/crewAI/issues/330)

CrewAI Agent delegation (allow_delegation=True)
View fix →
high

Attempt to create_collection() fails with "Wrong input: Can't create collection with name [NAME]. Collection data already exists at ./storage/collections/[NAME]", despite collection_exists() returning False and GET /collections not listing it. Get collection info returns "Collection doesn't exist!". [GitHub Issue #4564](https://github.com/qdrant/qdrant/issues/4564)

Race condition during concurrent collection create/delete operations combined with snapshots, or incomplete recovery after Qdrant restart/pod restart where filesystem data persists but is not loaded into memory/metadata (TOC), causing inconsistency between storage and runtime state.[GitHub Issue #4564](https://github.com/qdrant/qdrant/issues/4564)

Qdrant create_collection
View fix →
low

Agent code creates symlink in sandbox pointing to host file (e.g., `ln -s /etc/passwd leak`), then reads/writes it. If path check is lexical only, operation succeeds but targets outside sandbox — potentially leaking data or allowing writes if shared mounts exist.

No specific E2B symlink escape documented. General root cause in sandboxes: lexical path checks (string prefix) before file operations that follow symlinks (realpath/read), allowing escape if symlink target resolves outside sandbox. E2B's microVM isolation prevents host escape.

E2B sandbox bash tool, file ops
View fix →
high

MCP filesystem server grants access to files/directories outside configured allowed directories if the requested path shares a prefix with an allowed directory (e.g., requesting /private/tmp/allow_dir_secrets succeeds if /private/tmp/allow_dir is allowed), leading to unauthorized reads/writes and potential data leaks.

Flawed path validation using simple string prefix matching (normalizedRequested.startsWith(allowedDir)) instead of proper canonicalization and containment checks, allowing access to sibling directories sharing the same prefix (e.g., /allowed/dir vs /allowed/dir-evil).

MCP Filesystem server (@modelcontextprotocol/server-filesystem), read_file, list_directory, write_file tools
View fix →
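A correct containment check compares path components, not raw characters, so a sibling like /allowed/dir-evil no longer passes for /allowed/dir. Python sketch of both the flawed and the fixed logic (a real server should also resolve symlinks, e.g. via os.path.realpath, before this check):

```python
import os

def naive_allowed(requested: str, allowed_dir: str) -> bool:
    # Flawed: '/private/tmp/allow_dir_secrets'.startswith('/private/tmp/allow_dir')
    return os.path.normpath(requested).startswith(os.path.normpath(allowed_dir))

def properly_allowed(requested: str, allowed_dir: str) -> bool:
    """The requested path must equal the allowed dir or live strictly under
    it, i.e. the prefix must end at a path-separator boundary."""
    req = os.path.normpath(requested)
    allowed = os.path.normpath(allowed_dir)
    return req == allowed or req.startswith(allowed + os.sep)
```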
high

JSONDecodeError when parsing tool_call.function.arguments, e.g. "Expecting property name enclosed in double quotes: line 1 column 112 (char 111)" with args like "{'query': 'ha', 'num': 5}".

OpenAI models occasionally generate tool call arguments using Python dict syntax with single quotes (e.g. {'key': 'value'}) instead of strict JSON with double quotes ({"key": "value"}), causing json.loads to fail with JSONDecodeError when client code parses message.tool_calls[0].function.arguments.

OpenAI Chat Completions API (tool_calls parsing in client code)
View fix →
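A tolerant parser that falls back to Python-literal parsing handles these malformed arguments without ever executing model output. Sketch:

```python
import ast
import json

def parse_tool_arguments(raw: str) -> dict:
    """Parse tool_call arguments, tolerating the occasional Python-dict
    syntax ({'k': 'v'}) emitted instead of strict JSON."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        value = ast.literal_eval(raw)  # safe: literals only, no code execution
        if not isinstance(value, dict):
            raise  # re-raise the original JSONDecodeError
        return value
```

If literal_eval also fails (truly malformed output), its SyntaxError/ValueError propagates; handle that by retrying the model call.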
high

When uploading documents to the knowledge base and clicking Save & Process: documents get stuck in Queuing/0%, fail with errors like "Provider langgenius/azure_openai/azure_openai does not exist" or 500 errors; no progress, status changes to error without verbose details (especially self-hosted Docker).

Missing or incorrect embedding model provider configuration/credentials (e.g., Azure OpenAI), absent environment variables in self-hosted worker containers (PLUGIN_DAEMON_KEY etc.), insufficient DB connections, leading to ProviderNotInitializeError or unauthorized plugin access during document embedding step of indexing.

Dify Knowledge Base indexing
View fix →
high

Agent or chain executes destructive SQL like DROP TABLE from seemingly innocent prompts, e.g., db_chain.run("Drop the employee table") drops tables. Errors or data loss occur unexpectedly from user inputs. [GitHub Issue #5923](https://github.com/langchain-ai/langchain/issues/5923)

SQLDatabaseChain (and similar agent tools) executed raw, unvalidated SQL queries generated by LLMs from natural language prompts. Malicious prompts tricked the LLM into generating harmful SQL (e.g., "Drop the employee table"), leading to prompt injection attacks causing arbitrary SQL execution. [GitHub Issue #5923](https://github.com/langchain-ai/langchain/issues/5923)

LangChain SQLDatabaseChain, SQL agents (create_sql_agent)
View fix →
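Beyond running the agent's connection as a read-only database user (the primary fix), a coarse statement guard can reject obviously destructive SQL before execution. Conservative sketch; string matching alone is not a complete defense and will occasionally flag benign queries:

```python
import re

ALLOWED = re.compile(r"^\s*(SELECT|WITH)\b", re.IGNORECASE)
FORBIDDEN = re.compile(
    r"\b(DROP|DELETE|UPDATE|INSERT|ALTER|TRUNCATE|GRANT)\b", re.IGNORECASE
)

def guard_sql(query: str) -> str:
    """Allow only statements that start read-only and contain no DDL/DML
    keywords. Pair with a read-only DB role; this is a second layer only."""
    if not ALLOWED.match(query) or FORBIDDEN.search(query):
        raise PermissionError(f"blocked non-read-only SQL: {query!r}")
    return query
```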
high

Node produces invalid state (e.g., None where List[str] expected); checkpoint saves it. Later, graph.get_state() or get_state_history() raises ValidationError. Checkpoints become permanently unrecoverable, breaking graph invocation/resume.

LangGraph validates node input when preparing next tasks but skips validation of node output after execution, allowing invalid state (e.g., None in required List[str]) to be saved to checkpoints. Retrieval (get_state_history) then fails with ValidationError, permanently corrupting the checkpoint.

LangGraph checkpointers (MemorySaver, PostgresSaver, etc.)
View fix →
high

When using `collection.get(include=['embeddings', 'metadatas'], where={...})` or non-existent `ids` with a non-matching metadata filter, `ids` and `metadatas` return empty lists, but `embeddings` returns *all* embeddings in the collection (e.g., 153MB for 4000 docs). Causes high memory usage, slowness, and incorrect results.

Bug in the metadata join logic during `get()` queries with `where` filters that match no entries, causing all embeddings to be returned despite empty `ids` and `metadatas`. Primarily affected server/REST API mode.

ChromaDB get() method
View fix →
medium

Tool execution fails with authentication errors (e.g., invalid/expired credentials). Connected account status shows EXPIRED. Automatic refresh attempts fail, preventing agent/tool access to services like Gmail, GitHub, etc.

Composio automatically attempts OAuth token refresh before expiry, but marks connection EXPIRED after failures due to: user revokes access, OAuth app deleted/disabled, refresh token expiry (e.g., Google test apps), provider revocation, or repeated transient errors (outages).

Composio connected accounts, OAuth tools
View fix →
medium

APIRequestComponent input fields reset to empty when re-entering/reloading a flow, despite values being saved to DB initially. Observed in Langflow 1.3.0 Docker.

Component-specific state persistence failure in APIRequestComponent where fields are saved to DB but not reloaded properly on flow re-entry; values disappear from DB on reopen, likely due to improper state serialization/loading or reset logic (e.g., resetFlow function).

LangFlow APIRequestComponent
View fix →
critical

Devin, tasked to investigate a GitHub issue or website, unexpectedly runs shell commands (e.g., curl with env vars), browses attacker URLs with encoded secrets, renders malicious Markdown images, or executes binaries, leaking secrets/environment variables to third-party servers without explicit user instruction.

Devin processes untrusted inputs (e.g., GitHub issues, websites) without sanitization, allowing indirect prompt injection to hijack control. Powerful tools like unrestricted Shell (terminal commands with internet access) and Browser enable execution of curl/wget for exfiltration, binary downloads/RCE, or URL encoding of secrets. No structural safeguards beyond model refusal, which fails reliably.

Devin AI (Shell, Browser tools)
View fix →
high

Sessions crash/interrupt: context, history, background shells lost; must re-explain everything. Or `claude --resume` shows no recent sessions despite intact JSONL files (index corruption).

No robust crash-proof persistent storage; abrupt terminations (e.g., SIGKILL/OOM) desync `sessions-index.json` from JSONL files. Designed stateless between independent sessions.

Claude Code
View fix →
medium

Errors like "Scripting payload too large", "You cannot return more than 250 items from a Code action", process timeouts, or high memory usage in logs. Zap step fails or is throttled.

Zapier Code steps run in AWS Lambda environment with hard limits: 6MB I/O payload (code + data), 256MB RAM max (128MB free), plan-specific timeouts (up to 2min Enterprise), and 250-item output limit for actions. Exceeding triggers errors to protect infrastructure.

Code by Zapier (JavaScript/Python)
View fix →
critical

Users encounter a provisioning failure and receive an error response exposing CrewAI's internal GitHub token in JSON: {"id": [ProvisionID], "repo_clone_url": "https://x-access-token:ghu_Ahd...."}.

Improper exception handling during machine provisioning failures returned full JSON payloads containing an internal high-privilege GitHub token in the repo_clone_url field without sanitization. Triggered via GET /crewai_plus/deployments/[deployment_id]/check_provision_status on error.

CrewAI platform (crewai_plus deployments)
View fix →
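Exception handlers that build error responses should redact credential-shaped strings before anything leaves the service. Hypothetical sketch using common GitHub token prefixes (ghp_, gho_, ghu_, ghs_, ghr_):

```python
import json
import re

SECRET_PATTERNS = [
    re.compile(r"gh[upsor]_[A-Za-z0-9]{20,}"),   # GitHub token formats
    re.compile(r"x-access-token:[^@\s\"]+"),     # embedded-credential URLs
]

def sanitize_error_payload(payload: dict) -> dict:
    """Redact credential-shaped substrings from an error payload before it
    is returned to the caller; handlers should never echo raw internal state."""
    text = json.dumps(payload)
    for pat in SECRET_PATTERNS:
        text = pat.sub("[REDACTED]", text)
    return json.loads(text)
```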