Severity: medium

ragas.evaluate() returns NaN metric scores

Metrics such as context_precision and faithfulness come back as NaN/null in the evaluate() result, accompanied by the warning RuntimeWarning: Mean of empty slice (evaluation.py:130). The failure is inconsistent across runs and LLM backends (e.g., Bedrock fails while Azure OpenAI succeeds).
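The warning itself is easy to reproduce outside Ragas: np.nanmean over a slice containing no finite values emits exactly this RuntimeWarning and returns nan. A minimal sketch (pure NumPy; the all-NaN array stands in for a metric where every sample failed):

```python
import warnings
import numpy as np

# When every per-sample score for a metric is NaN, np.nanmean has nothing
# left to average: it emits "Mean of empty slice" and returns nan, which
# is what surfaces in the evaluate() result.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    score = np.nanmean(np.array([np.nan, np.nan]))

print(score)                  # nan
print(caught[0].message)      # Mean of empty slice
```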

Root cause

The Ragas Executor catches per-sample exceptions (e.g., an LLM JSON parse failure where the expected 'verdict' key is missing, or a timeout) and substitutes np.nan by default (raise_exceptions=False). When every sample for a metric fails this way, np.nanmean over the resulting all-NaN slice propagates NaN into the final score. Docs: [Ragas v0.1.21](https://docs.ragas.io/en/v0.1.21/references/evaluation.html). GitHub: [Issue #528](https://github.com/explodinggradients/ragas/issues/528).
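To tell "every sample failed" apart from "some samples failed", it helps to inspect per-sample scores rather than the aggregate. A hedged sketch: the DataFrame below is hypothetical stand-in data shaped like the per-sample table Ragas can export (column names mirror the metrics above; the values are made up), and the check is plain pandas, not a Ragas API:

```python
import numpy as np
import pandas as pd

# Hypothetical per-sample metric scores; NaN marks a sample whose
# LLM call failed and was swallowed by the executor.
df = pd.DataFrame({
    "faithfulness": [0.9, np.nan, 0.7],
    "context_precision": [np.nan, np.nan, np.nan],
})

# Metrics where every sample failed: these are the ones whose aggregate
# becomes NaN. Metrics with partial failures still yield a (biased) mean,
# since pandas' mean() skips NaN values.
all_failed = [c for c in df.columns if df[c].isna().all()]
partial_means = {c: float(df[c].mean())
                 for c in df.columns if not df[c].isna().all()}

print(all_failed)     # ['context_precision']
print(partial_means)  # {'faithfulness': 0.8}
```

Re-running with raise_exceptions=True (the inverse of the default noted above) surfaces the underlying parse error or timeout instead of silently recording NaN.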

Tags: ragas, NaN, np.nan, JSON parsing, LLM output, raise_exceptions, evaluation.py
