Medium severity: Patronus AI Lynx hallucination detection model

Patronus AI's Lynx model flags correct, faithful responses as hallucinations, producing a high false positive rate: at 52% precision, roughly 48% of flagged responses are not hallucinations. Overall accuracy is low (53%) despite excellent recall on the biographical hallucination detection task.

Root cause

Task-specific fine-tuning on hallucination datasets prioritizes high recall (95%) over precision (52%), making the model "trigger-happy": it flags many correct statements as hallucinations in order to minimize misses. This is a deliberate trade-off in specialized training, not a bug.
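The arithmetic behind this trade-off can be sketched from a confusion matrix. The numbers below are a hypothetical illustration (assuming a balanced test set of 1,000 hallucinated and 1,000 faithful responses, not Patronus AI's actual evaluation data) showing how 95% recall combined with 52% precision implies that roughly 48% of flags are false positives and overall accuracy lands near the reported low-50s:

```python
# Illustrative only: derive judge metrics from a confusion matrix,
# assuming a hypothetical balanced test set (1000 hallucinated,
# 1000 faithful responses).

def judge_metrics(tp, fp, fn, tn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    fdr = fp / (tp + fp)  # share of flags that are false positives
    return precision, recall, accuracy, fdr

# 95% recall on 1000 hallucinations -> 950 caught, 50 missed.
tp, fn = 950, 50
# 52% precision means total flags ~= 950 / 0.52 ~= 1827,
# so ~877 faithful responses are wrongly flagged.
fp = round(tp / 0.52 - tp)
tn = 1000 - fp  # faithful responses correctly passed

precision, recall, accuracy, fdr = judge_metrics(tp, fp, fn, tn)
print(f"precision={precision:.2f} recall={recall:.2f} "
      f"accuracy={accuracy:.2f} false-flag share={fdr:.2f}")
```

Under these assumptions the false-flag share works out to about 0.48 and accuracy to the mid-50s, consistent with the reported figures within rounding.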

Patronus AI, Lynx, hallucination detection, false positive, precision-recall tradeoff, LLM-as-judge