Medium severity: Patronus AI Lynx hallucination detection model
Patronus AI's Lynx model flags correct, faithful responses as hallucinations, producing a high false positive rate (48% of flagged responses, implied by 52% precision). Overall accuracy is low (53%) despite excellent recall on the biographical hallucination detection task.
Root cause
Task-specific fine-tuning on hallucination datasets prioritizes high recall (95%) over precision (52%), making the model "trigger-happy": it flags many correct statements as hallucinations to minimize misses. This is a deliberate trade-off in specialized training, not a bug.
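A minimal sketch of how the reported metrics fit together, using hypothetical confusion-matrix counts chosen to reproduce the stated 95% recall, ~52% precision, and ~53% accuracy (assuming a balanced 100/100 split of hallucinated vs. faithful responses; these counts are illustrative, not actual evaluation data):

```python
# Hypothetical counts on a balanced 200-example evaluation set:
tp = 95   # hallucinations correctly flagged
fn = 5    # hallucinations missed
fp = 88   # faithful responses wrongly flagged as hallucinations
tn = 12   # faithful responses correctly passed

recall = tp / (tp + fn)                      # 0.95  -> almost no misses
precision = tp / (tp + fp)                   # ~0.52 -> nearly half of flags are wrong
accuracy = (tp + tn) / (tp + fn + fp + tn)   # ~0.535 -> barely above chance
false_positive_share = fp / (tp + fp)        # ~0.48 -> the "48% implied" figure

print(recall, precision, accuracy, false_positive_share)
```

The sketch shows why high recall alone is misleading as a headline metric: pushing the decision threshold toward "flag it" drives `fn` toward zero while `fp` balloons, so precision and accuracy collapse even as recall looks excellent.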
Patronus AI, Lynx, hallucination detection, false positive, precision-recall tradeoff, LLM-as-judge