team

Team using agents for clinical trial matching

GPT-4o, Qwen 2.5, LLaMA 3.1, Gemma 3, Mistral 7B, all-MiniLM-L12-v2 Sentence-Transformer, Perplexity AI, A10G GPUs

Stack tools: 8

Added: Mar 2026

Status: Published


Why they built it

LLMs struggle with clinical trial eligibility assessment because of knowledge gaps, hallucinations, and sensitivity to variations in how criteria are phrased, all of which lead to inaccurate patient matching.

What worked

The modules were complementary (reasoning: +6% F1; augmentation: +5% F1). Improvement was consistent, averaging 7%. Gains were strongest on complex criteria (e.g. +12% accuracy on CREATININE). Local deployment with open-source models proved viable (Qwen 2.5 14B: 91.5% accuracy).

What broke or was painful

The Router Agent never used external retrieval or online search because of dataset limitations. Minor reasoning errors appeared in conservative assessments and unit miscalculations, though the Matching Agent mitigated them. Smaller models (<7B) were insensitive to augmentation.
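Unit miscalculations are one place where a deterministic check can back up LLM reasoning. The sketch below is an illustration only, not the paper's method: it normalizes a serum creatinine value to mg/dL before comparing it with a threshold. The function names and the 1.5 mg/dL default threshold are hypothetical; the conversion factor (1 mg/dL ≈ 88.4 µmol/L) is the standard one for creatinine.

```python
# Standard conversion factor for creatinine: 1 mg/dL is about 88.4 µmol/L.
UMOL_PER_L_PER_MG_DL = 88.4


def creatinine_to_mg_dl(value, unit):
    """Convert a creatinine value to mg/dL (hypothetical helper)."""
    normalized = unit.strip().lower()
    if normalized == "mg/dl":
        return value
    # Accept ASCII "u", the micro sign, and the Greek letter mu.
    if normalized in ("umol/l", "\u00b5mol/l", "\u03bcmol/l"):
        return value / UMOL_PER_L_PER_MG_DL
    raise ValueError(f"unsupported creatinine unit: {unit!r}")


def exceeds_threshold(value, unit, threshold_mg_dl=1.5):
    """Return True if the value is above the threshold (assumed, in mg/dL)."""
    return creatinine_to_mg_dl(value, unit) > threshold_mg_dl
```

With this helper, a reading of 150 µmol/L (about 1.70 mg/dL) is flagged as above the assumed threshold, while 1.2 mg/dL is not.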

The result

92.2% accuracy and 85.4% F1 on the N2C2 dataset (3,744 pairs); 98.7% accuracy and 98.2% F1 on the ClinicalTrials dataset (30 criteria, 10 patients); an average +7% F1 improvement over baselines.
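For readers less familiar with the two reported metrics, the minimal sketch below shows how accuracy and F1 can be computed from binary eligibility labels. The labels are made-up toy values, not data from the paper, and the function name is hypothetical. The paper may aggregate F1 differently (for example, per criterion), so this shows plain binary F1 only.

```python
def accuracy_and_f1(y_true, y_pred):
    """Return (accuracy, F1) for binary labels, where 1 means 'criterion met'."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there are no positives.
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return accuracy, f1


# Toy patient-criterion pairs (illustrative only).
toy_true = [1, 0, 1, 1, 0, 0, 1, 0]
toy_pred = [1, 0, 0, 1, 0, 1, 1, 0]
acc, f1 = accuracy_and_f1(toy_true, toy_pred)
print(f"accuracy={acc:.3f}, F1={f1:.3f}")
```

Accuracy counts every correct decision equally, whereas F1 ignores true negatives, which is why the two numbers can diverge on datasets where most criteria are not met.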

References

https://arxiv.org/abs/2411.14637