CLEAR: Revealing How Noise and Ambiguity Degrade Reliability in LLMs for Medicine

Kevin H. Guo, Chao Yan, Avinash Baidya, Katherine Brown, Xiang Goa, Juming Xiong, Zhijun Yin, Bradley A. Malin · arXiv cs.CL · AI · May 5, 2026

arXiv:2605.01011v1. Abstract: Medical large language model (LLM) evaluations rely on simplified, exam-style benchmarks that rarely reflect the ambiguity of real-world medical inquiries. We introduce the CLinical Evaluation of Ambiguity and Reliability (CLEAR) framework, which assesses how decision-space presentation, ambiguity, and uncertainty affect LLMs' reasoning on medical benchmarks. CLEAR systematically perturbs (1) the number of plausible answer options, (2) the presence...


