MedClarify: An information-seeking AI agent for medical diagnosis with case-specific follow-up questions

Hui Min Wong, Philip Heesen, Pascal Janetzky, Martin Bendszus, Stefan Feuerriegel

TL;DR

This work introduces MedClarify, an information-seeking AI agent that generates follow-up questions for iterative reasoning to support diagnostic decision-making, reducing diagnostic errors by ~27 percentage points (p.p.) compared to a standard single-shot LLM baseline.

Abstract

Large language models (LLMs) are increasingly used for diagnostic tasks in medicine. In clinical practice, the correct diagnosis can rarely be inferred immediately from the initial patient presentation alone. Rather, reaching a diagnosis often involves systematic history taking, during which clinicians reason over multiple potential conditions through iterative questioning to resolve uncertainty. This process requires considering differential diagnoses and actively excluding emergencies that demand immediate intervention. Yet, the ability of medical LLMs to generate informative follow-up questions and thus reason over differential diagnoses remains underexplored. Here, we introduce MedClarify, an information-seeking AI agent that generates follow-up questions for iterative reasoning to support diagnostic decision-making. Specifically, MedClarify computes a list of candidate diagnoses, analogous to a differential diagnosis, and then proactively generates follow-up questions aimed at reducing diagnostic uncertainty. By selecting the question with the highest expected information gain, MedClarify enables targeted, uncertainty-aware reasoning to improve diagnostic performance. In our experiments, we first demonstrate the limitations of current LLMs in medical reasoning, which often yield multiple, similarly likely diagnoses, especially when patient cases are incomplete or information relevant to the diagnosis is missing. We then show that our information-theoretic reasoning approach generates effective follow-up questions and thereby reduces diagnostic errors by ~27 percentage points (p.p.) compared to a standard single-shot LLM baseline. Altogether, MedClarify offers a path to improve medical LLMs through agentic information-seeking and thus to promote effective dialogues with medical LLMs that reflect the iterative and uncertain nature of real-world clinical reasoning.
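The selection rule named in the abstract, picking the question with the highest expected information gain, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the example questions, the answer likelihoods, and the assumption that each question has a small set of discrete answers with known probabilities P(answer | diagnosis) are all hypothetical and chosen only to make the idea concrete.

```python
import math

def entropy(p):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def posterior(prior, likelihood):
    """Bayes' rule: multiply prior by likelihood, then renormalize."""
    joint = [p * l for p, l in zip(prior, likelihood)]
    z = sum(joint)
    return [j / z for j in joint]

def expected_information_gain(prior, answer_likelihoods):
    """Expected entropy reduction from asking one question.

    answer_likelihoods[a][d] is the assumed P(answer a | diagnosis d).
    """
    gain = entropy(prior)
    for lik in answer_likelihoods:
        p_answer = sum(p * l for p, l in zip(prior, lik))
        if p_answer > 0:
            gain -= p_answer * entropy(posterior(prior, lik))
    return gain

def select_question(prior, questions):
    """Return the question whose answer is expected to reduce entropy most."""
    return max(questions, key=lambda q: expected_information_gain(prior, questions[q]))

# Hypothetical belief over three candidate diagnoses.
prior = [0.5, 0.3, 0.2]

# Hypothetical yes/no questions: one list per answer, one entry per diagnosis.
questions = {
    "q_split": [[0.9, 0.1, 0.1], [0.1, 0.9, 0.9]],  # separates diagnosis 0 from the rest
    "q_flat": [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]],   # answer is independent of the diagnosis
}

best = select_question(prior, questions)
print(best)
```

In this toy setup, a question whose answer does not depend on the diagnosis has zero expected gain, so the selector prefers the question that separates the leading hypothesis from its alternatives.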

Paper Structure (23 sections, 4 equations, 17 figures, 3 tables)

Figures (17)

  • Figure 1: Overview of MedClarify, an AI agent for medical diagnosis that optimizes the question selection strategy. a, The system generates a set of differential diagnoses with corresponding confidence scores, which are then adjusted using a Bayesian update that combines knowledge from old and new evidence. Based on this improved differential diagnosis, MedClarify selects the optimal follow-up question by optimizing information gain, thus reducing uncertainty. This question is then used to interview the patient, and the patient's response is integrated into the patient's case. This iterative diagnostic refinement process is repeated until specific thresholds are met, and eventually outputs a final diagnosis. b, The question selection strategy and technical overview of MedClarify.
  • Figure 2: Example with proactive, case-specific follow-up questions for medical diagnosis. The figure presents an example patient case with clinical information (top), as well as the diagnostic dialogue generated by the naïve multi-turn baseline system (middle) and by MedClarify (bottom) for the same patient case. Additional dialogues are in Extended Figures exfig:ex-effective-ex1 and exfig:ex-effective-ex2.
  • Figure 3: Overview of Bayesian update and question selection in MedClarify. a, The question selection strategy in MedClarify can shift the LLM's focus to alternative condition groups, encouraging it to rule out a wider range of diseases for a more reliable final diagnosis. The process begins by generating targeted questions for each predicted disease to refute the highest-probability condition and confirm low-ranked alternatives, plus an additional explorative question. Afterwards, the system selects the question with the highest information gain. In this example, although the initial hypothesis was related to ICD-11 Chapter 13 (diseases of the digestive system), MedClarify selects a question that effectively confirms Chapter 18 (pregnancy, childbirth, or the puerperium), leading to the correct diagnostic pathway, which the baseline system would have missed. b, The Bayesian update prevents static entropy by incorporating historical diagnoses and temperature scaling.
  • ...and 12 more figures
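
The captions for Figures 1 and 3 describe a Bayesian update that combines old and new evidence with temperature scaling, repeated until specific thresholds are met. The sketch below shows one way such a step could look. It is an assumption for illustration, not the paper's formula: the tempered product rule, the names `tempered_update` and `should_stop`, and the default values for temperature, confidence threshold, and turn limit are all hypothetical.

```python
def tempered_update(prior, new_scores, temperature=2.0):
    """Combine the previous belief with new confidence scores.

    Assumed rule: posterior[d] is proportional to
    prior[d] * new_scores[d] ** (1 / temperature).
    A temperature above 1 softens the new evidence so that a single
    turn cannot fully override the diagnostic history.
    """
    weights = [p * (s ** (1.0 / temperature)) for p, s in zip(prior, new_scores)]
    z = sum(weights)
    return [w / z for w in weights]

def should_stop(belief, turn, confidence_threshold=0.8, max_turns=5):
    """Hypothetical stopping rule: confident enough, or out of turns."""
    return max(belief) >= confidence_threshold or turn >= max_turns

# Hypothetical two-diagnosis example with a uniform starting belief.
belief = [0.5, 0.5]
new_scores = [0.9, 0.1]

plain = tempered_update(belief, new_scores, temperature=1.0)
softened = tempered_update(belief, new_scores, temperature=2.0)
print(plain, softened, should_stop(softened, turn=1))
```

With a uniform prior, a temperature of 1 reproduces the new scores, while a temperature of 2 yields a less extreme belief, which here keeps the loop asking questions instead of stopping after one turn.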