Language Agents for Hypothesis-driven Clinical Decision Making with Reinforcement Learning

David Bani-Harouni, Chantal Pellegrini, Ege Özsoy, Matthias Keicher, Nassir Navab

TL;DR

This work tackles the challenge of differential diagnosis under uncertainty by introducing LA-CDM, a two-agent language-model framework that actively and iteratively gathers diagnostic information. It combines supervised learning for accurate hypothesis generation with reinforcement learning for calibrated uncertainty and cost-aware decision-making, enabling tests to be selected by information value. Evaluated on the MIMIC-CDM dataset across four abdominal diseases, LA-CDM achieves higher diagnostic accuracy with fewer tests than strong baselines, and demonstrates calibrated confidence and patient-adaptive testing strategies. The approach offers a pragmatic path to more efficient, personalized AI-assisted clinical decision-making with potential to reduce costs and patient burden while maintaining diagnostic performance.

Abstract

Clinical decision-making is a dynamic, interactive, and cyclic process in which doctors must repeatedly decide which clinical action to perform and consider newly uncovered information for diagnosis and treatment. Large Language Models (LLMs) have the potential to support clinicians in this process; however, most applications of LLMs in clinical decision support suffer from one of two limitations: either they assume the unrealistic scenario of immediate availability of all patient information and do not model the interactive and iterative investigation process, or they restrict themselves to the limited "out-of-the-box" capabilities of large pre-trained models without performing task-specific training. In contrast, we propose to model clinical decision-making for diagnosis with a hypothesis-driven, uncertainty-aware language agent, LA-CDM, that converges towards a diagnosis by repeatedly requesting and interpreting relevant tests. Using a hybrid training paradigm combining supervised and reinforcement learning, we train LA-CDM with three objectives targeting critical aspects of clinical decision-making: accurate hypothesis generation, hypothesis uncertainty estimation, and efficient decision-making. We evaluate our methodology on MIMIC-CDM, a real-world dataset that covers four abdominal diseases and contains various clinical tests, and show the benefit of explicitly training clinical decision-making for increasing diagnostic performance and efficiency.

Paper Structure

This paper contains 29 sections, 5 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: An illustrative process of clinical decision-making performed by LA-CDM. At the beginning, only the patient history including symptoms and family history is known. In a cyclic process, the hypothesis agent forms an uncertainty-aware hypothesis and the decision agent decides on a clinical action (request a test or diagnose). If a test is requested the results are added to the known patient information. The cycle repeats until a final diagnosis is given.
  • Figure 2: Overview of our method LA-CDM and its three training objectives. The hypothesis agent receives the current patient state and predicts a hypothesis and a confidence. Hypothesis generation is trained with supervised learning, and confidence calibration with reinforcement learning. The hypothesis agent's output and the current patient state are then provided to the decision agent, which is trained with reinforcement learning to decide on an optimal clinical action (test request or diagnosis).
  • Figure 3: Left: Calibration curves before and after training LA-CDM. Right: Distribution of confidence estimations before and after training LA-CDM.
  • Figure 4: A successful diagnosis performed with the evaluation of ultrasound and CT. The model decides to confirm an initial suspicion with an ultrasound; however, the results are not conclusive, prompting another imaging test.
  • Figure 5: A short example diagnosis. In many cases, the model learns to correctly predict the condition in very few steps, with just a CT.
  • ...and 1 more figure
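
The cyclic process described in the Figure 1 and Figure 2 captions can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `PatientState`, `run_episode`, the two toy agents, the `max_steps` cap, the 0.8 confidence threshold, and the example test results are all hypothetical. In LA-CDM both agents are trained language models and the decision policy is learned with reinforcement learning rather than hard-coded.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class PatientState:
    """Known patient information: initial history plus test results gathered so far."""
    history: str
    test_results: Dict[str, str] = field(default_factory=dict)


# Hypothetical agent interfaces (assumptions for this sketch).
HypothesisAgent = Callable[[PatientState], Tuple[str, float]]  # -> (hypothesis, confidence)
DecisionAgent = Callable[[PatientState, str, float], str]      # -> "diagnose" or a test name


def run_episode(
    state: PatientState,
    available_tests: Dict[str, str],
    hypothesis_agent: HypothesisAgent,
    decision_agent: DecisionAgent,
    max_steps: int = 10,
) -> Tuple[str, List[str]]:
    """Alternate hypothesis and decision steps until a diagnosis is given."""
    requested: List[str] = []
    for _ in range(max_steps):
        hypothesis, confidence = hypothesis_agent(state)
        action = decision_agent(state, hypothesis, confidence)
        # Assumption: unknown or repeated test requests end the episode with a diagnosis.
        if action == "diagnose" or action not in available_tests or action in requested:
            return hypothesis, requested
        # A requested test's result is added to the known patient information.
        state.test_results[action] = available_tests[action]
        requested.append(action)
    # Step budget exhausted: fall back to the current hypothesis.
    hypothesis, _ = hypothesis_agent(state)
    return hypothesis, requested


def toy_hypothesis_agent(state: PatientState) -> Tuple[str, float]:
    # Toy rule: confidence rises once a CT result is available.
    confidence = 0.9 if "CT" in state.test_results else 0.5
    return "appendicitis", confidence


def toy_decision_agent(state: PatientState, hypothesis: str, confidence: float) -> str:
    # Toy rule: diagnose when confident, otherwise request a CT.
    return "diagnose" if confidence >= 0.8 else "CT"


state = PatientState(history="right lower quadrant pain")
tests = {"CT": "placeholder CT report", "Ultrasound": "placeholder ultrasound report"}
final_diagnosis, requested_tests = run_episode(
    state, tests, toy_hypothesis_agent, toy_decision_agent
)
print(final_diagnosis, requested_tests)
```

With these toy rules the episode requests a single CT and then diagnoses, loosely mirroring the short case described for Figure 5. Passing the hypothesis and confidence to the decision agent keeps the two roles separate, as in the two-agent design the captions describe.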