CXReasonAgent: Evidence-Grounded Diagnostic Reasoning Agent for Chest X-rays

Hyungyung Lee, Hangyul Yoon, Edward Choi

TL;DR

CXReasonAgent is a diagnostic agent that integrates a large language model (LLM) with clinically grounded diagnostic tools to reason over diagnostic and visual evidence derived from the chest X-ray image. Its responses are faithfully grounded in that evidence, making its diagnostic reasoning more reliable and verifiable than that of large vision-language models (LVLMs).

Abstract

Chest X-rays play a central role in thoracic diagnosis, and their interpretation inherently requires multi-step, evidence-grounded reasoning. However, large vision-language models (LVLMs) often generate plausible responses that are not faithfully grounded in diagnostic evidence, and they provide limited visual evidence for verification. They also require costly retraining to support new diagnostic tasks. Together, these issues limit their reliability and adaptability in clinical settings. To address these limitations, we present CXReasonAgent, a diagnostic agent that integrates a large language model (LLM) with clinically grounded diagnostic tools to perform evidence-grounded diagnostic reasoning using image-derived diagnostic and visual evidence. To evaluate these capabilities, we introduce CXReasonDial, a multi-turn dialogue benchmark with 1,946 dialogues across 12 diagnostic tasks, and show that CXReasonAgent produces faithfully grounded responses, enabling more reliable and verifiable diagnostic reasoning than LVLMs. These findings highlight the importance of integrating clinically grounded diagnostic tools, particularly in safety-critical clinical settings.

Paper Structure

This paper contains 5 sections, 2 figures, 4 tables.

Figures (2)

  • Figure 1: Qualitative examples on CXReasonDial. CXReasonAgent produces responses faithfully grounded in image-derived diagnostic and visual evidence, whereas LVLMs often generate ungrounded responses and fail to provide visual evidence.
  • Figure 2: Overview of CXReasonAgent. The pipeline comprises three stages: (1) interpreting the user query and planning the appropriate diagnostic tool call, (2) constructing diagnostic and visual evidence from the chest X-ray using clinically grounded diagnostic tools, and (3) generating responses solely grounded in this evidence.
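
The three stages in the Figure 2 caption can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (`Evidence`, `TOOLS`, `cardiomegaly_tool`, `plan_tool_call`, `build_evidence`, `respond`) is hypothetical, the keyword match in stage 1 stands in for the LLM planner, the choice of cardiomegaly as the example task is an assumption, and the tool returns a dummy value rather than a real measurement. It only shows the control flow in which the response is built from tool output alone.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Evidence:
    """Hypothetical container for what a diagnostic tool returns."""
    task: str
    measurements: Dict[str, float]  # diagnostic evidence (dummy values here)
    overlay_path: str               # visual evidence, e.g. an annotated image


def cardiomegaly_tool(image_path: str) -> Evidence:
    # Placeholder tool: a real tool would compute this from the image.
    # The number below is a dummy value, not a result from the paper.
    return Evidence(
        task="cardiomegaly",
        measurements={"cardiothoracic_ratio": 0.56},
        overlay_path=image_path + ".overlay.png",
    )


# Hypothetical tool registry; new tasks are added by registering new tools.
TOOLS: Dict[str, Callable[[str], Evidence]] = {"cardiomegaly": cardiomegaly_tool}


def plan_tool_call(query: str) -> str:
    """Stage 1: interpret the query and pick a tool (keyword stand-in for an LLM)."""
    lowered = query.lower()
    for name in TOOLS:
        if name in lowered:
            return name
    raise ValueError("no registered tool matches the query")


def build_evidence(tool_name: str, image_path: str) -> Evidence:
    """Stage 2: construct diagnostic and visual evidence by running the tool."""
    return TOOLS[tool_name](image_path)


def respond(evidence: Evidence) -> str:
    """Stage 3: generate a response that uses only the tool-derived evidence."""
    parts = [f"{k}={v}" for k, v in sorted(evidence.measurements.items())]
    return f"[{evidence.task}] " + ", ".join(parts) + f" (see {evidence.overlay_path})"


def answer(query: str, image_path: str) -> str:
    return respond(build_evidence(plan_tool_call(query), image_path))
```

In this sketch, supporting a new diagnostic task means adding an entry to the registry rather than retraining a model, which mirrors the adaptability argument in the abstract; how the actual system registers and invokes its tools is not specified in this section.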