RadAgents: Multimodal Agentic Reasoning for Chest X-ray Interpretation with Radiologist-like Workflows

Kai Zhang, Corey D Barrett, Jangwon Kim, Lichao Sun, Tara Taghavi, Krishnaram Kenthapadi

TL;DR

RadAgents presents a radiologist-inspired, multi-agent framework for chest X-ray interpretation that couples specialized subagents with an orchestrator and synthesizer to achieve auditable, multimodal reasoning. It integrates a diverse toolset, grounding, and retrieval-augmented conflict resolution (V-RAG) to produce transparent, clinically aligned outputs. Evaluations across ChestAgentBench, CheXbench, and MIMIC-CXR show consistent performance gains over baselines, with ablations confirming the value of radiologist-like workflows and visual retrieval. The work suggests that on-premise, open-model architectures can reach competitive performance with proper workflow encoding and cross-tool verification, potentially improving trust and safety in clinical AI systems.
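To make the orchestrator/subagent/synthesizer split concrete, here is a minimal sketch under stated assumptions. It is not the authors' implementation. The names `AuditLog`, `orchestrate`, and `synthesize` are hypothetical, and each subagent is stubbed as a plain function from a study identifier to a list of findings. The sketch shows one way "auditable" can be realized: every dispatch and the final merge append a record to an append-only log, and every line of the report keeps a tag naming the subagent that produced it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical subagent type: maps a study identifier to a list of findings.
Subagent = Callable[[str], List[str]]


@dataclass
class AuditLog:
    """Append-only record of every step, so the final report can be traced."""
    entries: List[Dict[str, object]] = field(default_factory=list)

    def record(self, step: str, **details: object) -> None:
        self.entries.append({"step": step, **details})


def orchestrate(image_id: str, subagents: Dict[str, Subagent],
                log: AuditLog) -> Dict[str, List[str]]:
    """Dispatch the study to each subagent and log exactly what it returned."""
    results: Dict[str, List[str]] = {}
    for name, agent in subagents.items():
        findings = agent(image_id)
        log.record("subagent", agent=name, findings=findings)
        results[name] = findings
    return results


def synthesize(results: Dict[str, List[str]], log: AuditLog) -> str:
    """Merge per-subagent findings into one report, keeping provenance tags."""
    lines = [f"[{name}] {finding}" for name, fs in results.items() for finding in fs]
    report = "\n".join(lines) if lines else "No acute findings reported."
    log.record("synthesizer", n_findings=len(lines))
    return report


if __name__ == "__main__":
    log = AuditLog()
    # Stub subagents standing in for model/tool calls (hypothetical).
    agents = {
        "airway": lambda img: ["Trachea midline"],
        "cardiac": lambda img: ["Cardiomegaly suspected"],
    }
    print(synthesize(orchestrate("study-001", agents, log), log))
    print(len(log.entries), "audit entries")
```

Keeping the provenance tag on each report line lets a reviewer jump from any sentence in the synthesized output back to the corresponding audit entry.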

Abstract

Agentic systems offer a potential path to solving complex clinical tasks through collaboration among specialized agents, augmented by tool use and external knowledge bases. Nevertheless, for chest X-ray (CXR) interpretation, prevailing methods remain limited: (i) reasoning is frequently neither clinically interpretable nor aligned with guidelines, amounting to a mere aggregation of tool outputs; (ii) multimodal evidence is insufficiently fused, yielding text-only rationales that are not visually grounded; and (iii) systems rarely detect or resolve cross-tool inconsistencies and lack principled verification mechanisms. To address these gaps, we present RadAgents, a multi-agent framework that couples clinical priors with task-aware multimodal reasoning and encodes a radiologist-style workflow into a modular, auditable pipeline. In addition, we integrate grounding and multimodal retrieval augmentation to verify and resolve context conflicts, resulting in outputs that are more reliable, transparent, and consistent with clinical practice.
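Point (iii) concerns detecting and resolving cross-tool inconsistencies, and Figure 3 names the paper's retrieval-based resolver V-RAG. The sketch below does not reproduce V-RAG. It swaps in a deliberately simple stand-in: a conflict is flagged when tools disagree on whether a finding is present, and the disagreement is settled by a majority vote over the labels of the k most similar reference images, ranked by cosine similarity. The embedding vectors, the labeled reference bank, and all function names are hypothetical assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def has_conflict(tool_votes: Dict[str, bool]) -> bool:
    """A conflict exists when tools disagree on whether a finding is present."""
    return len(set(tool_votes.values())) > 1


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_and_vote(query: Sequence[float],
                      bank: List[Tuple[List[float], bool]], k: int = 3) -> bool:
    """Majority label among the k reference images most similar to the query.

    Ties (possible with even k) resolve to False; an odd k avoids them.
    """
    ranked = sorted(bank, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes[True] > votes[False]


def resolve(tool_votes: Dict[str, bool], query: Sequence[float],
            bank: List[Tuple[List[float], bool]], k: int = 3) -> Tuple[bool, str]:
    """Return (decision, reason); consult retrieval only when tools disagree."""
    if not tool_votes:
        raise ValueError("need at least one tool vote")
    if not has_conflict(tool_votes):
        return next(iter(tool_votes.values())), "tools agree"
    return retrieve_and_vote(query, bank, k), "resolved by retrieval"


if __name__ == "__main__":
    # Hypothetical 2-D embeddings labeled with finding present (True) or absent (False).
    bank = [([1.0, 0.0], True), ([0.9, 0.1], True),
            ([0.0, 1.0], False), ([0.1, 0.9], False)]
    votes = {"classifier": True, "vqa_model": False}
    print(resolve(votes, [0.95, 0.05], bank))
```

Only disagreements trigger retrieval, so the cost of verification is paid where the tools actually conflict, and the returned reason string can be written to an audit trail.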

Paper Structure

This paper contains 23 sections, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Different queries should trigger different reasoning modes. Simply cropping regions of interest and curating visual chain-of-thought reasoning is not a panacea.
  • Figure 2: RadAgents framework. Each ABCDE subagent executes in parallel, guided by clinical workflows; this lowers latency, preserves isolation to avoid long-context drift, and improves trustworthiness.
  • Figure 3: Resolving conflicts via V-RAG [chu2025reducing].
  • Figure 4: Performance on ChestAgentBench across different categories of questions.
  • Figure 5: Multi-view and longitudinal performance on MIMIC-CXR test set.
  • ...and 2 more figures
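Figure 2 describes the ABCDE subagents running in parallel, each isolated so that no subagent inherits another's long context. The sketch below illustrates that pattern under stated assumptions. The ABCDE expansion is one common variant of the CXR review mnemonic and may not match the paper's exact definition. `run_subagent` is a hypothetical placeholder for a model or tool call. Threads are used because such calls are typically I/O-bound.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Assumed expansion of the ABCDE mnemonic (one common variant, not from the paper).
ABCDE: Dict[str, str] = {
    "A": "airway",
    "B": "breathing (lungs and pleura)",
    "C": "cardiac silhouette and mediastinum",
    "D": "diaphragm",
    "E": "everything else (bones, soft tissue, devices)",
}


def run_subagent(letter: str, focus: str, image_id: str) -> Dict[str, object]:
    """Hypothetical subagent: builds a fresh, private context for one region."""
    # Each call creates its own message list; nothing is shared across subagents,
    # so one subagent's output cannot leak into another's prompt.
    context: List[str] = [
        f"System: review the {focus} only.",
        f"User: study {image_id}",
    ]
    # A real system would send `context` to a model or tool here (placeholder).
    return {"section": letter, "focus": focus, "context_len": len(context)}


def run_parallel(image_id: str) -> List[Dict[str, object]]:
    """Run all ABCDE subagents concurrently; results keep ABCDE order."""
    with ThreadPoolExecutor(max_workers=len(ABCDE)) as pool:
        futures = [pool.submit(run_subagent, letter, focus, image_id)
                   for letter, focus in ABCDE.items()]
        # Collecting results in submission order keeps the output deterministic.
        return [future.result() for future in futures]


if __name__ == "__main__":
    for result in run_parallel("study-001"):
        print(result["section"], result["focus"])
```

Because each subagent's context is constructed inside its own call, the isolation described in the Figure 2 caption holds by construction rather than by convention, and wall-clock latency is bounded by the slowest subagent instead of the sum of all five.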