RadAgents: Multimodal Agentic Reasoning for Chest X-ray Interpretation with Radiologist-like Workflows

Kai Zhang; Corey D Barrett; Jangwon Kim; Lichao Sun; Tara Taghavi; Krishnaram Kenthapadi

TL;DR

RadAgents presents a radiologist-inspired, multi-agent framework for chest X-ray interpretation that couples specialized subagents with an orchestrator and synthesizer to achieve auditable, multimodal reasoning. It integrates a diverse toolset, grounding, and retrieval-augmented conflict resolution (V-RAG) to produce transparent, clinically aligned outputs. Evaluations across ChestAgentBench, CheXbench, and MIMIC-CXR show consistent performance gains over baselines, with ablations confirming the value of radiologist-like workflows and visual retrieval. The work suggests that on-premise, open-model architectures can reach competitive performance with proper workflow encoding and cross-tool verification, potentially improving trust and safety in clinical AI systems.

Abstract

Agentic systems offer a potential path to solving complex clinical tasks through collaboration among specialized agents, augmented by tool use and external knowledge bases. Nevertheless, for chest X-ray (CXR) interpretation, prevailing methods remain limited: (i) reasoning is frequently neither clinically interpretable nor aligned with guidelines, often amounting to mere aggregation of tool outputs; (ii) multimodal evidence is insufficiently fused, yielding text-only rationales that are not visually grounded; and (iii) systems rarely detect or resolve cross-tool inconsistencies and provide no principled verification mechanisms. To bridge these gaps, we present RadAgents, a multi-agent framework that couples clinical priors with task-aware multimodal reasoning and encodes a radiologist-style workflow into a modular, auditable pipeline. In addition, we integrate grounding and multimodal retrieval-augmentation to verify and resolve context conflicts, resulting in outputs that are more reliable, transparent, and consistent with clinical practice.
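To make the orchestrator, subagent, and synthesizer roles concrete, the following is a minimal toy sketch, not the authors' implementation. Every name in it (`Finding`, `orchestrate`, `find_conflicts`, `resolve_with_retrieval`, `synthesize`, the two stub subagents, and the retrieved cases) is hypothetical. The conflict-resolution step is a simple majority vote over labels of retrieved cases, used here only as a stand-in for retrieval-augmented verification; the paper's actual mechanism is not specified in this section.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Finding:
    label: str     # e.g. "pleural effusion"
    present: bool  # the tool's verdict for this label
    source: str    # name of the (hypothetical) tool that produced it


# A subagent is modeled as a function from an image identifier to findings.
Subagent = Callable[[str], List[Finding]]


def orchestrate(image_id: str, subagents: Dict[str, Subagent]) -> List[Finding]:
    """Run every subagent in a fixed (alphabetical) order and collect findings."""
    findings: List[Finding] = []
    for name in sorted(subagents):
        findings.extend(subagents[name](image_id))
    return findings


def find_conflicts(findings: List[Finding]) -> List[str]:
    """Return the labels on which at least two tools disagree."""
    verdicts: Dict[str, set] = {}
    for f in findings:
        verdicts.setdefault(f.label, set()).add(f.present)
    return sorted(label for label, seen in verdicts.items() if len(seen) > 1)


def resolve_with_retrieval(label: str, retrieved_cases: List[Dict[str, bool]]) -> bool:
    """Assumed stand-in: strict majority vote over labels of retrieved similar cases."""
    votes = [case[label] for case in retrieved_cases if label in case]
    return sum(votes) * 2 > len(votes)


def synthesize(findings: List[Finding], retrieved_cases: List[Dict[str, bool]]) -> Dict[str, bool]:
    """Merge findings into one verdict per label, resolving disagreements via retrieval."""
    conflicts = set(find_conflicts(findings))
    report: Dict[str, bool] = {}
    for f in findings:
        if f.label in conflicts:
            report[f.label] = resolve_with_retrieval(f.label, retrieved_cases)
        else:
            report[f.label] = f.present
    return report


# Usage with two stub subagents that disagree about one finding.
subagents = {
    "classifier": lambda img: [
        Finding("pleural effusion", True, "classifier"),
        Finding("cardiomegaly", False, "classifier"),
    ],
    "vqa": lambda img: [Finding("pleural effusion", False, "vqa")],
}
retrieved = [
    {"pleural effusion": True},
    {"pleural effusion": True},
    {"pleural effusion": False},
]
findings = orchestrate("cxr_001", subagents)
report = synthesize(findings, retrieved)
print(find_conflicts(findings))  # ['pleural effusion']
print(report)  # {'pleural effusion': True, 'cardiomegaly': False}
```

Keeping the `source` field on each finding is what would make such a pipeline auditable in this sketch: every verdict in the final report can be traced back to the tools that voted on it.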
