Towards Interpretable Radiology Report Generation via Concept Bottlenecks using a Multi-Agentic RAG

Hasan Md Tusfiqur Alam; Devansh Srivastav; Md Abdul Kadir; Daniel Sonntag

TL;DR

This work tackles interpretability in chest X-ray analysis and radiology report generation by integrating concept bottleneck models (CBMs) with a multi-agent retrieval-augmented generation (RAG) system. It uses automatic concept discovery and ChexAgent/Mistral embeddings to build interpretable concept vectors that guide disease classification and the subsequent report generation. A group of agents (ReAct, Radiologist, Medical Writer) collaboratively retrieves clinical documents and generates reports, which an LLM-as-a-judge then evaluates for interpretability and clinical usefulness. On COVID-QU, the model achieves 81% classification accuracy, and five report-generation metrics range from 84% to 90%. The approach offers a path toward explainable, clinically actionable AI for radiology, with potential applicability to other modalities; robustness improvements are ongoing.

Abstract

Deep learning has advanced medical image classification, but interpretability challenges hinder its clinical adoption. This study enhances interpretability in Chest X-ray (CXR) classification by using concept bottleneck models (CBMs) and a multi-agent Retrieval-Augmented Generation (RAG) system for report generation. By modeling relationships between visual features and clinical concepts, we create interpretable concept vectors that guide a multi-agent RAG system to generate radiology reports, enhancing clinical relevance, explainability, and transparency. Evaluation of the generated reports using an LLM-as-a-judge confirmed the interpretability and clinical utility of our model's outputs. On the COVID-QU dataset, our model achieved 81% classification accuracy and demonstrated robust report generation performance, with five key metrics ranging between 84% and 90%. This interpretable multi-agent framework bridges the gap between high-performance AI and the explainability required for reliable AI-driven CXR analysis in clinical settings. Our code is available at https://github.com/tifat58/IRR-with-CBM-RAG.git.
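
The concept-bottleneck idea described above can be illustrated with a small sketch. This is not the authors' implementation: it assumes, for illustration only, that each concept score is the cosine similarity between an image embedding and a concept embedding, and that a linear layer maps the concept vector to a class. The function names, concept names, class labels, embeddings, and weights are all hypothetical toy values.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def concept_vector(image_emb, concept_embs):
    """Score the image against every concept; this vector is the bottleneck."""
    return [cosine(image_emb, c) for c in concept_embs]

def classify(concepts, weights, biases):
    """Linear classifier over concept scores only.

    Returns the predicted class index and the per-concept contributions
    (weight * score) to that class, which serve as the explanation.
    """
    logits = [sum(w * c for w, c in zip(row, concepts)) + b
              for row, b in zip(weights, biases)]
    best = max(range(len(logits)), key=logits.__getitem__)
    contributions = [w * c for w, c in zip(weights[best], concepts)]
    return best, contributions

# Hypothetical toy data: two concepts, two classes, 2-d embeddings.
concept_names = ["ground-glass opacity", "clear lung fields"]
class_names = ["COVID-19", "Normal"]
concept_embs = [[1.0, 0.0], [0.0, 1.0]]
weights = [[2.0, -1.0], [-1.0, 2.0]]   # one row of concept weights per class
biases = [0.0, 0.0]

image_emb = [1.0, 0.0]                 # stand-in for a vision-encoder output
scores = concept_vector(image_emb, concept_embs)
pred, contrib = classify(scores, weights, biases)

print("prediction:", class_names[pred])
for name, value in zip(concept_names, contrib):
    print(f"  {name}: {value:+.2f}")
```

Because the classifier sees only the concept scores, each prediction can be traced to named clinical concepts, and those same scores are what a downstream report-generation step could be conditioned on.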
