Towards Interpretable Radiology Report Generation via Concept Bottlenecks using a Multi-Agentic RAG

Hasan Md Tusfiqur Alam, Devansh Srivastav, Md Abdul Kadir, Daniel Sonntag

TL;DR

This work tackles interpretability in chest X-ray analysis and radiology report generation by integrating concept bottleneck models (CBMs) with a multi-agent retrieval-augmented generation (RAG) system. It uses automatic concept discovery and CheXagent/Mistral embeddings to build interpretable concept vectors that guide disease classification and subsequent report generation. A group of agents (ReAct, Radiologist, Medical Writer) collaboratively retrieves clinical documents and generates reports, which an LLM judge then evaluates for interpretability and clinical usefulness. On COVID-QU, the model achieves 81% classification accuracy, and five report-quality metrics fall in the 84–90% range. The approach offers a path toward explainable, clinically actionable AI for radiology, with potential applicability to other modalities and ongoing robustness enhancements.

Abstract

Deep learning has advanced medical image classification, but interpretability challenges hinder its clinical adoption. This study enhances interpretability in Chest X-ray (CXR) classification by using concept bottleneck models (CBMs) and a multi-agent Retrieval-Augmented Generation (RAG) system for report generation. By modeling relationships between visual features and clinical concepts, we create interpretable concept vectors that guide a multi-agent RAG system to generate radiology reports, enhancing clinical relevance, explainability, and transparency. Evaluation of the generated reports using an LLM-as-a-judge confirmed the interpretability and clinical utility of our model's outputs. On the COVID-QU dataset, our model achieved 81% classification accuracy and demonstrated robust report generation performance, with five key metrics ranging between 84% and 90%. This interpretable multi-agent framework bridges the gap between high-performance AI and the explainability required for reliable AI-driven CXR analysis in clinical settings. Our code is available at https://github.com/tifat58/IRR-with-CBM-RAG.git.
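
The concept bottleneck step described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that concept scores are cosine similarities between an image embedding and one embedding per concept, and that a linear layer maps those scores to class logits. The function names, the toy vectors, and the concept and class labels are all hypothetical placeholders.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def concept_scores(image_emb, concept_embs):
    """Bottleneck layer: one similarity score per concept."""
    return [cosine(image_emb, c) for c in concept_embs]

def classify_with_contributions(scores, weights, bias):
    """Linear head on top of the concept scores.

    Returns the predicted class index, all logits, and the per-concept
    contribution (weight * score) to the predicted class logit.
    """
    logits = [sum(w * s for w, s in zip(row, scores)) + b
              for row, b in zip(weights, bias)]
    pred = max(range(len(logits)), key=logits.__getitem__)
    contributions = [w * s for w, s in zip(weights[pred], scores)]
    return pred, logits, contributions

# Toy data (assumed, for illustration only).
concepts = ["concept_a", "concept_b", "concept_c"]   # placeholder concept names
classes = ["class_0", "class_1", "class_2"]          # placeholder class names
image_emb = [1.0, 0.0, 0.0]
concept_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
weights = [[2.0, 0.0, 1.0], [0.0, 2.0, 0.5], [-1.0, -1.0, 0.0]]
bias = [0.0, 0.0, 0.0]

scores = concept_scores(image_emb, concept_embs)
pred, logits, contributions = classify_with_contributions(scores, weights, bias)

# Rank concepts by how much they pushed the predicted class logit.
ranked = sorted(zip(concepts, contributions), key=lambda p: p[1], reverse=True)
print(classes[pred], ranked)
```

Because the head is linear, the contributions plus the bias sum exactly to the predicted class logit, which is what makes each concept's share of the decision readable. In a pipeline like the one summarized above, a ranked list of this kind is the sort of signal that could be passed on to the report-generation agents.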

Paper Structure

This paper contains 6 sections, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Input (blue) to output (green) pipeline: Given a CXR as input, the Concept Bottleneck Model predicts clinical attributes (concepts) and their contributions in an intermediate step, followed by predicting the disease class. The multi-agent RAG system then generates a comprehensive report, incorporating clinical interpretations and insights drawn from relevant clinical documents.
  • Figure 2: Proposed architecture for the interpretable report generation. (Top) For a CXR image, the disease class and concept contribution scores are predicted using a CBM with automatic concept discovery. (Bottom) Based on these contributions, a multi-agent RAG system generates reports using relevant clinical documents. The chain-of-thought reasoning ensures that detected features contribute to accurate classification and report generation, with the final output evaluated for robustness and clinical relevance by LLM-as-a-judge evaluation [zheng2023judging].
  • Figure 3: Evaluation of the robustness of the classification model's concept set.
  • Figure 4: t-SNE visualization of the embeddings of generated reports.