Fact-Aware Multimodal Retrieval Augmentation for Accurate Medical Radiology Report Generation

Liwen Sun, James Zhao, Megan Han, Chenyan Xiong

TL;DR

This work addresses factual inaccuracies in radiology report generation by introducing FactMM-RAG, a fact-aware multimodal retrieval-augmented framework. It mines fact-grounded report pairs using RadGraph, trains a universal multimodal retriever to fetch high-quality references, and integrates these references into a multimodal foundation model for generation. The approach yields significant improvements in clinically relevant metrics (up to 6.5% F1CheXbert and 2% F1RadGraph on MIMIC-CXR and CheXpert) and demonstrates that fact-aware supervision can be achieved without explicit diagnostic labels, with the fact-aware signals propagating from retrieval to generation. The method has practical implications for more reliable radiology reporting and can be extended to other medical imaging domains, subject to careful evaluation and ethical considerations.

Abstract

Multimodal foundation models hold significant potential for automating radiology report generation, thereby assisting clinicians in diagnosing cardiac diseases. However, generated reports often suffer from serious factual inaccuracies. In this paper, we introduce FactMM-RAG, a fact-aware multimodal retrieval-augmented pipeline for generating accurate radiology reports. We first leverage RadGraph to mine factual report pairs, then integrate this factual knowledge to train a universal multimodal retriever. Given a radiology image, our retriever identifies high-quality reference reports to augment multimodal foundation models, thus enhancing the factual completeness and correctness of the generated reports. Experiments on two benchmark datasets show that our multimodal retriever outperforms state-of-the-art retrievers on both language generation and radiology-specific metrics, by up to 6.5% in F1CheXbert and 2% in F1RadGraph. Further analysis indicates that our factually-informed training strategy provides an effective supervision signal without relying on explicit diagnostic label guidance, and successfully propagates fact-aware capabilities from the multimodal retriever to the multimodal foundation model in radiology report generation.
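
The pair-mining step can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each report has already been reduced to a set of hypothetical `(entity, label)` facts standing in for RadGraph annotations, and it scores two reports with a plain set-overlap F1. The actual F1RadGraph metric also accounts for relations between entities. The names `set_f1`, `mine_pairs`, and the `threshold` value are illustrative assumptions.

```python
from itertools import combinations


def set_f1(a: set, b: set) -> float:
    """Set-overlap F1: harmonic mean of precision and recall of b against a."""
    if not a or not b:
        return 0.0
    overlap = len(a & b)
    if overlap == 0:
        return 0.0
    precision = overlap / len(b)
    recall = overlap / len(a)
    return 2 * precision * recall / (precision + recall)


def mine_pairs(report_facts: dict, threshold: float) -> list:
    """Return report-id pairs whose fact overlap reaches the threshold.

    report_facts maps a report id to its set of (entity, label) facts.
    The threshold is a hypothetical hyperparameter, not a value from the paper.
    """
    pairs = []
    for (id_a, facts_a), (id_b, facts_b) in combinations(report_facts.items(), 2):
        if set_f1(facts_a, facts_b) >= threshold:
            pairs.append((id_a, id_b))
    return pairs


# Toy facts, invented for illustration only.
report_facts = {
    "r1": {("effusion", "present"), ("pneumothorax", "absent")},
    "r2": {("effusion", "present"), ("pneumothorax", "absent"), ("edema", "present")},
    "r3": {("cardiomegaly", "present")},
}
pairs = mine_pairs(report_facts, threshold=0.7)
print(pairs)
```

Pairs mined this way could serve as positives when training a retriever, so that reports sharing clinical facts are pulled together even if their wording differs.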

Paper Structure (20 sections, 6 equations, 5 figures, 4 tables)

Figures (5)

  • Figure 1: An overview of the FactMM-RAG system. It mainly contains three stages: (1) Leveraging RadGraph to characterize each radiology report and mine factually-informed report pairs; (2) Integrating factual knowledge into the training of the universal multimodal retriever; (3) Given the radiology image, employing the fact-aware multimodal retriever to search for factually-informed reference reports and augmenting the multimodal foundation model in generating accurate radiology reports.
  • Figure 2: Factual performance of FactMM-RAG controlled by different F1CheXbert and F1RadGraph thresholds. We vary the F1RadGraph thresholds under one fixed F1CheXbert threshold selected from {0, 0.4, 0.8, 1}.
  • Figure 3: Retrieval evaluation of FactMM-RAG with different F1CheXbert and F1RadGraph thresholds. MRR is the mean, over queries, of the reciprocal rank of the first retrieved report whose factual similarity to the query report meets both thresholds.
  • Figure 4: Analysis of fact-aware capability propagation. The $x$-axis MRR measures the retriever's performance on retrieving factually relevant reports.
  • Figure 5: Prompt templates for Visual Question Answering and Retrieval Augmented Generation