Toward expanding the scope of radiology report summarization to multiple anatomies and modalities

Zhihong Chen, Maya Varma, Xiang Wan, Curtis Langlotz, Jean-Benoit Delbrouck

TL;DR

This work tackles the limited scope and reproducibility of radiology report summarization (RRS) by introducing the MIMIC-RRS dataset, a public resource spanning three modalities and seven anatomies (12 new modality-anatomy pairs) derived from MIMIC-III and MIMIC-CXR. It benchmarks pretrained seq2seq backbones (with a focus on BART variants) in both in-domain and cross-domain settings and introduces RadGraph as a modality-agnostic factuality metric for clinical efficacy. The findings show that backbone choice and scale drive performance, with the "ALL" training variant providing the strongest cross-modality generalization and BART-based backbones delivering the best clinical-efficacy scores. Overall, the paper provides a scalable, multimodal benchmark for RRS and a practical factuality metric, while acknowledging limitations and outlining future directions to broaden model coverage and evaluation.

Abstract

Radiology report summarization (RRS) is a growing area of research. Given the Findings section of a radiology report, the goal is to generate a summary (called an Impression section) that highlights the key observations and conclusions of the radiology study. However, RRS currently faces essential limitations. First, many prior studies conduct experiments on private datasets, preventing reproduction of results and fair comparisons across different systems and solutions. Second, most prior approaches are evaluated solely on chest X-rays. To address these limitations, we propose a dataset (MIMIC-RRS) involving three new modalities and seven new anatomies based on the MIMIC-III and MIMIC-CXR datasets. We then conduct extensive experiments to evaluate the performance of models both within and across modality-anatomy pairs in MIMIC-RRS. In addition, we evaluate their clinical efficacy via RadGraph, a factual correctness metric.

Paper Structure

This paper contains 20 sections, 9 figures, and 5 tables.

Figures (9)

  • Figure 1: Section length and vocabulary size for reports from each modality-anatomy pair.
  • Figure 2: Heatmaps of the cross-modality-anatomy results from BART-L. Colors from light to dark represent values from low to high in each column. As discussed in the cross-anatomy-modality section, the model variant "ALL" achieves the strongest performance.
  • Figure 3: Example of the RadGraph annotations. Figure taken from 8ffe9a5.
  • Figure 4: Graph view of the RadGraph annotations for the report in Figure 3.
  • Figure 5: Heatmaps of the cross-modality-anatomy results from T5-S. Colors from light to dark represent values from low to high in each column.
  • ...and 4 more figures