Toward expanding the scope of radiology report summarization to multiple anatomies and modalities
Zhihong Chen, Maya Varma, Xiang Wan, Curtis Langlotz, Jean-Benoit Delbrouck
TL;DR
This work tackles the limited scope and reproducibility of radiology report summarization (RRS) by introducing MIMIC-RRS, a public dataset spanning three modalities and seven anatomies (12 new modality-anatomy pairs) derived from MIMIC-III and MIMIC-CXR. It extensively benchmarks pretrained seq2seq backbones (with a focus on BART variants) in both in-domain and cross-domain settings, and it uses RadGraph as a modality-agnostic factuality metric to measure clinical efficacy. The findings show that backbone choice and scale drive performance, that training on all modality-anatomy pairs jointly (ALL) gives the strongest cross-modality generalization, and that BART-based backbones deliver the best clinical-efficacy scores. Overall, the paper provides a scalable, multimodal benchmark for RRS and a practical factuality evaluation, while acknowledging its limitations and outlining future directions for broadening model coverage and evaluation.
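Each RRS example pairs a report's Findings section (the model input) with its Impression section (the target summary). The sketch below shows one way such pairs could be cut from raw report text. It assumes that section headers are uppercase phrases followed by a colon at the start of a line; the helpers `split_sections` and `extract_rrs_pair` are hypothetical names, and this is not the paper's actual preprocessing pipeline. Real MIMIC reports are considerably messier.

```python
import re
from typing import Dict, Optional, Tuple

# Assumption: a section header is an uppercase word or phrase followed by a colon
# at the start of a line (e.g. "FINDINGS:"). Real reports use more varied layouts.
_HEADER_RE = re.compile(r"^[ \t]*([A-Z][A-Z ]*[A-Z])[ \t]*:", re.MULTILINE)


def split_sections(report: str) -> Dict[str, str]:
    """Map each header to its whitespace-normalized text, up to the next header."""
    matches = list(_HEADER_RE.finditer(report))
    sections: Dict[str, str] = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(report)
        sections[match.group(1)] = " ".join(report[match.end():end].split())
    return sections


def extract_rrs_pair(report: str) -> Optional[Tuple[str, str]]:
    """Return (findings, impression), or None if either section is missing or empty."""
    sections = split_sections(report)
    findings = sections.get("FINDINGS")
    impression = sections.get("IMPRESSION")
    if not findings or not impression:
        return None
    return findings, impression


if __name__ == "__main__":
    report = (
        "EXAMINATION: CT HEAD\n"
        "FINDINGS: No acute hemorrhage.\n"
        "Ventricles are normal in size.\n"
        "IMPRESSION: No acute intracranial process.\n"
    )
    print(extract_rrs_pair(report))
```

Reports lacking either section are dropped (the function returns `None`), which is the simplest filtering policy; a real pipeline would also need to handle header variants such as "FINDINGS AND IMPRESSION".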
Abstract
Radiology report summarization (RRS) is a growing area of research. Given the Findings section of a radiology report, the goal is to generate a summary (called the Impression section) that highlights the key observations and conclusions of the radiology study. However, RRS currently faces two key limitations. First, many prior studies conduct experiments on private datasets, preventing reproduction of results and fair comparison across systems. Second, most prior approaches are evaluated solely on chest X-rays. To address these limitations, we propose MIMIC-RRS, a dataset involving three new modalities and seven new anatomies based on the MIMIC-III and MIMIC-CXR datasets. We then conduct extensive experiments to evaluate model performance both within and across the modality-anatomy pairs in MIMIC-RRS. In addition, we evaluate the models' clinical efficacy via RadGraph, a factual correctness metric.
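RadGraph-based evaluation compares the clinical entities and relations found in a generated impression with those found in the reference. The sketch below computes a simplified entity-level and relation-level F1 under exact matching. It assumes both texts have already been run through a RadGraph extractor, which is not included here. The helper names `set_f1` and `radgraph_style_f1`, the exact-match rule, and the treatment of two empty extractions as a perfect match are illustrative assumptions rather than the official RadGraph scoring code, whose matching rules may differ. The example labels (`OBS-DA`, `ANAT-DP`, `OBS-DP`, `located_at`) follow RadGraph's schema of anatomy and observation entities.

```python
from typing import Hashable, Set, Tuple


def set_f1(pred: Set[Hashable], ref: Set[Hashable]) -> float:
    """Harmonic mean of precision and recall between two sets of extracted items."""
    if not pred and not ref:
        return 1.0  # assumption: two empty extractions are treated as agreeing
    true_pos = len(pred & ref)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)
    recall = true_pos / len(ref)
    return 2 * precision * recall / (precision + recall)


def radgraph_style_f1(
    pred_entities: Set[Hashable],
    ref_entities: Set[Hashable],
    pred_relations: Set[Hashable],
    ref_relations: Set[Hashable],
) -> Tuple[float, float]:
    """Entity-level and relation-level F1 between a generated and a reference impression.

    Entities are (token, label) pairs; relations are (source, relation, target) triples.
    Both are assumed to come from an upstream RadGraph extractor (not included here).
    """
    return set_f1(pred_entities, ref_entities), set_f1(pred_relations, ref_relations)


if __name__ == "__main__":
    # Hypothetical extractions for a reference and a generated impression.
    ref_entities = {("effusion", "OBS-DA"), ("pleural", "ANAT-DP"), ("opacity", "OBS-DP")}
    pred_entities = {("effusion", "OBS-DA"), ("pleural", "ANAT-DP")}
    ref_relations = {("effusion", "located_at", "pleural")}
    pred_relations = {("effusion", "located_at", "pleural")}
    print(radgraph_style_f1(pred_entities, ref_entities, pred_relations, ref_relations))
```

Here the generated impression omits the `opacity` finding, so entity recall drops to 2/3 and entity F1 to roughly 0.8, while relation F1 stays at 1.0. Because the score depends on extracted clinical content rather than on n-gram overlap, a fluent summary that misses a finding is still penalized.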
