A Systematic Review of Deep Learning-based Research on Radiology Report Generation

Chang Liu, Yuanhe Tian, Yan Song

TL;DR

This paper provides a comprehensive survey of deep learning–based radiology report generation (RRG), categorizing methods into visual-only, textual-only, and cross-modal approaches. It synthesizes prevalent datasets (e.g., CX-CHR, IU X-Ray, MIMIC-CXR) and evaluation metrics (NLG, CE, SIC, embedding-based, and TSF metrics), and analyzes performance across model architectures (notably Transformer-based encoders/decoders) and cross-modal strategies. The review identifies key findings, such as the superiority of cross-modal methods and the value of regional visual features and knowledge graphs, while highlighting ongoing challenges in data resources, model interpretability, and evaluation protocols. By mapping current progress and gaps, the paper aims to guide future research toward more accurate, clinically aligned, and generalizable RRG systems.

Abstract

Radiology report generation (RRG) aims to automatically generate free-text descriptions from clinical radiographs, e.g., chest X-ray images. RRG plays an essential role in promoting clinical automation: it offers practical assistance to inexperienced doctors and alleviates radiologists' workloads. Given this potential, research on RRG has grown explosively over the past half-decade, especially with the rapid development of deep learning approaches. Existing studies approach RRG by enhancing individual modalities, optimizing the report generation process with elaborated features from visual and textual information, and further facilitating RRG through cross-modal interactions between the two. In this paper, we present a comprehensive review of deep learning-based RRG from various perspectives. Specifically, we first cover pivotal RRG approaches based on the task-specific features of radiographs, reports, and the cross-modal relations between them; then describe the benchmark datasets and evaluation metrics conventionally used for this task; subsequently analyze the performance of different approaches; and finally summarize the challenges and trends for future directions. Overall, the goal of this paper is to serve as a tool for understanding the existing literature and inspiring potentially valuable research in the field of RRG.

Paper Structure (33 sections, 13 equations, 6 figures, 7 tables)

This paper contains 33 sections, 13 equations, 6 figures, and 7 tables.

Figures (6)

  • Figure 1: A representative chest radiology image with its corresponding doctor-written radiology report. Each report consists of a "Findings" section and an "Impression" section, where the "Findings" section records detailed descriptions by radiologists, and the "Impression" section presents an overall diagnostic summary based on the radiograph. RRG typically aims to generate the "Findings" content.
  • Figure 2: The architectures of three main categories of visual-only approaches that utilize different types of visual features for the report generation process, including (a) global visual features, (b) regional visual features, and (c) global-regional aggregated features.
  • Figure 3: The architectures of autoregressive models for textual-only approaches, including (a) long short-term memory (LSTM) and (b) Transformer.
  • Figure 4: The architectures of three main categories of cross-modal approaches that enhance the cross-modal alignment for RRG from different perspectives, including (a) objective optimization, (b) representation weighting, and (c) architecture enhancement.
  • Figure 5: Examples of radiographs and their corresponding reports in IU X-Ray, MIMIC-CXR, MIMIC-ABN, MIMIC-CXR-JPG, CX-CHR, COV-CTR, and PEIR Gross, where the radiology reports of CX-CHR and COV-CTR are translated from Chinese into English. The red boxes in radiographs from CX-CHR and COV-CTR are annotated by physicians to highlight the regions that the reports describe.
  • ...and 1 more figure