Automated Radiology Report Generation: A Review of Recent Advances

Phillip Sloan, Philip Clatworthy, Edwin Simpson, Majid Mirmehdi

TL;DR

This survey comprehensively maps recent ARRG progress (2020–2023) across datasets, training paradigms, architectures, knowledge integration, and evaluation. It finds that large-scale multimodal approaches, temporal context, and knowledge graphs increasingly enhance report fidelity, while evaluation remains challenging due to the gap between NLP metrics and clinical relevance. The authors advocate for standardized splits, richer clinical evaluation, and embracing pretrained LLMs and diverse datasets to advance generalization and trust in ARRG systems. They also highlight promising directions in RL with human feedback, multimodal data fusion, and broader modality coverage beyond chest radiographs.

Abstract

Increasing demands on medical imaging departments are taking a toll on the radiologist's ability to deliver timely and accurate reports. Recent technological advances in artificial intelligence have demonstrated great potential for automatic radiology report generation (ARRG), sparking an explosion of research. This survey paper conducts a methodological review of contemporary ARRG approaches by way of (i) assessing datasets based on characteristics, such as availability, size, and adoption rate, (ii) examining deep learning training methods, such as contrastive learning and reinforcement learning, (iii) exploring state-of-the-art model architectures, including variations of CNN and transformer models, (iv) outlining techniques integrating clinical knowledge through multimodal inputs and knowledge graphs, and (v) scrutinising current model evaluation techniques, including commonly applied NLP metrics and qualitative clinical reviews. Furthermore, the quantitative results of the reviewed models are analysed, where the top performing models are examined to seek further insights. Finally, potential new directions are highlighted, with the adoption of additional datasets from other radiological modalities and improved evaluation methods predicted as important areas of future development.

Paper Structure

This paper contains 30 sections, 8 figures, 6 tables.

Figures (8)

  • Figure 1: Illustration of the rapid growth of published papers within the ARRG domain. The dashed line represents a forecast of anticipated future publications.
  • Figure 2: The road map of this paper. First, relevant ARRG datasets are discussed, followed by recent model training approaches. Next, the range and extent of deep learning architectures deployed or developed are covered, followed by the "who and how" of works that have used knowledge and multiple modalities. Finally, we consider assessment and evaluation methods employed in, or specifically crafted for, ARRG.
  • Figure 3: An example X-ray and report pair from the Indiana University X-ray dataset (IU-Xray). ARRG models seek to generate the Findings and Impression sections.
  • Figure 4: An illustration of (A) contrastive learning and (B) contrastive language-image pre-training (CLIP) methods used within ARRG. (A) demonstrates SimCLR as implemented, for example, by Hou2023b and Tanwani2022. In this framework, image augmentations such as cropping and Gaussian noise were used to create additional positive samples, while other radiographic images were considered as negative samples. (B) demonstrates CLIP, which was implemented by works such as Endo2021 and Leonardi2022. The dotted lines present novel pathways from ARRG research contributions: "hard" negatives were used by Yan2021 and Jeong2023 to enable more robust representations to be learnt, and WuX2023 developed extra positive samples at the embedding level through the use of dropout and cutoff.
  • Figure 5: An overview of the three main encoder-decoder architectures used within the ARRG domain: (A) demonstrates a typical CNN-LSTM network as utilised, for example, by ZhangY2020, Liu2021b, WangZ2021, WangS2022, Nishino2022, Gajbhiye2022, WangF2022, and Kaur2023; (B) illustrates the most popular architecture in use today, such as in Chen2020, Hou2023, Wu2023, and Huang2023; while (C) embodies the pure transformer architecture used, for example, by Wang2022, Shang2022, and Mohsan2023.
  • ...and 3 more figures
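
The CLIP-style objective described in the Figure 4 caption can be illustrated with a minimal sketch. This is not the implementation used by any of the cited works. It assumes a batch of already computed, non-zero image and report embeddings in which row i of each list forms a matched pair and all other rows in the batch act as negatives. The function names and the default temperature of 0.07 are assumptions chosen for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_style_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over matched image/report embedding pairs.

    Hypothetical helper: image_embs[i] and text_embs[i] are assumed to be a
    positive pair; every other pairing in the batch is treated as a negative.
    """
    images = [l2_normalize(v) for v in image_embs]
    texts = [l2_normalize(v) for v in text_embs]
    # Similarity matrix: rows index images, columns index reports.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]
    # Transpose to score each report against all images.
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits_t))


# Toy batch: two images and two reports with 2-dimensional embeddings.
image_batch = [[1.0, 0.0], [0.0, 1.0]]
aligned_reports = [[1.0, 0.0], [0.0, 1.0]]
swapped_reports = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = clip_style_loss(image_batch, aligned_reports)
swapped_loss = clip_style_loss(image_batch, swapped_reports)
```

In this toy batch the aligned pairs give a loss close to zero and the swapped pairs give a much larger loss, which is the behaviour the objective relies on: matched image and report embeddings are pulled together while mismatched ones are pushed apart.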