Automatic Medical Report Generation: Methods and Applications

Li Guo; Anas M. Tahir; Dong Zhang; Z. Jane Wang; Rabab K. Ward

TL;DR

This review surveys automatic medical report generation methods published from 2021 to 2024, aiming to give a comprehensive picture of the existing literature and to inspire future research.

Abstract

The increasing demand for medical imaging has surpassed the capacity of available radiologists, leading to diagnostic delays and potential misdiagnoses. Artificial intelligence (AI) techniques, particularly in automatic medical report generation (AMRG), offer a promising solution to this dilemma. This review comprehensively examines AMRG methods from 2021 to 2024. It (i) presents solutions to primary challenges in this field, (ii) explores AMRG applications across various imaging modalities, (iii) introduces publicly available datasets, (iv) outlines evaluation metrics, (v) identifies techniques that significantly enhance model performance, and (vi) discusses unresolved issues and potential future research directions. This paper aims to provide a comprehensive understanding of the existing literature and inspire valuable future research.

Paper Structure (33 sections, 29 equations, 9 figures, 1 table)

Introduction
Problem Statement
Methods
Bridging the Gap Between Modalities
Global Alignment
Local Alignment
Intermediate Matrix
Lesion-Focused Image Encoding
Disease Classification
Detection and Segmentation
Internal Structure of Image Encoder
Enhancing Text Decoder With Supplementary Information
Retrieve Similarity Reports
Memory
Knowledge Graph
...and 18 more sections

Figures (9)

Figure 1: The content road map of this review paper. First, we present five types of solutions to address the challenges of AMRG. Next, we explore the applications of AMRG across different imaging modalities. Following this, we introduce various public datasets. Then, we outline the evaluation metrics employed to assess model performance. By comparing the performance of models on benchmark datasets, we identify six techniques that effectively enhance model performance. Finally, we discuss future research directions in the field.
Figure 2: Four challenges in automatic medical report generation (AMRG) and their corresponding solutions.
Figure 3: Flowcharts of three representative alignment methods. The left diagram illustrates global alignment, which typically uses the [CLS] token to represent the global representation of a modality. The middle diagram depicts local alignment, aligning image patches with word tokens. The right diagram shows alignment via an intermediate matrix, where a shared matrix represents the features of both modalities, ensuring they are in the same latent space.
Figure 4: Flowcharts of three representative methods for enhancing image encoding: The left diagram shows that the image features extracted by the image encoder (IE) are used for disease classification, typically using only the [CLS] token instead of all image tokens. The middle diagram illustrates that the image is first processed through a pre-trained segmentation network (Seg Net) to segment meaningful areas (such as the left and right lungs), and only these areas are then input into the image encoder to eliminate background interference. The right diagram demonstrates that with cross-attention, the image features are used as keys and values, while the disease tags are used as queries. This method encourages the model to focus on image areas related to disease tags. TE represents text encoder.
Figure 5: Flowcharts of three representative methods for augmenting the text decoder with supplementary information. The left diagram shows a retrieval-based approach. Reports similar to the input image are found from the corpus (consisting of training reports) based on cosine similarity and are input into the text decoder as reference information. The middle diagram illustrates replacing the corpus with a learnable memory to avoid leakage of training data. The right diagram demonstrates replacing the memory with a knowledge graph to store the clinical information used to generate the report in a structured manner. IE and TE represent image encoder and text encoder, respectively.
...and 4 more figures
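
The retrieval-based approach in the Figure 5 caption (finding reports similar to the input image by cosine similarity and passing them to the text decoder as reference information) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the function names, the three-dimensional toy embeddings, and the example report texts are hypothetical and do not come from the paper or from any specific model it reviews. A real system would obtain the embeddings from a trained image encoder and text encoder.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_similar_reports(image_embedding, corpus, k=2):
    """Return the k report texts whose embeddings are most similar to the image embedding.

    `corpus` is a list of (report_text, report_embedding) pairs, standing in for
    the training reports mentioned in the caption.
    """
    scored = [
        (cosine_similarity(image_embedding, embedding), text)
        for text, embedding in corpus
    ]
    # Highest similarity first; sort on the score only.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy corpus with made-up embeddings (hypothetical data, for illustration only).
corpus = [
    ("No acute cardiopulmonary abnormality.", [0.9, 0.1, 0.0]),
    ("Right lower lobe opacity, possibly pneumonia.", [0.1, 0.9, 0.2]),
    ("Mild cardiomegaly without edema.", [0.2, 0.1, 0.9]),
]

# Pretend this vector came from the image encoder for a new study.
query_embedding = [0.2, 0.8, 0.3]

# The retrieved texts would be given to the text decoder as reference information.
references = retrieve_similar_reports(query_embedding, corpus, k=2)
print(references)
```

As the same caption notes, retrieving directly from training reports can leak training data, which is the motivation given for replacing the corpus with a learnable memory; this sketch covers only the plain retrieval variant.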
