Structural Entities Extraction and Patient Indications Incorporation for Chest X-ray Report Generation

Kang Liu, Zhuoqi Ma, Xiaolu Kang, Zhusi Zhong, Zhicheng Jiao, Grayson Baird, Harrison Bai, Qiguang Miao

TL;DR

This work tackles chest X-ray report generation (CXRG) by explicitly incorporating patient-specific indications and strengthening cross-modal alignment between images and textual findings. It introduces Structural Entities Extraction and patient Indications Incorporation (SEI), comprising Structural Entities Extraction (SEE) to derive factual entity sequences and a cross-modal fusion network to integrate X-ray imagery, similar historical cases, and patient indications. The approach first pre-trains a cross-modal alignment module using factual sequences, then performs gradient-free retrieval of similar historical cases and fuses them with indications to generate reports, optimized via a negative log-likelihood objective $L_{LM}$. On the MIMIC-CXR dataset, SEI achieves state-of-the-art results across NLG and clinical-efficacy metrics, with ablations confirming the individual and combined value of SEE, similar historical cases, and indications for clinical fidelity and linguistic fluency.
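
For readers who want the objective spelled out, a negative log-likelihood loss for an autoregressive text decoder is usually written as below. This is a sketch under assumptions, not the paper's exact equation: the conditioning symbols $I$ (X-ray image features), $S$ (similar historical cases), and $C$ (patient indications) are placeholder names introduced here, and $y_1, \dots, y_T$ denotes the tokens of the reference report.

```latex
L_{LM} = -\sum_{t=1}^{T} \log p_{\theta}\left(y_t \mid y_{<t},\, I,\, S,\, C\right)
```

Minimizing this quantity raises the probability the decoder assigns to each reference token given the preceding tokens and the three input sources.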

Abstract

The automated generation of imaging reports proves invaluable in alleviating the workload of radiologists. A clinically applicable report generation algorithm should demonstrate its effectiveness in producing reports that accurately describe radiology findings and attend to patient-specific indications. In this paper, we introduce a novel method, Structural Entities extraction and patient indications Incorporation (SEI), for chest X-ray report generation. Specifically, we employ a structural entities extraction (SEE) approach to eliminate presentation-style vocabulary in reports and improve the quality of factual entity sequences. Aligning X-ray images with these factual entity sequences reduces the noise in the subsequent cross-modal alignment module, thereby enhancing the precision of cross-modal alignment and further aiding the model in gradient-free retrieval of similar historical cases. Subsequently, we propose a cross-modal fusion network to integrate information from X-ray images, similar historical cases, and patient-specific indications. This process allows the text decoder to attend to discriminative features of X-ray images, assimilate historical diagnostic information from similar cases, and understand the examination intention of patients, which in turn helps the text decoder produce high-quality reports. Experiments conducted on MIMIC-CXR validate the superiority of SEI over state-of-the-art approaches on both natural language generation and clinical efficacy metrics.
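
The gradient-free retrieval step mentioned above can be illustrated with a small sketch. It assumes that each case is already represented by a fixed embedding from a frozen, pre-trained alignment model and that similarity is measured by cosine similarity; neither detail is stated in this summary. The function names, the toy embeddings, and the example report strings are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def retrieve_similar_cases(
    query_embedding: Sequence[float],
    bank_embeddings: Sequence[Sequence[float]],
    bank_reports: Sequence[str],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the top_k (report, score) pairs ranked by cosine similarity.

    Nothing is trained here: the embeddings are plain numbers assumed to
    come from a frozen encoder, so no gradients are computed.
    """
    query = l2_normalize(query_embedding)
    scored = []
    for emb, report in zip(bank_embeddings, bank_reports):
        cand = l2_normalize(emb)
        score = sum(q * c for q, c in zip(query, cand))
        scored.append((report, score))
    # Highest similarity first.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional embeddings standing in for real image/text features.
bank_embeddings = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
bank_reports = [
    "no acute cardiopulmonary process",
    "mild cardiomegaly",
    "right pleural effusion",
]
query_embedding = [1.0, 0.05, 0.0]
similar_cases = retrieve_similar_cases(
    query_embedding, bank_embeddings, bank_reports, top_k=2
)
```

In a full pipeline the retrieved reports would then be passed, together with the image features and the indication text, to the fusion network; that part is not sketched here.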

Paper Structure

This paper contains 10 sections, 1 equation, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Illustration of our SEI and cross-modal fusion network. (a) Overview of SEI, featuring dual encoders for extracting uni-modal features and a text decoder for report generation using X-ray images, similar historical cases (SHC), and patient-specific indications. The training paradigm of SEI includes 1) pre-training via the cross-modal alignment module; 2) gradient-free retrieval of similar historical cases using the pre-trained model from step 1), shown in the light grey box; 3) fine-tuning using the report generation module. (b) Details and output rules of the cross-modal fusion network.
  • Figure 2: An example of generated reports and attention visualization on the MIMIC-CXR test set. Distinct colors in the reference report indicate the factual entity subsequence within different sentences. Generated reports and similar historical cases are highlighted in matching colors. “Baseline” represents the CGPT2 method (Nicolson et al.).