Factual Serialization Enhancement: A Key Innovation for Chest X-ray Report Generation

Kang Liu, Zhuoqi Ma, Mengmeng Liu, Zhicheng Jiao, Xiaolu Kang, Qiguang Miao, Kun Xie

TL;DR

Experiments on MIMIC-CXR and IU X-ray datasets across specific and general scenarios demonstrate that FSE outperforms state-of-the-art approaches in both natural language generation and clinical efficacy metrics.

Abstract

A radiology report comprises presentation-style vocabulary, which ensures clarity and organization, and factual vocabulary, which provides accurate and objective descriptions based on observable findings. While manually writing these reports is time-consuming and labor-intensive, automatic report generation offers a promising alternative. A critical step in this process is to align radiographs with their corresponding reports. However, existing methods often rely on complete reports for alignment, overlooking the impact of presentation-style vocabulary. To address this issue, we propose FSE, a two-stage Factual Serialization Enhancement method. In Stage 1, we introduce factuality-guided contrastive learning for visual representation by maximizing the semantic correspondence between radiographs and corresponding factual descriptions. In Stage 2, we present evidence-driven report generation that enhances diagnostic accuracy by integrating insights from similar historical cases structured as factual serialization. Experiments on MIMIC-CXR and IU X-ray datasets across specific and general scenarios demonstrate that FSE outperforms state-of-the-art approaches in both natural language generation and clinical efficacy metrics. Ablation studies further emphasize the positive effects of factual serialization in Stage 1 and Stage 2. The code is available at https://github.com/mk-runner/FSE.
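The Stage 1 idea of aligning radiographs with factual descriptions can be illustrated with a generic contrastive objective. The sketch below assumes a symmetric InfoNCE-style loss over a batch of paired embeddings; the abstract does not state the exact loss used in FSE, so the function names, the temperature value, and the plain-list embedding format are all illustrative assumptions rather than the authors' implementation.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def symmetric_contrastive_loss(image_embs, fact_embs, temperature=0.07):
    """InfoNCE-style loss (assumed form, not taken from the paper).

    image_embs[i] and fact_embs[i] are treated as a matching pair;
    every other pairing in the batch acts as a negative.
    """
    assert len(image_embs) == len(fact_embs) and image_embs
    imgs = [l2_normalize(v) for v in image_embs]
    facts = [l2_normalize(v) for v in fact_embs]
    n = len(imgs)

    # Cosine similarity between every image and every factual description.
    logits = [
        [sum(a * b for a, b in zip(imgs[i], facts[j])) / temperature
         for j in range(n)]
        for i in range(n)
    ]

    # Image-to-text direction: each image should pick its own description.
    i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n

    # Text-to-image direction: each description should pick its own image.
    logits_t = [list(col) for col in zip(*logits)]
    t2i = -sum(log_softmax(logits_t[j])[j] for j in range(n)) / n

    return 0.5 * (i2t + t2i)
```

Minimizing this quantity pulls each radiograph embedding toward the embedding of its own factual description and away from the other descriptions in the batch, which is one common way to "maximize semantic correspondence" between the two modalities.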

Paper Structure

This paper contains 12 sections, 10 equations, 4 figures, and 4 tables.

Figures (4)

  • Figure 1: Overview of our proposed FSE for chest X-ray report generation. FSE involves factuality-guided contrastive learning (A) for visual representation, followed by gradient-free retrieval of similar historical cases (B), and concludes with evidence-driven report generation. The inference phase exclusively employs evidence-driven report generation.
  • Figure 2: Two examples of factual serialization generated by our structural entities approach. The upper panel shows entities and their relationships identified by RadGraph [jain-radgraph], while the lower panel shows factual serialization from reports. “O-DP”, “A-DP”, and “O-U” indicate entity types. “modify”, “located_at”, and “suggestive_of” denote relationships between them.
  • Figure 3: Visualization of similar historical cases for two samples from the MIMIC-CXR test set. In the left panel, factual sequences in the target X-ray images are shown in different colors. The right panel highlights corresponding factual sequences in similar historical cases using colors that match those in the target X-ray images.
  • Figure 4: Two examples of generated reports and attention visualization in the MIMIC-CXR test set. In the reference report, different colors are used to highlight factual sequences, with matching colors applied to corresponding factual vocabulary in the generated report. Performance metrics, represented as “A/B”, indicate the values achieved by FSE-5 and CvT2DistillGPT2 [nicolson-improving], respectively.
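
The retrieval of similar historical cases mentioned in Figure 1 and visualized in Figure 3 can be pictured as a nearest-neighbour lookup over frozen embeddings, with no parameters updated. The sketch below assumes cosine similarity and a simple in-memory case bank; the captions do not specify the similarity measure, the number of retrieved cases, or the storage format, so `retrieve_similar_cases`, the `(case_id, embedding, factual_sequences)` tuple layout, and the default `k` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def retrieve_similar_cases(query_emb, case_bank, k=3):
    """Return the k historical cases most similar to the query embedding.

    case_bank is a list of (case_id, embedding, factual_sequences) tuples.
    This layout is an assumption made for illustration only.
    """
    scored = [
        (cosine(query_emb, emb), case_id, facts)
        for case_id, emb, facts in case_bank
    ]
    # Sort on the similarity score only, highest first.
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


# Toy case bank with made-up embeddings and factual sequences.
bank = [
    ("case_a", [1.0, 0.0], ["no pleural effusion"]),
    ("case_b", [0.0, 1.0], ["cardiomegaly"]),
    ("case_c", [0.9, 0.1], ["no pleural effusion", "lungs clear"]),
]
top = retrieve_similar_cases([1.0, 0.0], bank, k=2)
top_ids = [case_id for _, case_id, _ in top]
```

In a setup like this, the factual sequences attached to the retrieved cases would be passed to the report generator as additional evidence alongside the target image.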