Topicwise Separable Sentence Retrieval for Medical Report Generation

Junting Zhao; Yang Zhou; Zhihao Chen; Huazhu Fu; Liang Wan

TL;DR

This paper tackles the long-tail problem in retrieval-based medical report generation with Teaser, which separates common and rare topics via a Topicwise Separable Encoder and aligns them with sentences through a Topic Alignment Loss that includes a Topic Contrastive Loss (TCL). An Abstractor compresses high-dimensional visual features to reduce noise and ease cross-modal matching. The approach achieves state-of-the-art results on MIMIC-CXR and IU-Xray on both clinical-efficacy and natural language generation metrics, and ablations indicate that the Abstractor, TCL, and the split of queries into common and rare types each help improve rare-topic coverage and avoid topic confusion. In practice, this yields more accurate and comprehensive radiology reports, particularly for rare but clinically critical findings. Overall, Teaser advances retrieval-based report generation by modeling topic-level distinctions explicitly and aligning image-derived queries with sentence galleries.

Abstract

Automated radiology reporting holds immense clinical potential for alleviating the heavy workload of radiologists and mitigating diagnostic bias. Recently, retrieval-based report generation methods have attracted increasing attention because of their inherent advantages in the quality and consistency of generated reports. However, because the training data follows a long-tail distribution, these models tend to learn frequently occurring sentences and topics and overlook rare ones. Yet descriptions of rare topics often indicate critical findings that should be mentioned in the report. To address this problem, we introduce Topicwise Separable Sentence Retrieval (Teaser) for medical report generation. To ensure comprehensive learning of both common and rare topics, we categorize queries into common and rare types to learn differentiated topics, and then propose a Topic Contrastive Loss to effectively align topics and queries in the latent space. Moreover, we integrate an Abstractor module after visual feature extraction, which helps the topic decoder gain a deeper understanding of the visual observational intent. Experiments on the MIMIC-CXR and IU X-ray datasets demonstrate that Teaser surpasses state-of-the-art models, and validate its ability to represent rare topics effectively and to establish more dependable correspondences between queries and topics.
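
This summary does not give the exact form of the Topic Contrastive Loss, so the following is only a minimal sketch of the general idea of contrastive alignment, assuming a generic InfoNCE-style loss over cosine similarities. The function names, the temperature value, and the toy two-dimensional embeddings are illustrative assumptions, not the authors' formulation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def contrastive_loss(topics, sentences, temperature=0.1):
    """Generic InfoNCE-style loss (an assumption, not the paper's TCL).

    topics[i] is treated as the positive partner of sentences[i];
    every other sentence in the batch acts as a negative.
    """
    total = 0.0
    for i, topic in enumerate(topics):
        logits = [cosine(topic, s) / temperature for s in sentences]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(topics)

# Toy embeddings: the loss is near zero when each topic matches its own
# sentence and large when the pairing is swapped.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

Minimizing a loss of this shape pulls each topic embedding toward its paired sentence and pushes it away from the others, which is the kind of separation the abstract attributes to the Topic Contrastive Loss.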

Paper Structure (22 sections, 12 equations, 8 figures, 5 tables)

Introduction
Related Works
Generation-based Medical Report Generation
Retrieval-based Medical Report Generation
Proposed Method
Visual Feature Extraction by Abstractor
Topicwise Separable Sentence Retrieval
Topic Alignment Loss
Experiments
Datasets
MIMIC-CXR
IU-Xray
Evaluation Metrics
Implementation details
Quantitative Comparison with SOTA Methods
...and 7 more sections

Figures (8)

Figure 1: Illustration of the common mistakes made by existing retrieval-based methods. The histogram represents the sentence frequencies in the training set. Existing methods tend to retrieve high-frequency sentences while neglecting low-frequency ones.
Figure 2: The framework of the proposed Teaser comprises the visual encoder and Abstractor, which extract and abstract visual features, and the Topicwise Separable Encoder, which produces topic embeddings. In the training stage, a Hungarian matcher selects the best-matching topic embedding for each ground-truth sentence. The similarity loss ($\mathcal{L}_{\mathrm{sim}}$) and topic contrastive loss ($\mathcal{L}_{\mathrm{TCL}}$) align these embeddings, while the selection loss ($\mathcal{L}_{\mathrm{select}}$) aids in topic filtering. In the testing stage, the chosen topic embeddings retrieve the best matches separately from the common and rare sentence galleries. These matches are then merged to generate the final report.
Figure 3: Attention maps with and without employing Abstractor.
Figure 4: t-SNE of topic embeddings with and without TCL. Topic embeddings generated by the same query are shown in the same color.
Figure 5: Visualization and report comparison between our Teaser and SOTA methods. Sentences shaded in the same color correspond to the same topic. Underlined text indicates rare sentences.
...and 3 more figures
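
The training-time matching and test-time retrieval described in the Figure 2 caption can be sketched roughly as follows. This is a toy illustration under stated assumptions: a brute-force search over assignments stands in for the Hungarian algorithm (it finds the same optimal one-to-one assignment on small inputs), cosine similarity is assumed as the matching score, topics from common queries are assumed to search the common gallery and topics from rare queries the rare gallery, and all embeddings and sentences are made up for the example.

```python
import math
from itertools import permutations

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def match_topics(topic_embs, sentence_embs):
    """For each ground-truth sentence, return the index of its matched topic.

    Brute-force stand-in for a Hungarian matcher; assumes there are at
    least as many topic embeddings as ground-truth sentences.
    """
    best, best_score = None, -math.inf
    for perm in permutations(range(len(topic_embs)), len(sentence_embs)):
        score = sum(cosine(topic_embs[t], sentence_embs[s]) for s, t in enumerate(perm))
        if score > best_score:
            best, best_score = perm, score
    return list(best)

def retrieve(topic_emb, gallery):
    """Return the gallery sentence whose embedding is most similar to the topic."""
    return max(gallery, key=lambda item: cosine(topic_emb, item[1]))[0]

# Training stage (toy data): three topic embeddings, two ground-truth sentences.
topic_embs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.1], [0.1, 0.0, 1.0]]
sentence_embs = [[0.0, 0.9, 0.0], [1.0, 0.0, 0.0]]
matched = match_topics(topic_embs, sentence_embs)

# Testing stage (toy data): separate galleries for common and rare sentences.
common_gallery = [("The heart size is normal.", [1.0, 0.0, 0.0]),
                  ("The lungs are clear.", [0.0, 1.0, 0.0])]
rare_gallery = [("There is a small apical pneumothorax.", [0.0, 0.0, 1.0]),
                ("A rib fracture is noted.", [1.0, 0.0, 0.0])]
common_topics = [[0.9, 0.1, 0.0]]  # hypothetical outputs of common queries
rare_topics = [[0.1, 0.0, 1.0]]    # hypothetical outputs of rare queries

# Retrieve from each gallery separately, then merge into one report.
report = [retrieve(t, common_gallery) for t in common_topics]
report += [retrieve(t, rare_gallery) for t in rare_topics]
print(matched)
print(" ".join(report))
```

Keeping the two galleries separate means a rare topic never competes with high-frequency sentences at retrieval time, which is one plausible reading of how the design counters the frequency bias shown in Figure 1.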
