Topicwise Separable Sentence Retrieval for Medical Report Generation
Junting Zhao, Yang Zhou, Zhihao Chen, Huazhu Fu, Liang Wan
TL;DR
This paper tackles the long-tail challenge in retrieval-based medical report generation by introducing Teaser, which separates common and rare topics via a Topicwise Separable Encoder and aligns them to sentences through a Topic Alignment Loss that includes a Topic Contrastive Loss (TCL). An Abstractor compresses high-dimensional visual features to reduce noise and ease cross-modal matching. The approach achieves state-of-the-art results on MIMIC-CXR and IU X-ray across clinical-efficacy and natural language generation metrics, and ablations confirm the contribution of the Abstractor, TCL, and the two-topic query design to improving rare-topic coverage and avoiding topic confusion. In practice, the method produces more accurate and comprehensive radiology reports, particularly in describing rare but clinically critical findings. Overall, Teaser advances retrieval-based medical report generation by explicitly modeling topic-level distinctions and aligning image-derived queries with a sentence gallery, enabling more dependable, fine-grained descriptions in clinical practice.
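The retrieval step can be sketched in plain Python under stated assumptions. The sketch assumes that queries and gallery sentences are already embedded as vectors, that matching uses cosine similarity, and that each query contributes its single best-matching sentence only if the score clears a threshold. The function `retrieve_sentences`, its `threshold` parameter, and the toy embeddings are hypothetical illustrations, not the paper's implementation; Teaser's actual selection rules may differ.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_sentences(
    common_queries: List[List[float]],
    rare_queries: List[List[float]],
    gallery: Dict[str, List[float]],
    threshold: float = 0.5,
) -> List[str]:
    """Hypothetical retrieval step (gallery must be non-empty).

    Each query independently picks its most similar gallery sentence; matches
    below `threshold` are dropped and duplicate sentences are kept only once.
    Common and rare queries are passed as separate inputs to mirror the
    two-topic query design.
    """
    report: List[str] = []
    for queries in (common_queries, rare_queries):
        for q in queries:
            best_sentence, best_score = max(
                ((sentence, cosine(q, emb)) for sentence, emb in gallery.items()),
                key=lambda pair: pair[1],
            )
            if best_score >= threshold and best_sentence not in report:
                report.append(best_sentence)
    return report


# Toy example: made-up 3-d embeddings, one axis per sentence.
gallery = {
    "No pleural effusion.": [1.0, 0.0, 0.0],
    "Heart size is normal.": [0.0, 1.0, 0.0],
    "Small left apical pneumothorax.": [0.0, 0.0, 1.0],
}
common = [[0.9, 0.1, 0.0], [0.1, 0.95, 0.0]]
rare = [[0.0, 0.1, 0.9]]
print(retrieve_sentences(common, rare, gallery))
# ['No pleural effusion.', 'Heart size is normal.', 'Small left apical pneumothorax.']
```

Because each query selects independently in this sketch, a rare query's match is kept even when common findings dominate the gallery; the separate common and rare inputs simply make the two query groups explicit.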
Abstract
Automated radiology reporting holds immense clinical potential for alleviating the heavy workload of radiologists and mitigating diagnostic bias. Recently, retrieval-based report generation methods have attracted increasing attention because of their inherent advantages in the quality and consistency of the generated reports. However, because the training data follow a long-tail distribution, these models tend to learn frequently occurring sentences and topics while overlooking rare ones. Unfortunately, descriptions of rare topics often indicate critical findings that should be mentioned in the report. To address this problem, we introduce Topicwise Separable Sentence Retrieval (Teaser) for medical report generation. To ensure comprehensive learning of both common and rare topics, we categorize queries into common and rare types to learn differentiated topics, and then propose a Topic Contrastive Loss to effectively align topics and queries in the latent space. Moreover, we integrate an Abstractor module after visual feature extraction, which helps the topic decoder gain a deeper understanding of the visual observational intent. Experiments on the MIMIC-CXR and IU X-ray datasets demonstrate that Teaser surpasses state-of-the-art models, while also validating its ability to effectively represent rare topics and establish more dependable correspondences between queries and topics.
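The abstract does not give the exact form of the Topic Contrastive Loss. A common way to align paired embeddings in a latent space is an InfoNCE-style loss, and the sketch below assumes that form: query `i` is paired with topic `i` as its positive, all other topics act as negatives, similarity is cosine, and `temperature` is a hypothetical hyperparameter. It is an illustration of the general technique, not the authors' loss.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def topic_contrastive_loss(
    queries: List[List[float]],
    topics: List[List[float]],
    temperature: float = 0.1,
) -> float:
    """Assumed InfoNCE-style loss: mean over i of -log softmax(sim(q_i, .) / T)[i].

    Query i's positive is topic i; every other topic is a negative.
    """
    if len(queries) != len(topics):
        raise ValueError("expected one topic per query")
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, t) / temperature for t in topics]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denominator - logits[i]
    return total / len(queries)


queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = topic_contrastive_loss(queries, [[1.0, 0.0], [0.0, 1.0]])
swapped = topic_contrastive_loss(queries, [[0.0, 1.0], [1.0, 0.0]])
print(f"{aligned:.4f} {swapped:.4f}")
# 0.0000 10.0000
```

The loss is near zero when each query points at its own topic and large when queries point at the wrong topics, which is the pressure that pushes queries and topics into one-to-one correspondence and discourages topic confusion.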
