PromptMRG: Diagnosis-Driven Prompts for Medical Report Generation

Haibo Jin, Haoxuan Che, Yi Lin, Hao Chen

TL;DR

PromptMRG tackles diagnostically accurate medical report generation under disease class imbalance by introducing diagnosis-driven prompts that convert classifier outputs into explicit guidance for the report decoder. It combines an encoder-decoder backbone with a disease classification branch, a cross-modal feature enhancement module that retrieves similar reports using CLIP, and a self-adaptive learning strategy that balances learning across diseases. The key contributions are token-based prompts for explicit diagnostic guidance, cross-modal retrieval with dynamic aggregation for more robust classification, and an adaptive loss that improves performance on rare diseases; together they yield state-of-the-art clinical efficacy on two chest X-ray benchmarks. The approach improves diagnostic reliability while keeping natural language generation performance competitive, which is useful for clinical reporting workflows and may carry over to other medical imaging modalities.
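
To make the idea of diagnosis-driven prompts concrete, here is a minimal Python sketch under stated assumptions. It assumes each disease receives a four-way prediction (blank, positive, negative, uncertain), maps the winning state to a special token, and places those tokens ahead of the report text so the decoder conditions on them. The token strings, the four-state scheme, and the names `build_diagnosis_prompt` and `prepend_prompt` are hypothetical; this summary does not specify the paper's exact label set, tokens, or tokenizer.

```python
# Hypothetical prompt vocabulary: one token per diagnostic state.
# The four-state scheme (blank, positive, negative, uncertain) is an assumption.
PROMPT_TOKENS = ["[BLA]", "[POS]", "[NEG]", "[UNC]"]


def argmax(scores):
    """Index of the largest score (the first one wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def build_diagnosis_prompt(class_scores):
    """Turn per-disease classifier scores into a list of prompt tokens.

    class_scores: one entry per disease, each a list of 4 scores ordered
    like PROMPT_TOKENS.
    """
    return [PROMPT_TOKENS[argmax(scores)] for scores in class_scores]


def prepend_prompt(prompt_tokens, report_tokens, bos="[BOS]"):
    """Place the diagnostic prompt before the report tokens (assumed layout)."""
    return [bos] + prompt_tokens + report_tokens


scores = [[0.1, 2.0, 0.3, 0.0], [0.5, 0.1, 1.2, 0.2], [3.0, 0.0, 0.0, 0.1]]
prompt = build_diagnosis_prompt(scores)
print(prompt)  # ['[POS]', '[NEG]', '[BLA]']
```

The point of the design is that the classifier's decision is made explicit: instead of hoping the decoder infers a rare finding from visual features alone, the decoder sees a discrete token stating it.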

Abstract

Automatic medical report generation (MRG) is of great research value as it has the potential to relieve radiologists from the heavy burden of report writing. Despite recent advancements, accurate MRG remains challenging due to the need for precise clinical understanding and disease identification. Moreover, the imbalanced distribution of diseases makes the challenge even more pronounced, as rare diseases are underrepresented in training data, making their diagnostic performance unreliable. To address these challenges, we propose diagnosis-driven prompts for medical report generation (PromptMRG), a novel framework that aims to improve the diagnostic accuracy of MRG with the guidance of diagnosis-aware prompts. Specifically, PromptMRG is based on an encoder-decoder architecture with an extra disease classification branch. When generating reports, the diagnostic results from the classification branch are converted into token prompts to explicitly guide the generation process. To further improve diagnostic accuracy, we design cross-modal feature enhancement, which retrieves similar reports from a database to assist the diagnosis of a query image by leveraging the knowledge of a pre-trained CLIP. Moreover, the disease imbalance issue is addressed by applying an adaptive logit-adjusted loss to the classification branch based on the individual learning status of each disease, which overcomes the text decoder's inability to manipulate disease distributions. Experiments on two MRG benchmarks show the effectiveness of the proposed method, which obtains state-of-the-art clinical efficacy performance on both datasets. The code is available at https://github.com/jhb86253817/PromptMRG.
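
The retrieval step of cross-modal feature enhancement can be sketched as follows, assuming a CLIP-style setup in which the query image and every database report are already embedded in a shared space. The sketch ranks reports by cosine similarity, keeps the top k, combines them with a softmax over similarities, and concatenates the result with the image feature. The softmax weighting stands in for the paper's dynamic aggregation, whose exact form is not given here; `retrieve_top_k`, `aggregate`, `enhance_feature`, and the `temperature` parameter are hypothetical names.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve_top_k(query_emb, report_db, k):
    """Return (index, similarity) pairs for the k most similar reports."""
    scored = [(i, cosine(query_emb, emb)) for i, emb in enumerate(report_db)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def aggregate(report_db, hits, temperature=1.0):
    """Softmax-weighted mean of retrieved report embeddings (assumed scheme)."""
    weights = [math.exp(sim / temperature) for _, sim in hits]
    total = sum(weights)
    dim = len(report_db[0])
    return [
        sum(w * report_db[i][d] for w, (i, _) in zip(weights, hits)) / total
        for d in range(dim)
    ]


def enhance_feature(image_feat, query_emb, report_db, k):
    """Concatenate the image feature with the aggregated report feature."""
    hits = retrieve_top_k(query_emb, report_db, k)
    return image_feat + aggregate(report_db, hits)


db = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
print([i for i, _ in retrieve_top_k([1.0, 0.0], db, 2)])  # [0, 2]
```

Weighting by similarity lets closer reports contribute more than marginal matches, so the text evidence passed to the classifier is dominated by the most relevant retrieved cases.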

Paper Structure (28 sections, 8 equations, 10 figures, 5 tables)

Figures (10)

  • Figure 1: (a) Comparison of two sample predictions. (b) F1 scores of five SOTA methods published in 2023, a vanilla classification model, and our proposed model on the MIMIC test set. (c) F1 scores of a vanilla MRG model for different diseases on the MIMIC test set, with diseases sorted in ascending order of their number of training samples.
  • Figure 1: Analysis of the hyperparameters (a) $\lambda$ and (b) $k$ with respect to F1 and BLEU-4 on the MIMIC test set.
  • Figure 2: The overall framework of PromptMRG, which mainly consists of an image encoder and a text decoder for report generation. The diagnosis-driven prompts module is proposed to guide the decoder for diagnostically correct reports. The cross-modal feature enhancement is designed to enhance the feature for disease classification via a report database. The self-adaptive disease-balanced learning is further proposed to handle the imbalanced performance among diseases.
  • Figure 2: Comparison of high-frequency phrase counts for the method with and without DDP (diagnosis-driven prompts) on the MIMIC test set.
  • Figure 3: An example prompt we used to query the label of Aorta from Vicuna-13B.
  • ...and 5 more figures
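
The self-adaptive disease-balanced learning in Figure 2 builds on the adaptive logit-adjusted loss named in the abstract. Below is a minimal sketch of plain logit adjustment, assuming each disease is a separate multi-class problem with known class priors: training logits are shifted by tau times the log prior before cross-entropy, so rarer classes are penalized more for the same raw logits. How PromptMRG derives each disease's adjustment from its learning status is not described in this summary, so the per-disease `taus` are simply passed in; `logit_adjusted_ce`, `multi_disease_loss`, and `taus` are hypothetical names.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def logit_adjusted_ce(logits, target, class_priors, tau):
    """Cross-entropy on logits shifted by tau * log(prior) (training time only)."""
    adjusted = [z + tau * math.log(p) for z, p in zip(logits, class_priors)]
    return -log_softmax(adjusted)[target]


def multi_disease_loss(all_logits, targets, all_priors, taus):
    """Average the adjusted loss over diseases; taus holds one tau per disease.

    The rule for choosing each tau from a disease's learning status is an
    assumption left to the caller.
    """
    losses = [
        logit_adjusted_ce(l, t, p, tau)
        for l, t, p, tau in zip(all_logits, targets, all_priors, taus)
    ]
    return sum(losses) / len(losses)


# Same raw logits, rare target class (prior 0.1): adjustment raises the loss.
print(logit_adjusted_ce([0.0, 0.0], 1, [0.9, 0.1], 0.0))  # about log(2)
print(logit_adjusted_ce([0.0, 0.0], 1, [0.9, 0.1], 1.0))  # about log(10)
```

At inference the raw, unadjusted logits are used; the training-time shift forces the model to learn a larger margin for rare classes, which is how logit adjustment counteracts imbalance without resampling the data.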