DAMPER: A Dual-Stage Medical Report Generation Framework with Coarse-Grained MeSH Alignment and Fine-Grained Hypergraph Matching

Xiaofei Huang, Wenting Chen, Jie Liu, Qisheng Lu, Xiaoling Luo, Linlin Shen

TL;DR

Medical report generation must bridge imaging interpretation with clinically valid narrative content. DAMPER presents a dual-stage framework. It first aligns CXR features with MeSH terms, using MeSH encoding, a GAN-based MCA module, and a CMG module to produce coarse radiological representations. It then employs hypergraphs for intra- and inter-patient fine-grained alignment before decoding the final report. The approach outperforms prior methods on METEOR and clinical efficacy (CE) metrics on IU-Xray and MIMIC-CXR and generalizes well zero-shot to MIMIC-ABN, indicating improved semantic fidelity and clinical relevance. By integrating MeSH knowledge and hypergraph-based high-order relationships, DAMPER closely mirrors the radiologist workflow and improves robustness to missing views.
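To make "hypergraphs over image patches" concrete, the sketch below builds a hypergraph and runs one round of message passing. It rests on two assumptions that DAMPER's text does not confirm. Each hyperedge groups one patch with its k nearest neighbours, and features propagate with the standard HGNN convolution ReLU(Dv^-1/2 H W De^-1 H^T Dv^-1/2 X Θ). The function names (`knn_hyperedges`, `hypergraph_conv`), the value of k, and all feature sizes are hypothetical.

```python
import numpy as np


def knn_hyperedges(x: np.ndarray, k: int) -> np.ndarray:
    """Incidence matrix H of shape (nodes, hyperedges).

    Hypothetical rule: hyperedge j = node j plus its k nearest neighbours
    (Euclidean distance), so every column has exactly k + 1 members.
    """
    n = x.shape[0]
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    h = np.zeros((n, n))
    for j in range(n):
        order = np.argsort(dist[j])
        neighbours = [int(i) for i in order if i != j][:k]
        h[[j, *neighbours], j] = 1.0
    return h


def hypergraph_conv(x, h, theta, edge_w=None):
    """One HGNN-style layer: ReLU(Dv^-1/2 H W De^-1 H^T Dv^-1/2 X Theta)."""
    edge_w = np.ones(h.shape[1]) if edge_w is None else edge_w
    dv = h @ edge_w                  # weighted node degrees
    de = h.sum(axis=0)               # hyperedge degrees (nodes per hyperedge)
    dv_inv_sqrt = np.diag(1.0 / np.sqrt(dv))
    g = dv_inv_sqrt @ h @ np.diag(edge_w / de) @ h.T @ dv_inv_sqrt
    return np.maximum(g @ x @ theta, 0.0)


rng = np.random.default_rng(0)
patches = rng.normal(size=(49, 64))              # hypothetical 7x7 grid of 64-d patch features
H = knn_hyperedges(patches, k=4)
theta = rng.normal(size=(64, 32)) / np.sqrt(64)  # hypothetical projection weights
refined = hypergraph_conv(patches, H, theta)
print(H.shape, refined.shape)  # (49, 49) (49, 32)
```

The same construction could be applied to phrase embeddings from a report to obtain a textual hypergraph. Because a hyperedge can join more than two nodes, one propagation step mixes information within a whole neighbourhood rather than along pairwise edges, which is the "high-order relationship" the summary refers to.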

Abstract

Medical report generation is crucial for clinical diagnosis and patient management, summarizing diagnoses and recommendations based on medical imaging. However, existing works often overlook the clinical pipeline of report writing, in which physicians typically conduct an initial quick review followed by a detailed examination. Moreover, current alignment methods may produce misaligned image-text relationships. To address these issues, we propose DAMPER, a dual-stage framework for medical report generation that mimics this two-step clinical pipeline. The first stage, MeSH-Guided Coarse-Grained Alignment (MCG), aligns chest X-ray (CXR) image features with Medical Subject Headings (MeSH) features to generate a rough keyphrase representation of the overall impression. The second stage, Hypergraph-Enhanced Fine-Grained Alignment (HFG), constructs hypergraphs for image patches and report annotations to model high-order relationships within each modality, and performs hypergraph matching to capture semantic correlations between image regions and textual phrases. Finally, the coarse-grained visual features, generated MeSH representations, and visual hypergraph features are fed into a report decoder to produce the final medical report. Extensive experiments on public datasets demonstrate the effectiveness of DAMPER in generating comprehensive and accurate medical reports, outperforming state-of-the-art methods across various evaluation metrics.
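The abstract says three feature streams feed the report decoder, but it does not say how they are combined. The sketch below assumes they are concatenated along the token axis into one memory. It also assumes the decoder reads that memory through single-head scaled dot-product cross-attention. Learned query/key/value projections and multiple heads are omitted for brevity. The variable names and all token counts and dimensions are hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attend(queries, memory):
    """Single-head scaled dot-product attention of decoder states over memory tokens."""
    d = queries.shape[-1]
    scores = queries @ memory.T / np.sqrt(d)  # (T, M)
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ memory, weights


rng = np.random.default_rng(0)
d_model = 32
coarse_visual = rng.normal(size=(49, d_model))  # hypothetical MCA output, one token per patch
mesh_repr = rng.normal(size=(10, d_model))      # hypothetical CMG output, one token per MeSH term
visual_hyper = rng.normal(size=(49, d_model))   # hypothetical HFG visual hypergraph features
memory = np.concatenate([coarse_visual, mesh_repr, visual_hyper], axis=0)
decoder_states = rng.normal(size=(20, d_model))  # 20 partially decoded report tokens
context, attn = cross_attend(decoder_states, memory)
print(memory.shape, context.shape)  # (108, 32) (20, 32)
```

Concatenating along the token axis lets each generated word attend to whichever stream is most relevant, for example a MeSH token when naming a finding or a hypergraph token when describing its location. Other fusion schemes (separate cross-attention per stream, gating) are equally plausible readings of the summary.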

Paper Structure

This paper contains 34 sections, 11 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: (a) Previous research on fine-grained alignment. (b) Our approach to fine-grained alignment in this paper. (c) The motivation for our approach: mimicking the real process by which physicians write reports.
  • Figure 2: Overview of the proposed DAMPER framework. In the MCG stage, the MCA module extracts coarse-grained visual features aligned with MeSH terms, which are then input into the CMG. The HFG stage includes Intra-RCA and Inter-RCA modules that acquire detailed information corresponding to the report. This information, along with the output from MCA and CMG, is fed into the report decoder to generate the final medical report.
  • Figure 3: Visualization of report generation examples. The first column shows the input image, and the 2nd and 3rd columns show the corresponding MeSH terms and report. The 4th column shows the MeSH terms generated by DAMPER, and the 5th column onward shows reports from various models. MeSH terms are highlighted in red or green, with related sentences in the generated reports marked accordingly.
  • Figure 4: t-SNE visualization of visual features and MeSH features in the MCA module.
  • Figure 5: Visualization of alignment examples using hypergraph matching. Text phrases in the report and image regions, color-coded similarly, represent alignments established by hypergraph matching. Different colors distinguish multiple alignment relationships.
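Figure 5 shows region-phrase alignments produced by hypergraph matching. Hypergraph matching in general also scores higher-order affinities between hyperedges across the two modalities. The sketch below is a deliberately simplified, first-order stand-in, not DAMPER's matching procedure. It assumes region and phrase features already share one embedding space. It keeps only pairs that are mutual nearest neighbours under cosine similarity and clear a threshold. The function names, the threshold, and the toy data are hypothetical.

```python
import numpy as np


def cosine_sim(a, b):
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def mutual_matches(regions, phrases, min_sim=0.0):
    """Return (region_idx, phrase_idx, similarity) for mutual best matches."""
    s = cosine_sim(regions, phrases)  # (R, P) similarity matrix
    best_phrase = s.argmax(axis=1)    # best phrase for each region
    best_region = s.argmax(axis=0)    # best region for each phrase
    pairs = []
    for r, p in enumerate(best_phrase):
        if best_region[p] == r and s[r, p] >= min_sim:
            pairs.append((r, int(p), float(s[r, p])))
    return pairs


rng = np.random.default_rng(0)
regions = rng.normal(size=(6, 16))  # hypothetical region embeddings
# Toy phrases: noisy copies of regions 1, 3 and 4 (synthetic data, not from the paper)
phrases = regions[[1, 3, 4]] + 0.05 * rng.normal(size=(3, 16))
pairs = mutual_matches(regions, phrases, min_sim=0.5)
print([(r, p) for r, p, _ in pairs])
```

Requiring a mutual best match discards regions that have no corresponding phrase (here regions 0, 2 and 5), which loosely mirrors how only some image regions in Figure 5 receive an alignment.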