Table of Contents
Fetching ...

Fine-Grained Image-Text Alignment in Medical Imaging Enables Explainable Cyclic Image-Report Generation

Wenting Chen, Linlin Shen, Jingyang Lin, Jiebo Luo, Xiang Li, Yixuan Yuan

TL;DR

An AdaMatch-based bidirectional large language model for Cyclic CXR-report generation (AdaMatch-Cyclic) employs the AdaMatch to obtain the keywords for CXR images and `keypatches' for medical reports as hints to guide CXR-report generation.

Abstract

To address these issues, we propose a novel Adaptive patch-word Matching (AdaMatch) model to correlate chest X-ray (CXR) image regions with words in medical reports and apply it to CXR-report generation to provide explainability for the generation process. AdaMatch exploits the fine-grained relation between adaptive patches and words to provide explanations of specific image regions with corresponding words. To capture the abnormal regions of varying sizes and positions, we introduce the Adaptive Patch extraction (AdaPatch) module to acquire the adaptive patches for these regions adaptively. In order to provide explicit explainability for CXR-report generation task, we propose an AdaMatch-based bidirectional large language model for Cyclic CXR-report generation (AdaMatch-Cyclic). It employs the AdaMatch to obtain the keywords for CXR images and `keypatches' for medical reports as hints to guide CXR-report generation. Extensive experiments on two publicly available CXR datasets prove the effectiveness of our method and its superior performance to existing methods.

Fine-Grained Image-Text Alignment in Medical Imaging Enables Explainable Cyclic Image-Report Generation

TL;DR

An AdaMatch-based bidirectional large language model for Cyclic CXR-report generation (AdaMatch-Cyclic) employs the AdaMatch to obtain the keywords for CXR images and `keypatches' for medical reports as hints to guide CXR-report generation.

Abstract

To address these issues, we propose a novel Adaptive patch-word Matching (AdaMatch) model to correlate chest X-ray (CXR) image regions with words in medical reports and apply it to CXR-report generation to provide explainability for the generation process. AdaMatch exploits the fine-grained relation between adaptive patches and words to provide explanations of specific image regions with corresponding words. To capture the abnormal regions of varying sizes and positions, we introduce the Adaptive Patch extraction (AdaPatch) module to acquire the adaptive patches for these regions adaptively. In order to provide explicit explainability for CXR-report generation task, we propose an AdaMatch-based bidirectional large language model for Cyclic CXR-report generation (AdaMatch-Cyclic). It employs the AdaMatch to obtain the keywords for CXR images and `keypatches' for medical reports as hints to guide CXR-report generation. Extensive experiments on two publicly available CXR datasets prove the effectiveness of our method and its superior performance to existing methods.
Paper Structure (21 sections, 6 equations, 11 figures, 11 tables)

This paper contains 21 sections, 6 equations, 11 figures, 11 tables.

Figures (11)

  • Figure 1: Current vision-language models (VLM) achieve (a) global alignment and (b) local alignment by matching overall visual with textual features, and aligning patches with word features, respectively. (c) To exploit the relation between textual words and abnormal patches with varied sizes, our AdaMatch obtains adaptive patch features and aligns them with word features.
  • Figure 2: The overview of the proposed methods. (a) Adaptive patch-word Matching (AdaMatch) model. (b) AdaMatch-based bidirectional large language model (LLM) for cyclic CXR-report generation (AdaMatch-Cyclic).
  • Figure 3: The example of instruction data for CXR-to-report generation.
  • Figure 4: The example of instruction data for report-to-CXR generation.
  • Figure 5: Qualitative comparison of CXR-to-report generation on the MIMIC-CXR (1st row) and the OpenI (2nd row) datasets, highlighting similar meanings in colored text. The keywords are obtained from AdaMatch.
  • ...and 6 more figures