Fine-Grained Image-Text Alignment in Medical Imaging Enables Explainable Cyclic Image-Report Generation

Wenting Chen; Linlin Shen; Jingyang Lin; Jiebo Luo; Xiang Li; Yixuan Yuan

TL;DR

AdaMatch-Cyclic, an AdaMatch-based bidirectional large language model for cyclic CXR-report generation, uses AdaMatch to obtain keywords for CXR images and 'keypatches' for medical reports, which serve as hints to guide CXR-report generation.

Abstract

We propose a novel Adaptive patch-word Matching (AdaMatch) model that correlates chest X-ray (CXR) image regions with words in medical reports, and we apply it to CXR-report generation to make the generation process explainable. AdaMatch exploits the fine-grained relation between adaptive patches and words to explain specific image regions with their corresponding words. To capture abnormal regions of varying sizes and positions, we introduce the Adaptive Patch extraction (AdaPatch) module, which extracts adaptive patches covering these regions. To provide explicit explainability for the CXR-report generation task, we propose an AdaMatch-based bidirectional large language model for Cyclic CXR-report generation (AdaMatch-Cyclic). It uses AdaMatch to obtain keywords for CXR images and 'keypatches' for medical reports, which serve as hints to guide CXR-report generation. Extensive experiments on two publicly available CXR datasets demonstrate the effectiveness of our method and its superior performance over existing methods.
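To make the idea of keywords and 'keypatches' concrete, the following is a minimal sketch of patch-word matching by cosine similarity. It is not the authors' AdaMatch implementation: the feature vectors are hand-written toy values, there is no adaptive patch extraction, and the function names (`similarity_matrix`, `top_keywords`, `top_keypatches`) and the max-similarity scoring rule are assumptions chosen only for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarity_matrix(patch_feats, word_feats):
    """sim[i][j] is the similarity between patch i and word j."""
    return [[cosine(p, w) for w in word_feats] for p in patch_feats]

def top_keywords(sim, tokens, k):
    """Hypothetical rule: score each word by its best-matching patch, keep the top k."""
    scores = [max(row[j] for row in sim) for j in range(len(tokens))]
    order = sorted(range(len(tokens)), key=lambda j: -scores[j])[:k]
    return [tokens[j] for j in order]

def top_keypatches(sim, k):
    """Hypothetical rule: score each patch by its best-matching word, keep the top k."""
    scores = [max(row) for row in sim]
    return sorted(range(len(sim)), key=lambda i: -scores[i])[:k]

# Toy 2-D features; real patch and word embeddings would come from trained encoders.
patch_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
word_feats = [[1.0, 0.1], [0.0, 1.0], [-1.0, 0.0]]
tokens = ["effusion", "opacity", "normal"]

sim = similarity_matrix(patch_feats, word_feats)
keywords = top_keywords(sim, tokens, k=2)    # words best supported by some patch
keypatches = top_keypatches(sim, k=2)        # patch indices best supported by some word
print(keywords, keypatches)
```

In this sketch the same similarity matrix is read in both directions: column-wise to pick words that describe the image, and row-wise to pick patches that correspond to the report, which mirrors the two kinds of hints described in the abstract.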

Paper Structure (21 sections, 6 equations, 11 figures, 11 tables)

Introduction
Related Works
Fine-Grained Vision-Language Models
CXR-Report Generation
Methods
Adaptive Patch-Word Matching (AdaMatch)
AdaMatch-based LLM for Cyclic CXR-Report Generation (AdaMatch-Cyclic)
CXR-to-Report Generation
Report-to-CXR Generation
Overall Objective
Experiments
Experiment Setting
Comparison with State-of-the-Art Methods
CXR-to-Report Generation
Report-to-CXR Generation
...and 6 more sections

Figures (11)

Figure 1: Current vision-language models (VLMs) achieve (a) global alignment by matching overall visual features with textual features and (b) local alignment by aligning patch features with word features. (c) To exploit the relation between textual words and abnormal patches of varying sizes, our AdaMatch obtains adaptive patch features and aligns them with word features.
Figure 2: Overview of the proposed methods. (a) Adaptive patch-word Matching (AdaMatch) model. (b) AdaMatch-based bidirectional large language model (LLM) for cyclic CXR-report generation (AdaMatch-Cyclic).
Figure 3: An example of instruction data for CXR-to-report generation.
Figure 4: An example of instruction data for report-to-CXR generation.
Figure 5: Qualitative comparison of CXR-to-report generation on the MIMIC-CXR (1st row) and the OpenI (2nd row) datasets, highlighting similar meanings in colored text. The keywords are obtained from AdaMatch.
...and 6 more figures
