Benchmarking Robustness of Contrastive Learning Models for Medical Image-Report Retrieval

Demetrio Deanda, Yuktha Priya Masupalli, Jeong Yang, Young Lee, Zechun Cao, Gongbo Liang

TL;DR

The paper tackles robustness of cross-domain medical image-report retrieval under image occlusion. It benchmarks four contrastive models (CLIP, CXR-RePaiR, MedCLIP, and CXR-CLIP) using an occlusion-robustness task on the MIMIC-CXR dataset and Recall@k metrics. Findings show CXR-CLIP and CXR-RePaiR achieve the strongest retrieval performance across occlusion levels, while CLIP struggles due to domain mismatch; robustness degrades nearly proportionally with occlusion, with MedCLIP showing modest robustness but weaker overall performance. The work emphasizes the importance of domain-specific training and robustness-focused improvements to enable reliable cross-domain medical image-report retrieval in clinical settings.

Abstract

Medical images and reports offer invaluable insights into patient health. The heterogeneity and complexity of these data hinder effective analysis. To bridge this gap, we investigate contrastive learning models for cross-domain retrieval, which associates medical images with their corresponding clinical reports. This study benchmarks the robustness of four state-of-the-art contrastive learning models: CLIP, CXR-RePaiR, MedCLIP, and CXR-CLIP. We introduce an occlusion retrieval task to evaluate model performance under varying levels of image corruption. Our findings reveal that all evaluated models are highly sensitive to out-of-distribution data, as evidenced by the proportional decrease in performance with increasing occlusion levels. While MedCLIP exhibits slightly more robustness, its overall performance remains significantly behind CXR-CLIP and CXR-RePaiR. CLIP, trained on a general-purpose dataset, struggles with medical image-report retrieval, highlighting the importance of domain-specific training data. The evaluation of this work suggests that more effort needs to be spent on improving the robustness of these models. By addressing these limitations, we can develop more reliable cross-domain retrieval models for medical applications.
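
To make the evaluation setup concrete, the following is a minimal sketch of an occlusion-then-retrieve measurement. It is not the authors' pipeline. The abstract does not say how occlusion is applied, so masking a random fraction of pixels is an assumption here, as are the function names `occlude`, `cosine`, and `recall_at_k` and the convention that image `i` is paired with report `i`. In a real run the embeddings would come from the model under test; the sketch takes them as plain lists of floats.

```python
import math
import random


def occlude(image, fraction, seed=0):
    """Return a copy of a 2D image (list of rows) with `fraction` of its pixels zeroed.

    Assumption: random per-pixel masking; the paper's occlusion scheme may differ.
    """
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    coords = [(r, c) for r in range(height) for c in range(width)]
    n_masked = round(fraction * len(coords))
    occluded = [list(row) for row in image]  # copy so the input is not modified
    for r, c in rng.sample(coords, n_masked):
        occluded[r][c] = 0.0
    return occluded


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recall_at_k(image_embs, report_embs, k):
    """Fraction of images whose paired report (same index) ranks in the top k."""
    hits = 0
    for i, img in enumerate(image_embs):
        scores = [cosine(img, rep) for rep in report_embs]
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(image_embs)
```

A robustness curve would then repeat the measurement at several occlusion fractions: occlude each image, embed it with the model being benchmarked, and record `recall_at_k` against the unchanged report embeddings.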

Paper Structure

This paper contains 16 sections, 1 equation, 3 figures, 1 table.

Figures (3)

  • Figure 1: Example of a classification-based contrastive learning model.
  • Figure 2: Example of a chest X-ray (left) with the radiology report (right) from the MIMIC-CXR dataset (Johnson et al., 2019).
  • Figure 3: Robustness testing result for CXR-RePaiR (top), CXR-CLIP (middle), and ViT-based MedCLIP (bottom).