AdFair-CLIP: Adversarial Fair Contrastive Language-Image Pre-training for Chest X-rays

Chenlang Yi, Zizhan Xiong, Qi Qi, Xiyuan Wei, Girish Bathla, Ching-Long Lin, Bobak Jack Mortazavi, Tianbao Yang

TL;DR

This work tackles demographic biases in CLIP-based chest X-ray diagnostics by introducing AdFair-CLIP, an adversarial fairness framework that operates in feature space to suppress sensitive attributes while preserving multimodal alignment. The method uses a minimax objective that couples the contrastive language-image loss with a fairness discriminator, encouraging representations to be invariant to race and gender. Through extensive zero-shot, few-shot, and transfer experiments on CheXpert Plus and MIMIC-CXR, and evaluation on a demographically balanced FCXP 5x90 test set, AdFair-CLIP demonstrates substantial fairness improvements with competitive diagnostic performance. The study also conducts a systematic fairness investigation of SOTA CLIP models, provides thorough ablations, and establishes a new benchmark for fairness-aware medical CLIP models in chest X-ray analysis, with implications for equitable clinical AI deployment.
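As a hedged illustration of the minimax coupling described above (not necessarily the paper's exact formulation), one common way to write such an objective is shown below. The symbols $\theta$ (encoder and projection parameters), $\phi$ (discriminator parameters), and $\lambda$ (trade-off weight) are assumptions introduced here for illustration.

$$
\min_{\theta}\;\max_{\phi}\;\; \mathcal{L}_{\mathrm{CLIP}}(\theta) \;-\; \lambda\,\mathcal{L}_{\mathrm{disc}}(\theta,\phi)
$$

Here $\mathcal{L}_{\mathrm{CLIP}}$ is the contrastive language-image loss and $\mathcal{L}_{\mathrm{disc}}$ is the discriminator's classification loss on the sensitive attribute. The discriminator maximizes the objective by driving $\mathcal{L}_{\mathrm{disc}}$ down, while the encoders minimize it by keeping image and text aligned and pushing $\mathcal{L}_{\mathrm{disc}}$ up, which encourages representations that carry little information about race or gender. The weighting and sign conventions in the paper's own equations may differ.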

Abstract

Contrastive Language-Image Pre-training (CLIP) models have demonstrated superior performance across various visual tasks, including medical image classification. However, fairness concerns such as demographic bias have received limited attention for CLIP models. This oversight leads to critical issues, particularly with respect to race and gender: disparities in diagnostic outcomes and reduced reliability for underrepresented groups. To address these challenges, we introduce AdFair-CLIP, a novel framework that employs adversarial feature intervention to suppress sensitive attributes, thereby mitigating spurious correlations and improving prediction fairness. We conduct comprehensive experiments on chest X-ray (CXR) datasets and show that AdFair-CLIP significantly enhances both fairness and diagnostic accuracy while maintaining robust generalization in zero-shot and few-shot scenarios. These results establish new benchmarks for fairness-aware learning in CLIP-based medical diagnostic models, particularly for CXR analysis.

Paper Structure

This paper contains 12 sections, 3 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Overview of the AdFair-CLIP architecture. Image and text encoders extract representations $h_v$ and $h_u$ from chest X-ray images and radiology reports, projecting them to $v$ and $u$ for contrastive alignment. A discriminator predicts sensitive attributes from concatenated representations, enabling adversarial training to mitigate biases.
  • Figure 2: Fairness assessment of SOTA CLIP-based CXR diagnostic methods across race and gender in three scenarios, with all scores reported as percentages.
  • Figure 3: Performance assessment of SOTA CLIP-based CXR diagnostic methods across race and gender in three scenarios, with all scores reported as percentages.
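
The architecture in the Figure 1 caption can be sketched numerically as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: the linear discriminator, the choice to concatenate $h_v$ and $h_u$ (rather than the projected $v$ and $u$), the temperature of 0.07, and the weight `lam` are all hypothetical choices made here for clarity.

```python
import math

def l2_normalize(x):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(c * c for c in x))
    return [c / norm for c in x]

def log_softmax(row):
    # Numerically stable log-softmax over a list of scores.
    m = max(row)
    lse = m + math.log(sum(math.exp(r - m) for r in row))
    return [r - lse for r in row]

def clip_loss(v, u, temperature=0.07):
    # Symmetric contrastive loss: the i-th image embedding v[i] should match
    # the i-th text embedding u[i] and no other item in the batch.
    v = [l2_normalize(x) for x in v]
    u = [l2_normalize(x) for x in u]
    n = len(v)
    logits = [[sum(a * b for a, b in zip(v[i], u[j])) / temperature
               for j in range(n)] for i in range(n)]
    img_to_txt = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    txt_to_img = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return 0.5 * (img_to_txt + txt_to_img)

def discriminator_loss(h_v, h_u, labels, weights, bias):
    # Cross-entropy of a (hypothetical) linear discriminator that predicts the
    # sensitive attribute from the concatenation [h_v; h_u].
    total = 0.0
    for hv, hu, y in zip(h_v, h_u, labels):
        z = hv + hu  # list concatenation of the two representations
        scores = [sum(w * x for w, x in zip(w_k, z)) + b_k
                  for w_k, b_k in zip(weights, bias)]
        total -= log_softmax(scores)[y]
    return total / len(labels)

def encoder_objective(l_clip, l_disc, lam=1.0):
    # Assumed encoder-side objective: stay aligned (low contrastive loss)
    # while making the discriminator's task harder (high discriminator loss).
    return l_clip - lam * l_disc
```

In an adversarial training loop, the discriminator would take gradient steps to reduce `discriminator_loss`, while the encoders would take steps to reduce `encoder_objective`. A gradient reversal layer is one common way to implement both updates in a single backward pass; whether AdFair-CLIP does so is not stated in the text above.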