Fairness in Multi-modal Medical Diagnosis with Demonstration Selection

Dawei Li; Zijian Gu; Peng Wang; Chuhan Song; Zhen Tan; Mohan Zhang; Tianlong Chen; Yu Tian; Song Wang

TL;DR

The paper addresses fairness gaps in medical image reasoning with multimodal large language models (MLLMs) and shows that standard in-context demonstration selection strategies propagate demographic bias. It introduces Fairness-Aware Demonstration Selection (FADS), a tuning-free framework that builds demographically balanced and semantically relevant demonstrations through clustering-based data bias mitigation and balanced sampling. In experiments on the FairCLIP Glaucoma and CheXpert Plus datasets with models such as Qwen and LLaVA-Med, FADS consistently reduces gender-, race-, and ethnicity-related disparities while maintaining competitive accuracy. Overall, the work positions fairness-aware in-context learning as a practical, data-efficient route to trustworthy and scalable medical AI that requires no model retraining.
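The summary reports reduced gender-, race-, and ethnicity-related disparities at competitive accuracy, but it does not say which fairness metrics the paper uses. As an assumption for illustration only, the sketch below computes two common group-fairness quantities from per-example predictions and a demographic attribute: the gap between the best- and worst-performing group's accuracy, and the demographic parity difference (largest gap in positive-prediction rate). The function names are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def group_accuracies(y_true: Sequence[int], y_pred: Sequence[int],
                     groups: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Accuracy computed separately within each demographic group."""
    correct: Dict[Hashable, int] = defaultdict(int)
    total: Dict[Hashable, int] = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


def accuracy_gap(y_true, y_pred, groups) -> float:
    """Best-group accuracy minus worst-group accuracy (0 means parity)."""
    accs = group_accuracies(y_true, y_pred, groups).values()
    return max(accs) - min(accs)


def demographic_parity_difference(y_pred, groups, positive: int = 1) -> float:
    """Largest difference in positive-prediction rate between any two groups."""
    positives: Dict[Hashable, int] = defaultdict(int)
    total: Dict[Hashable, int] = defaultdict(int)
    for p, g in zip(y_pred, groups):
        total[g] += 1
        positives[g] += int(p == positive)
    rates = [positives[g] / total[g] for g in total]
    return max(rates) - min(rates)


# Toy example: predictions are perfect for group "F" but not for group "M".
y_true = [1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0]
sex = ["F", "F", "F", "M", "M", "M"]
print(round(accuracy_gap(y_true, y_pred, sex), 3))           # 0.667
print(round(demographic_parity_difference(y_pred, sex), 3))  # 0.333
```

A method that "reduces disparities while maintaining accuracy" would, under these assumed metrics, lower both gaps without lowering overall accuracy much.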

Abstract

Multimodal large language models (MLLMs) have shown strong potential for medical image reasoning, yet fairness across demographic groups remains a major concern. Existing debiasing methods often rely on large labeled datasets or fine-tuning, which are impractical for foundation-scale models. We explore In-Context Learning (ICL) as a lightweight, tuning-free alternative for improving fairness. Through systematic analysis, we find that conventional demonstration selection (DS) strategies fail to ensure fairness because the exemplars they select are demographically imbalanced. To address this, we propose Fairness-Aware Demonstration Selection (FADS), which builds demographically balanced and semantically relevant demonstrations via clustering-based sampling. Experiments on multiple medical imaging benchmarks show that FADS consistently reduces gender-, race-, and ethnicity-related disparities while maintaining strong accuracy. These results highlight fairness-aware in-context learning as a scalable and data-efficient path toward equitable medical image reasoning.
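The abstract describes FADS only at a high level, so the following is an illustrative interpretation under stated assumptions, not the authors' implementation. It assumes every candidate exemplar already has an embedding vector (for example from an image or text encoder) and a single demographic label. It runs plain k-means on the embeddings, visits clusters in order of centroid similarity to the query (semantic relevance), widens to further clusters until every group has enough candidates, and then fills the demonstration set round-robin across groups (demographic balance). A similarity-only top-k baseline is included to show the imbalance the abstract attributes to conventional DS. All function names (`kmeans`, `select_balanced_demos`, `top_k_by_similarity`) are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def kmeans(points: List[Vector], k: int, iters: int = 20) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        centers.append(list(max(points, key=lambda p: min(sq_dist(p, c) for c in centers))))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Move each center to the mean of its members (keep it if the cluster is empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def top_k_by_similarity(pool_emb: List[Vector], query_emb: Vector, n: int) -> List[int]:
    """Baseline: pick the n most similar exemplars, ignoring demographics."""
    return sorted(range(len(pool_emb)), key=lambda i: -cosine(pool_emb[i], query_emb))[:n]


def select_balanced_demos(pool_emb: List[Vector], pool_group: List[str],
                          query_emb: Vector, n_demos: int, k: int = 2) -> List[int]:
    """Hypothetical fairness-aware selector: cluster for relevance, then balance groups."""
    labels, centers = kmeans(pool_emb, k)
    groups = sorted(set(pool_group))
    per_group = math.ceil(n_demos / len(groups))
    # Visit clusters from most to least similar to the query.
    order = sorted(range(k), key=lambda c: -cosine(centers[c], query_emb))
    candidates: Dict[str, List[int]] = defaultdict(list)
    for c in order:
        for i, lab in enumerate(labels):
            if lab == c:
                candidates[pool_group[i]].append(i)
        # Stop widening once every group has enough candidates.
        if all(len(candidates[g]) >= per_group for g in groups):
            break
    # Within each group, prefer the exemplars most similar to the query.
    for g in groups:
        candidates[g].sort(key=lambda i: -cosine(pool_emb[i], query_emb))
    # Round-robin across groups so no single group dominates the prompt.
    chosen: List[int] = []
    rank = 0
    while len(chosen) < n_demos and any(rank < len(candidates[g]) for g in groups):
        for g in groups:
            if rank < len(candidates[g]) and len(chosen) < n_demos:
                chosen.append(candidates[g][rank])
        rank += 1
    return chosen


pool = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9]]
sex = ["F", "F", "F", "M", "M", "M"]
query = [1.0, 0.0]
print(top_k_by_similarity(pool, query, 4))          # [0, 2, 1, 3]: three F, one M
print(select_balanced_demos(pool, sex, query, 4))   # [0, 3, 2, 5]: two F, two M
```

In this toy pool the most similar exemplars are mostly from one group, so similarity-only selection yields a 3:1 split, while the balanced selector trades a little similarity for a 2:2 split. Real use would replace the toy vectors with encoder embeddings and could balance over several attributes jointly; neither choice is specified in the abstract.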

Figures (4)