MultiFair: Multimodal Balanced Fairness-Aware Medical Classification with Dual-Level Gradient Modulation

Md Zubair, Hao Zheng, Nussdorf Jonathan, Grayson W. Armstrong, Lucy Q. Shen, Gabriela Wilson, Yu Tian, Xingquan Zhu, Min Shi

TL;DR

MultiFair tackles modality learning bias and demographic fairness bias in multimodal medical classification by introducing dual-level gradient modulation that adjusts training signals at both modality and group levels. It combines a multi-head attention fusion of modality encoders with modality balancing and gradient-direction alignment, complemented by a fairness-aware modulation using differentiable surrogate AUC across demographic groups and an overall fairness loss in the total objective. The approach is theoretically analyzed under smoothness assumptions and empirically validated on two glaucoma-focused multimodal datasets (FairVision and FairCLIP), where it achieves higher AUC and ES-AUC than unimodal, fairness-aware, and balanced baselines while reducing subgroup disparities. Overall, MultiFair provides a unified, generalizable framework for balanced, fairness-aware multimodal learning in high-stakes medical settings and beyond, with potential extensions to incomplete modality scenarios and multi-class problems.

Abstract

Medical decision systems increasingly rely on data from multiple sources to ensure reliable and unbiased diagnosis. However, existing multimodal learning models fail to achieve this goal because they often ignore two critical challenges. First, data modalities may be learned unevenly, so the model converges to a solution biased towards certain modalities. Second, the model may emphasize learning on certain demographic groups, causing unfair performance. The two issues can influence each other, as different data modalities may favor different groups during optimization, leading to both imbalanced and unfair multimodal learning. This paper proposes a novel approach called MultiFair for multimodal medical classification, which addresses these challenges with a dual-level gradient modulation process. MultiFair dynamically modulates the direction and magnitude of training gradients at both the data modality and group levels. We conduct extensive experiments on two multimodal medical datasets with different demographic groups. The results show that MultiFair outperforms state-of-the-art multimodal learning and fairness learning methods.
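The TL;DR reports results in AUC and ES-AUC, but this summary does not define ES-AUC. The sketch below assumes the common equity-scaled form, ES-AUC = AUC / (1 + Σ_g |AUC − AUC_g|), where AUC_g is the AUC within demographic group g. The function names `auc` and `es_auc` are illustrative and not taken from the paper.

```python
from typing import Hashable, List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Exact AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def es_auc(
    scores: Sequence[float],
    labels: Sequence[int],
    groups: Sequence[Hashable],
) -> float:
    """Equity-scaled AUC (assumed form): AUC / (1 + sum_g |AUC - AUC_g|)."""
    overall = auc(scores, labels)
    disparity = 0.0
    for g in set(groups):
        idx: List[int] = [i for i, gi in enumerate(groups) if gi == g]
        group_auc = auc([scores[i] for i in idx], [labels[i] for i in idx])
        disparity += abs(overall - group_auc)
    return overall / (1.0 + disparity)


if __name__ == "__main__":
    # Toy data: group "A" is ranked perfectly, group "B" is ranked backwards.
    scores = [0.9, 0.2, 0.4, 0.6]
    labels = [1, 0, 1, 0]
    groups = ["A", "A", "B", "B"]
    print(auc(scores, labels), es_auc(scores, labels, groups))
```

Under this definition, ES-AUC equals the overall AUC when every group has the same AUC, and it shrinks as group AUCs drift away from the overall value. A method can therefore raise ES-AUC either by improving accuracy or by narrowing subgroup gaps.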

Paper Structure

This paper contains 21 sections, 24 equations, 5 figures, 2 tables, 1 algorithm.

Figures (5)

  • Figure 1: The differences between existing and proposed multimodal learning paradigms. SLO: scanning laser ophthalmoscopy. OCT: optical coherence tomography.
  • Figure 2: The proposed MultiFair model. $X_1, X_2, \ldots, X_m$ represent the modalities. The features of the individual encoders are fused by a multi-head attention fusion model for medical classification. The gradient directions and magnitudes of the modality-specific classifiers ($c_1, c_2, \ldots, c_m$), together with the group-based surrogate AUCs, determine the balancing factors ($B_1, B_2, \ldots, B_m$), the fairness factor ($f^{\mathrm{batch}}$), and the fairness gap ($F_G$). The task loss is combined with the fairness gap ($F_G$) and the direction similarity between the fusion model and the classifiers ($\mathcal{L}_{gm}$).
  • Figure 3: Ablation study results. The performance of MultiFair$_G$ (fairness modulation only), MultiFair$_M$ (modality modulation only), and MultiFair (both modulations). The gender panel shows the performance for the gender subgroups (Male and Female) with and without fairness modulation. The race panel shows the corresponding performance for the racial subgroups.
  • Figure 4: The influence of different fairness parameters on performance, measured in terms of AUC, ES-AUC, and group AUCs (Male and Female). Fig. (a) shows the variation in AUCs across different fairness thresholds ($\tau$), Fig. (b) shows the effect of the fairness penalty ($\lambda_f$), and Fig. (c) shows how performance changes with the fairness modulation strength ($\delta$).
  • Figure 5: The impact of the modality balancing factors, (a) $\lambda_{gm}$ and (b) $\rho$, on performance in terms of AUC and ES-AUC.
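The Figure 2 caption refers to group-based surrogate AUCs and a fairness gap ($F_G$) without giving formulas. As a rough illustration only, the sketch below uses a common differentiable surrogate, the mean sigmoid of pairwise score differences, and takes the gap as the largest minus the smallest group surrogate AUC. Both choices, the `temperature` parameter, and the function names are assumptions made here; the paper's actual definitions may differ.

```python
import math
from typing import Dict, Hashable, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def surrogate_auc(
    scores: Sequence[float],
    labels: Sequence[int],
    temperature: float = 1.0,  # hypothetical smoothing parameter
) -> float:
    """Smooth AUC surrogate: mean sigmoid((s_pos - s_neg) / temperature).

    The hard indicator 1[s_pos > s_neg] of the exact AUC is replaced by a
    sigmoid, so the value varies smoothly with the scores.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    total = sum(sigmoid((p - n) / temperature) for p in pos for n in neg)
    return total / (len(pos) * len(neg))


def fairness_gap(
    scores: Sequence[float],
    labels: Sequence[int],
    groups: Sequence[Hashable],
    temperature: float = 1.0,
) -> Tuple[float, Dict[Hashable, float]]:
    """Assumed gap: max group surrogate AUC minus min group surrogate AUC."""
    per_group: Dict[Hashable, float] = {}
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        per_group[g] = surrogate_auc(
            [scores[i] for i in idx], [labels[i] for i in idx], temperature
        )
    return max(per_group.values()) - min(per_group.values()), per_group


if __name__ == "__main__":
    scores = [2.0, 0.0, 0.5, 0.0]
    labels = [1, 0, 1, 0]
    groups = ["Male", "Male", "Female", "Female"]
    gap, per_group = fairness_gap(scores, labels, groups)
    print(per_group, gap)
```

In a training loop the same quantity would be computed on batch logits with an autodiff framework so that the gap can contribute gradients. The plain-Python version here only shows the arithmetic. A smaller `temperature` brings the surrogate closer to the exact AUC at the cost of sharper, less informative gradients.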