Fairness-aware Vision Transformer via Debiased Self-Attention
Yao Qiang, Chengyin Li, Prashant Khanduri, Dongxiao Zhu
TL;DR
This work studies fairness in Vision Transformers (ViT) and observes that debiasing methods designed for CNNs do not transfer well to ViT. It introduces Debiased Self-Attention (DSA), a hierarchical, two-step framework. The first step uses a bias-only model and adversarial patch attacks to locate and perturb spurious features. The second step trains a debiased ViT on the augmented data with an attention-weights alignment regularizer so that it learns the real predictive features. The training objective combines cross-entropy losses on clean and adversarial inputs with a discrepancy-based attention alignment term; the tunable weights $\lambda_1, \lambda_2, \lambda_3$ balance fairness against accuracy. On Waterbirds, CelebA, and bFFHQ, DSA consistently improves group fairness (lower EO, DP, and DBA) while maintaining or improving accuracy, and qualitative analysis of attention maps confirms reduced reliance on spurious cues. Code is available at https://github.com/qiangyao1988/DSA.
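The shape of the training objective described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the assignment of $\lambda_1, \lambda_2, \lambda_3$ to the three terms, and the use of a mean squared difference as the attention discrepancy are all assumptions made for illustration. Plain Python lists stand in for model outputs.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for a single sample (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def attention_discrepancy(attn_clean, attn_adv):
    """Mean squared difference between two attention-weight vectors.
    ASSUMPTION: the summary only says 'discrepancy-based'; MSE is a stand-in."""
    return sum((a - b) ** 2 for a, b in zip(attn_clean, attn_adv)) / len(attn_clean)

def dsa_style_loss(logits_clean, logits_adv, attn_clean, attn_adv, label,
                   lam1=1.0, lam2=1.0, lam3=1.0):
    """Weighted sum of the three terms named in the summary.
    ASSUMPTION: lam1 weights the clean loss, lam2 the adversarial loss,
    and lam3 the attention alignment term."""
    ce_clean = cross_entropy(logits_clean, label)   # loss on the original image
    ce_adv = cross_entropy(logits_adv, label)       # loss on the patch-perturbed image
    align = attention_discrepancy(attn_clean, attn_adv)
    return lam1 * ce_clean + lam2 * ce_adv + lam3 * align
```

In this sketch, raising `lam3` penalizes attention maps that change when the spurious patches are perturbed, which is one way to read the fairness and accuracy trade-off the weights are said to control.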
Abstract
Vision Transformer (ViT) has recently gained significant attention in solving computer vision (CV) problems due to its capability of extracting informative features and modeling long-range dependencies through the attention mechanism. Although recent works have explored the trustworthiness of ViT, including its robustness and explainability, the issue of fairness has not yet been adequately addressed. We establish that existing fairness-aware algorithms designed for CNNs do not perform well on ViT, which motivates our novel framework, Debiased Self-Attention (DSA). DSA is a fairness-through-blindness approach that forces ViT to discard spurious features correlated with the sensitive label, which mitigates bias, while retaining the real features needed for target prediction. Notably, DSA leverages adversarial examples to locate and mask the spurious features in the input image patches, and adds an attention weights alignment regularizer to the training objective to encourage learning real features for target prediction. Importantly, our DSA framework leads to improved fairness guarantees over prior works on multiple prediction tasks without compromising target prediction performance. Code is available at https://github.com/qiangyao1988/DSA.
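Fairness improvements of this kind are typically quantified with group metrics such as demographic parity (DP) and equalized odds (EO). The sketch below uses common textbook definitions for binary predictions and a binary sensitive attribute; the paper's exact formulas may differ, and averaging the two label-conditional gaps in EO is an assumption (other works take the maximum or the sum). All function names are hypothetical.

```python
def positive_rate(preds, mask):
    """Fraction of positive predictions among samples where mask is True.
    Assumes the selected subgroup is non-empty."""
    selected = [p for p, keep in zip(preds, mask) if keep]
    return sum(selected) / len(selected)

def demographic_parity_gap(preds, sensitive):
    """|P(pred=1 | s=0) - P(pred=1 | s=1)|; lower means fairer."""
    r0 = positive_rate(preds, [s == 0 for s in sensitive])
    r1 = positive_rate(preds, [s == 1 for s in sensitive])
    return abs(r0 - r1)

def equalized_odds_gap(preds, labels, sensitive):
    """Gap in positive-prediction rate between groups, conditioned on the
    true label, averaged over labels 0 and 1 (ASSUMED aggregation)."""
    gaps = []
    for y in (0, 1):
        r0 = positive_rate(preds, [s == 0 and t == y for s, t in zip(sensitive, labels)])
        r1 = positive_rate(preds, [s == 1 and t == y for s, t in zip(sensitive, labels)])
        gaps.append(abs(r0 - r1))
    return sum(gaps) / len(gaps)

# Toy data: 8 samples, 4 per sensitive group.
preds     = [1, 1, 0, 0, 1, 0, 0, 0]
labels    = [1, 1, 0, 0, 1, 1, 0, 0]
sensitive = [0, 0, 0, 0, 1, 1, 1, 1]
dp = demographic_parity_gap(preds, sensitive)
eo = equalized_odds_gap(preds, labels, sensitive)
```

On the toy data, the model predicts the positive class for every positive sample in group 0 but only half of those in group 1, so both gaps are non-zero.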
