Fairness-aware Vision Transformer via Debiased Self-Attention

Yao Qiang; Chengyin Li; Prashant Khanduri; Dongxiao Zhu

TL;DR

This work addresses fairness in Vision Transformers (ViTs), motivated by the finding that debiasing methods designed for CNNs do not transfer well to ViTs. It introduces Debiased Self-Attention (DSA), a hierarchical, two-step framework. DSA first uses a bias-only model and adversarial patch attacks to locate and perturb spurious features. It then trains a debiased ViT on the augmented data with an attention-weights alignment regularizer so that the ViT learns the real predictive features. The training objective combines two cross-entropy losses, one on clean inputs and one on adversarial inputs, with a discrepancy-based attention alignment term. Tunable weights $\lambda_1, \lambda_2, \lambda_3$ balance fairness and accuracy. Empirical results on Waterbirds, CelebA, and bFFHQ show that DSA consistently improves group fairness (lower EO, DP, and DBA) while maintaining or improving accuracy. Qualitative analyses of attention maps confirm reduced reliance on spurious cues. Code is publicly available at the repository linked in the abstract, which supports the method's practical use for debiasing ViTs in real-world vision tasks.
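The group-fairness metrics named above can be made concrete with a small example. The sketch below computes two of them for binary target and sensitive labels, using one common definition of each. Demographic parity (DP) is taken as the gap in positive-prediction rates between the two sensitive groups. Equalized odds (EO) is taken as the average of the true-positive-rate and false-positive-rate gaps. These definitions are assumptions, and the paper's exact formulations may differ. DBA is not covered. The function names are hypothetical.

```python
from typing import Sequence


def _rate(preds: Sequence[int], mask: Sequence[bool]) -> float:
    """Fraction of positive predictions among the samples selected by mask."""
    selected = [p for p, m in zip(preds, mask) if m]
    if not selected:
        raise ValueError("empty group: cannot compute a rate")
    return sum(selected) / len(selected)


def demographic_parity_gap(y_pred: Sequence[int], s: Sequence[int]) -> float:
    """|P(y_hat=1 | s=0) - P(y_hat=1 | s=1)| for binary predictions and sensitive label."""
    return abs(_rate(y_pred, [v == 0 for v in s]) - _rate(y_pred, [v == 1 for v in s]))


def equalized_odds_gap(y_true: Sequence[int], y_pred: Sequence[int], s: Sequence[int]) -> float:
    """Mean over y in {0, 1} of |P(y_hat=1 | y, s=0) - P(y_hat=1 | y, s=1)|.

    y=0 gives the false-positive-rate gap; y=1 gives the true-positive-rate gap.
    """
    gaps = []
    for y in (0, 1):
        r0 = _rate(y_pred, [t == y and v == 0 for t, v in zip(y_true, s)])
        r1 = _rate(y_pred, [t == y and v == 1 for t, v in zip(y_true, s)])
        gaps.append(abs(r0 - r1))
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    y_true = [1, 1, 0, 0, 1, 1, 0, 0]
    s      = [0, 0, 0, 0, 1, 1, 1, 1]
    y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
    print(demographic_parity_gap(y_pred, s))      # 0.0
    print(equalized_odds_gap(y_true, y_pred, s))  # 0.5
```

In this toy example, both groups receive positive predictions at the same rate, so the DP gap is 0.0. Their error rates differ, however, so the EO gap is 0.5. This is one reason both metrics are usually reported side by side.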

Abstract

Vision Transformer (ViT) has recently gained significant attention for solving computer vision (CV) problems due to its ability to extract informative features and model long-range dependencies through the attention mechanism. While recent works have explored the trustworthiness of ViT, including its robustness and explainability, the issue of fairness has not yet been adequately addressed. We establish that existing fairness-aware algorithms designed for CNNs do not perform well on ViT, which motivates our novel framework, Debiased Self-Attention (DSA). DSA is a fairness-through-blindness approach that forces ViT to eliminate spurious features correlated with the sensitive label for bias mitigation, while retaining real features for target prediction. Specifically, DSA leverages adversarial examples to locate and mask the spurious features in the input image patches. It also adds an attention weights alignment regularizer to the training objective to encourage learning real features for target prediction. Importantly, our DSA framework achieves improved fairness over prior works on multiple prediction tasks without compromising target prediction performance. Code is available at https://github.com/qiangyao1988/DSA.
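The sketch below illustrates how the terms of the training objective might combine for a single example. The source states only that the objective sums a cross-entropy loss on the clean input $x$, a cross-entropy loss on the adversarial input $x^\prime$, and a discrepancy-based attention alignment term, weighted by $\lambda_1, \lambda_2, \lambda_3$. Several details here are assumptions: that each weight scales exactly one term, that the discrepancy is a mean squared difference between attention-weight vectors, and the names `dsa_objective` and `attention_discrepancy`. A practical version would operate on batched tensors in an autograd framework. This scalar version only shows the structure of the sum.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one example, computed stably via log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def attention_discrepancy(attn_clean: Sequence[float], attn_adv: Sequence[float]) -> float:
    """Mean squared difference between two attention-weight vectors (assumed discrepancy)."""
    if len(attn_clean) != len(attn_adv):
        raise ValueError("attention vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(attn_clean, attn_adv)) / len(attn_clean)


def dsa_objective(logits_clean: Sequence[float], logits_adv: Sequence[float], label: int,
                  attn_clean: Sequence[float], attn_adv: Sequence[float],
                  lam1: float = 1.0, lam2: float = 1.0, lam3: float = 1.0) -> float:
    """Hypothetical weighted objective:
    lam1 * CE(f(x), y) + lam2 * CE(f(x'), y) + lam3 * D(attn(x), attn(x')).
    The adversarial input keeps the target label y, since only spurious patches were perturbed.
    """
    return (lam1 * cross_entropy(logits_clean, label)
            + lam2 * cross_entropy(logits_adv, label)
            + lam3 * attention_discrepancy(attn_clean, attn_adv))
```

Raising `lam3` pushes the model to attend to the same patches whether or not the spurious patches are perturbed. The two cross-entropy terms keep the target prediction accurate on both views.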

Paper Structure

This paper contains 19 sections, 6 equations, 3 figures, and 5 tables.

Introduction
Related Work
ViT for Image Classification
Fairness and Debiased Learning
Fairness in ViT
Preliminaries
Debiased Self-Attention (DSA) Framework
Training the Bias-only Model
Adversarial Attack Against the Bias-only Model
Attention Weights Alignment
Overall Training Objective
Experimental Settings
Results and Discussion
Fairness and Accuracy Evaluations
Ablating DSA
...and 4 more sections

Figures (3)

Figure 1: An illustrative example. The prediction target label is Hair Color and the sensitive label is Gender. The attention-weight heatmaps show that the vanilla ViT relies on spurious features, e.g., 'red lip' and 'eye shadow', whereas the fairness-aware ViT trained with our DSA leverages the real features, e.g., 'hair', for target prediction.
Figure 2: The DSA framework. The target label is Hair Color and the sensitive label is Gender. The bias-only model is first trained with an adversarial objective so that it learns the spurious features (the green patches) for predicting the sensitive label $s$, but not the real features (the red patches). An adversarial attack is then applied against the bias-only model to generate adversarial examples $x^\prime$ by perturbing the spurious features (the grid-shadowed patches) of the original inputs $x$ (see the section Adversarial Attack Against the Bias-only Model). Finally, both $x$ and $x^\prime$ are used to train a fairness-aware ViT with an attention weights alignment objective (see the corresponding loss equation), so that it learns the real features (the red patches) (see the sections Attention Weights Alignment and Overall Training Objective).
Figure 3: Qualitative evaluation. Y: Hair Color, S: Gender.
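Figure 2 describes generating $x^\prime$ by perturbing only the spurious patches of $x$. The sketch below shows one way such a patch-restricted attack could work. It makes several assumptions. The patches to perturb are taken to be the top-k patches by the bias-only model's attention. The step is a single FGSM-style signed-gradient step with budget `epsilon`, and pixel values lie in [0, 1]. The attack and the helper names `top_k_patches` and `perturb_patches` are illustrative choices, not the paper's confirmed procedure. `grads` is assumed to be the gradient of the bias-only model's loss on the sensitive label $s$ with respect to the input. Stepping along its sign therefore increases that loss.

```python
from typing import List, Sequence

Patch = List[float]


def sign(g: float) -> float:
    """Return -1.0, 0.0, or 1.0 according to the sign of g."""
    return float((g > 0) - (g < 0))


def top_k_patches(attention: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-attention patches (ties broken by lower index), in ascending order."""
    order = sorted(range(len(attention)), key=lambda i: (-attention[i], i))
    return sorted(order[:k])


def perturb_patches(patches: Sequence[Patch], grads: Sequence[Patch],
                    selected: Sequence[int], epsilon: float = 8 / 255) -> List[Patch]:
    """One signed-gradient step applied only to the selected patches, clipped to [0, 1].

    patches and grads are lists of flattened patches with matching shapes.
    Unselected patches are copied unchanged; the inputs are not mutated.
    """
    chosen = set(selected)
    out: List[Patch] = []
    for i, (patch, grad) in enumerate(zip(patches, grads)):
        if i in chosen:
            out.append([min(1.0, max(0.0, p + epsilon * sign(g))) for p, g in zip(patch, grad)])
        else:
            out.append(list(patch))
    return out
```

Because the step touches only the selected patches, the remaining patches are left intact. These patches are presumed to carry the real features.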
