BAdd: Bias Mitigation through Bias Addition

Ioannis Sarridis, Christos Koutlis, Symeon Papadopoulos, Christos Diou

TL;DR

BAdd addresses bias in CV data by introducing bias-capturing features into the model backbone, which decouples learning from the protected attributes. By adding the bias representation to the penultimate layer during training and then fine-tuning with only the original features, BAdd yields fairer representations and avoids the loss spikes that plague vanilla training. Across seven benchmarks covering both single- and multi-attribute biases, BAdd consistently outperforms state-of-the-art methods, with absolute accuracy gains of +27.5% on FB-Biased-MNIST and +5.5% on CelebA, while remaining architecture-agnostic. The approach offers a practical, scalable path to robust bias mitigation in real-world CV datasets; its main limitation is the need for protected-attribute labels during training.

Abstract

Computer vision (CV) datasets often exhibit biases that are perpetuated by deep learning models. While recent efforts aim to mitigate these biases and foster fair representations, they fail in complex real-world scenarios. In particular, existing methods excel in controlled experiments on benchmarks with single-attribute injected biases, but struggle with the multi-attribute biases present in well-established CV datasets. Here, we introduce BAdd, a simple yet effective method that learns fair representations invariant to the bias-inducing attributes by incorporating features that represent these attributes into the backbone. BAdd is evaluated on seven benchmarks and exhibits competitive performance, surpassing state-of-the-art methods on both single- and multi-attribute benchmarks. Notably, BAdd achieves +27.5% and +5.5% absolute accuracy improvements on the challenging multi-attribute benchmarks FB-Biased-MNIST and CelebA, respectively.
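
The mechanism summarized in the TL;DR (bias features added to the penultimate-layer features during training, original features used alone afterwards) can be illustrated with a small sketch. This is not the authors' implementation: it assumes the two feature vectors have the same dimension, that they are combined by element-wise addition, and that the classifier is a bias-free linear head. The names `linear` and `badd_logits` and all numeric values are hypothetical and chosen only for illustration.

```python
from typing import List, Optional

Vector = List[float]
Matrix = List[Vector]


def linear(weights: Matrix, features: Vector) -> Vector:
    """Apply a bias-free linear head: one logit per row of `weights`."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def badd_logits(h: Vector, b: Optional[Vector], weights: Matrix) -> Vector:
    """Classifier logits for one sample.

    h: penultimate-layer features from the main backbone.
    b: bias-capturing features of the same dimension (assumed to be
       available only during training); pass None to use h alone.
    """
    if b is None:
        return linear(weights, h)
    if len(b) != len(h):
        raise ValueError("h and b must have the same dimension")
    fused = [hi + bi for hi, bi in zip(h, b)]  # element-wise addition (assumption)
    return linear(weights, fused)


# Toy values, chosen only for illustration.
W = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical 2-class head
h = [0.5, 0.25]               # main features
b = [2.0, 0.0]                # bias features pointing at class 0

train_logits = badd_logits(h, b, W)    # bias features added
eval_logits = badd_logits(h, None, W)  # original features only
```

In this toy setting the bias features already account for most of the class-0 logit during training, so the main features `h` are under less pressure to encode the protected attribute themselves; once `b` is dropped, the prediction depends on `h` alone.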

Paper Structure (18 sections, 6 equations, 3 figures, 12 tables)

This paper contains 18 sections, 6 equations, 3 figures, 12 tables.

Figures (3)

  • Figure 1: During training on Biased-MNIST, where the color-digit association is strong, a vanilla model struggles to mitigate bias because reducing reliance on the protected attribute increases the loss on biased samples. Augmenting the main features ($\mathbf{h}$) with protected-attribute features ($\mathbf{b}$) allows the model to learn optimal filters without compromising the loss. Consequently, the model learns fair representations of the digits, independent of color, as evidenced by the activation maps and the mean activations on the sample backgrounds, where the bias occurs.
  • Figure 2: Vanilla vs. BAdd: loss on bias-aligned samples of the Biased-MNIST dataset.
  • Figure 3: Comparison of mean biased filter activation values and classification error between Vanilla and BAdd on Biased-MNIST.