BAdd: Bias Mitigation through Bias Addition
Ioannis Sarridis, Christos Koutlis, Symeon Papadopoulos, Christos Diou
TL;DR
BAdd mitigates bias in CV data by introducing bias-capturing features into the model backbone, decoupling learning from the protected attributes. By adding the bias representation to the penultimate layer during training and then fine-tuning with only the original features, BAdd yields fairer representations and avoids the loss spikes that plague vanilla training. Across seven benchmarks covering both single- and multi-attribute biases, BAdd performs competitively and surpasses state-of-the-art methods, with absolute accuracy gains of +27.5% on FB-Biased-MNIST and +5.5% on CelebA, while remaining architecture-agnostic. The approach offers a practical, scalable path to bias mitigation in real-world CV datasets; its main limitation is the need for protected-attribute labels during training.
Abstract
Computer vision (CV) datasets often exhibit biases that are perpetuated by deep learning models. While recent efforts aim to mitigate these biases and foster fair representations, they fail in complex real-world scenarios. In particular, existing methods excel in controlled experiments on benchmarks with single-attribute injected biases, but struggle with the multi-attribute biases present in well-established CV datasets. Here, we introduce BAdd, a simple yet effective method that learns fair representations invariant to the bias-introducing attributes by incorporating features that represent these attributes into the backbone. BAdd is evaluated on seven benchmarks and exhibits competitive performance, surpassing state-of-the-art methods on both single- and multi-attribute benchmarks. Notably, BAdd achieves +27.5% and +5.5% absolute accuracy improvements on the challenging multi-attribute benchmarks FB-Biased-MNIST and CelebA, respectively.
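The core mechanism described above, adding bias-capturing features to the penultimate-layer features during training and using only the original features afterwards, can be illustrated with a toy forward pass. This is a minimal, framework-free sketch rather than the authors' implementation: the function names, the 3-dimensional features, the classifier weights, and the assumption that the bias features share the dimensionality of the backbone features are all hypothetical.

```python
# Toy illustration only: every name and number here is hypothetical.

def add(u, v):
    """Element-wise sum of two equal-length feature vectors."""
    return [a + b for a, b in zip(u, v)]

def linear(weights, x):
    """Apply a bias-free linear classifier (one weight row per class)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def forward(h, weights, b=None):
    """Compute class logits from penultimate-layer features h.

    If bias-capturing features b are given (training), they are added
    to h before the classifier. If b is None, only h is used.
    """
    z = add(h, b) if b is not None else h
    return linear(weights, z)

# Hypothetical penultimate-layer features from the main backbone.
h = [0.5, -1.0, 2.0]
# Hypothetical features from a separate model that encodes the
# protected attribute (assumed to have the same size as h).
b = [1.0, 0.5, 0.0]
# Hypothetical weights of a 2-class linear classifier.
W = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]

train_logits = forward(h, W, b)  # classifier sees h + b
eval_logits = forward(h, W)      # classifier sees h only
```

The intuition this sketch is meant to convey is that, because the attribute information is already supplied through `b` during training, the loss gives the backbone less incentive to encode that information in `h` itself.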
