Debiasing surgeon: fantastic weights and how to find them

Rémi Nahon; Ivan Luiz De Moura Matos; Van-Tam Nguyen; Enzo Tartaglione

TL;DR

The paper tackles the problem of deep learning biases arising from spurious correlations and proposes Finding Fantastic Weights (FFW) to extract unbiased sub-networks from vanilla trained models without retraining. By appending a bias extractor and learning a gating mask on the encoder, FFW minimizes the leakage of bias information while preserving task accuracy, using a loss that combines task performance with an empirical mutual information term $\mathcal{I}(b, \hat{b})$. It presents a theoretical framework linking biasedness $\phi$ and task bias $K_{bia}$, and provides unstructured and structured pruning variants that guarantee reduced bias leakage under pruning. Empirical results across Biased MNIST, CelebA, Corrupted CIFAR10, and Multi-Color MNIST show that debiased sub-networks indeed exist in vanilla models, achieving competitive task performance with varying sparsity and often outperforming baselines on debiasing metrics, while highlighting that aggressive bias removal is not universally beneficial. The findings suggest a route to energy-efficient debiasing by leveraging architectural sparsity rather than heavy retraining, with implications for safety and regulatory compliance in AI systems.
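The empirical mutual information term $\mathcal{I}(b, \hat{b})$ mentioned above can be illustrated with a plug-in estimate computed from paired labels. This is only a minimal sketch: it assumes hard (integer) bias labels and hard predictions of the bias extractor, and the function name `empirical_mutual_information` is hypothetical. The summary does not specify the paper's exact estimator, which may well operate on soft predictions instead.

```python
import numpy as np

def empirical_mutual_information(b, b_hat, num_classes):
    """Plug-in estimate of I(b; b_hat) in nats from paired integer labels.

    Assumption: b and b_hat are hard class indices in [0, num_classes).
    """
    b = np.asarray(b)
    b_hat = np.asarray(b_hat)

    # Joint histogram of (true bias label, predicted bias label).
    joint = np.zeros((num_classes, num_classes), dtype=float)
    np.add.at(joint, (b, b_hat), 1.0)
    joint /= joint.sum()

    # Marginals of the true and predicted bias labels.
    p_b = joint.sum(axis=1, keepdims=True)
    p_hat = joint.sum(axis=0, keepdims=True)

    # Sum only over non-empty cells to avoid log(0).
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (p_b @ p_hat)[nz])))

# A bias extractor that recovers b perfectly leaks log(2) nats on two balanced classes.
leaky = empirical_mutual_information([0, 0, 1, 1], [0, 0, 1, 1], num_classes=2)
# Predictions that are independent of b leak nothing.
clean = empirical_mutual_information([0, 0, 1, 1], [0, 1, 0, 1], num_classes=2)
```

Under this reading, a lower value means the bias extractor can recover less about $b$ from the encoder's output, which is the quantity the masking is meant to drive down.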

Abstract

Algorithmic biases that can lead to unfair models are an ever-growing concern. Several debiasing approaches have been proposed in the realm of deep learning, employing more or less sophisticated techniques to discourage models from relying heavily on these biases. However, a question emerges: is this extra complexity really necessary? Does a vanilla-trained model already contain "unbiased sub-networks" that can be used in isolation and solve the task without relying on the algorithmic biases? In this work, we show that such a sub-network typically exists and can be extracted from a vanilla-trained model without requiring additional training. We further validate that such a specific architecture is incapable of learning a specific bias, suggesting that there are possible architectural countermeasures to the problem of biases in deep neural networks.
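Extracting a sub-network without further training amounts to keeping the trained weights fixed and deciding, per weight, whether it stays or is removed. The sketch below shows that masking step only, under stated assumptions: the names `apply_gates` and `sparsity`, the per-weight gate scores, and the threshold at zero are all hypothetical, and how the gates are parameterized and learned is not described in this summary.

```python
import numpy as np

def apply_gates(weights, gate_scores, threshold=0.0):
    """Zero out frozen weights whose gate score is at or below the threshold.

    The weights themselves are never updated; only the binary mask
    decides which of them remain in the sub-network.
    """
    mask = (gate_scores > threshold).astype(weights.dtype)
    return weights * mask, mask

def sparsity(mask):
    """Fraction of weights removed by the mask."""
    return 1.0 - float(mask.mean())

# Toy 2x2 layer with hypothetical gate scores.
frozen = np.array([[1.0, -2.0], [3.0, 4.0]])
scores = np.array([[0.5, -1.0], [2.0, -0.1]])
masked, mask = apply_gates(frozen, scores)
removed = sparsity(mask)  # half of the weights are pruned in this toy case
```

An unstructured variant would gate individual weights as here, while a structured variant would share one gate across a whole group of weights such as a filter or channel.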

Paper Structure (26 sections, 21 equations, 7 figures, 8 tables, 1 algorithm)

Introduction
Related works
Method
Removing the bias impacts the performance
Towards bias information removal
Finding the Fantastic Weights
Overview of FFW
Experiments
Experimental Setup
Datasets
Preliminary analysis
Main results
Importance of fitting the training set
Conclusion
Visualizations
...and 11 more sections

Figures (7)

Figure 1: Unlike other debiasing approaches, which require training or fine-tuning the whole model, Finding Fantastic Weights (FFW) keeps the model's parameters frozen and removes the sub-network responsible for propagating bias information.
Figure 2: The vanilla model, where the model still employs information related to the bias ($\phi\neq 0$) (a), the model where the information of the bias is entirely removed ($\phi=0$) (b), and the relationship between model biasedness $\phi$ and task biasedness $K_{\text{bia}}$ (plot obtained with $N=10$ and $\rho=0.9$) (c).
Figure 3: Grad-CAM visualization of the effects of FFW on Biased-MNIST with $\rho = 0.997$.
Figure 4: Absolute number (top row) and proportions (bottom row) of pruned parameters after applying FFW to Biased MNIST, CelebA, and Corrupted CIFAR10.
Figure 5: Results for FFW applied on Biased-MNIST ($\rho= 0.99$) with different $\gamma$ on the validation set
...and 2 more figures
