
Looking at Model Debiasing through the Lens of Anomaly Detection

Vito Paolo Pastore, Massimiliano Ciranni, Davide Marinelli, Francesca Odone, Vittorio Murino

TL;DR

This work tackles the problem of neural network bias caused by spurious correlations in the data by reframing bias identification as an anomaly-detection task in the feature space of a deliberately biased model. It introduces MoDAD, a two-step framework. First, it separates bias-conflicting from bias-aligned samples using per-class One-Class SVMs with class-specific thresholds, applied to the features of a biased model trained with the Generalized Cross-Entropy (GCE) loss to amplify the distribution shift between the two groups. Second, it upsamples and augments the bias-conflicting samples to fine-tune the biased model. The method achieves competitive results on the synthetic Corrupted CIFAR-10 benchmark and state-of-the-art-level performance on realistic datasets (BAR, BFFHQ, Waterbirds), indicating that precise bias identification is a key driver of debiasing success. Overall, the study bridges debiasing and anomaly-detection research, showing that a relatively simple technique can mitigate bias effectively and opening a new avenue for robust generalization under data bias.
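
The role of the GCE loss can be illustrated with a minimal sketch. It assumes the standard Generalized Cross-Entropy form (1 - p_y^q) / q with an exponent q = 0.7; the helper names and the value of q are illustrative assumptions, not taken from the MoDAD code. Because the gradient of GCE equals p_y^q times the gradient of cross-entropy, low-confidence samples (typically the bias-conflicting ones) contribute less to training, which pushes the model further toward the bias.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ce_loss(logits, target):
    """Standard cross-entropy: -log p_y."""
    return -math.log(softmax(logits)[target])


def gce_loss(logits, target, q=0.7):
    """Generalized Cross-Entropy: (1 - p_y**q) / q (q = 0.7 is an assumed default).

    It is bounded above by 1/q and approaches cross-entropy as q -> 0.
    """
    p_y = softmax(logits)[target]
    return (1.0 - p_y ** q) / q


def gce_gradient_scale(logits, target, q=0.7):
    """Factor p_y**q by which GCE rescales the cross-entropy gradient."""
    return softmax(logits)[target] ** q


# An "easy" sample (confident and correct) vs. a "hard" one (model prefers class 1).
easy_logits, hard_logits, target = [4.0, 0.0, 0.0], [0.0, 2.0, 0.0], 0
easy_scale = gce_gradient_scale(easy_logits, target)
hard_scale = gce_gradient_scale(hard_logits, target)
print(f"gradient scale: easy={easy_scale:.3f}, hard={hard_scale:.3f}")
```

The hard sample's gradient is scaled down much more than the easy one's, so a model trained this way keeps relying on the easy, bias-aligned shortcut.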

Abstract

It is widely recognized that deep neural networks are sensitive to bias in the data: during training, these models are likely to learn spurious correlations between data and labels, which limits their generalization ability and degrades performance. Model debiasing approaches aim to reduce a model's dependency on such unwanted correlations, either by leveraging knowledge of the bias or without it. In this work, we focus on the latter, more realistic scenario and show that accurately identifying bias-conflicting and bias-aligned samples is essential for effective bias mitigation. On this basis, we frame model bias from an out-of-distribution perspective and introduce a new bias identification method based on anomaly detection. We claim that when most of the data is biased, bias-conflicting samples can be regarded as outliers with respect to the bias-aligned distribution in the feature space of a biased model, so they can be detected precisely with an anomaly detection method. Coupling the proposed bias identification approach with bias-conflicting data upsampling and augmentation in a two-step strategy, we reach state-of-the-art performance on synthetic and real benchmark datasets. Ultimately, our approach shows that the data bias issue does not necessarily require complex debiasing methods, provided that an accurate bias identification procedure is in place. Source code is available at https://github.com/Malga-Vision/MoDAD
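
The bias identification and upsampling steps described above can be sketched as follows, assuming scikit-learn's `OneClassSVM` fit separately on each class's features extracted from a biased model. The class-specific threshold rule (flagging the lowest `contamination` quantile of each class's decision scores), the helper names `identify_bias_conflicting` and `upsample_indices`, the upsampling factor, and the synthetic features are all hypothetical choices for illustration, not the authors' implementation; the augmentation applied to bias-conflicting samples is omitted.

```python
import numpy as np
from sklearn.svm import OneClassSVM


def identify_bias_conflicting(features, labels, contamination=0.05, nu=0.5):
    """Flag likely bias-conflicting samples with one One-Class SVM per class.

    features: (n_samples, n_dims) array extracted from a biased model.
    labels: (n_samples,) integer class labels.
    contamination: hypothetical quantile used as each class's threshold.
    Returns a boolean mask where True means "predicted bias-conflicting".
    """
    conflicting = np.zeros(len(labels), dtype=bool)
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=nu).fit(features[idx])
        # Higher decision_function values mean "more inlier-like".
        scores = ocsvm.decision_function(features[idx])
        # Class-specific threshold: the lowest-scoring fraction becomes "conflicting".
        threshold = np.quantile(scores, contamination)
        conflicting[idx[scores <= threshold]] = True
    return conflicting


def upsample_indices(conflicting, factor=5):
    """Training indices in which every flagged sample appears `factor` times."""
    base = np.arange(len(conflicting))
    extra = np.repeat(np.flatnonzero(conflicting), factor - 1)
    return np.concatenate([base, extra])


# Synthetic stand-in for biased-model features: per class, 95 aligned + 5 conflicting.
rng = np.random.default_rng(0)
X_parts, y_parts, truth_parts = [], [], []
for c in range(2):
    center = np.full(8, 10.0 * c)
    X_parts += [rng.normal(center, 1.0, size=(95, 8)),
                rng.normal(center + 8.0, 1.0, size=(5, 8))]
    y_parts.append(np.full(100, c))
    truth_parts.append(np.r_[np.zeros(95, dtype=bool), np.ones(5, dtype=bool)])
X, y, truth = np.vstack(X_parts), np.concatenate(y_parts), np.concatenate(truth_parts)

mask = identify_bias_conflicting(X, y)
train_idx = upsample_indices(mask, factor=5)
print(mask.sum(), len(train_idx))
```

Because each detector is fit only on samples of its own class, the flagged points are outliers relative to that class's mostly bias-aligned distribution, which is the intuition stated in the abstract. The returned index list could then drive the data loader used for fine-tuning.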

Paper Structure

This paper contains 18 sections, 1 equation, 7 figures, 7 tables.

Figures (7)

  • Figure 1: Visualization of the first two principal components (PCs) of bias-aligned (green dots) and bias-conflicting (red dots) ResNet-18 test features from the Cat class in Corrupted CIFAR-10.
  • Figure 2: Schematic representation of our two-step method (MoDAD).
  • Figure 3: Examples of bias-aligned (top) and bias-conflicting (bottom) samples from the Corrupted CIFAR-10 dataset.
  • Figure 4: Examples of bias-aligned (top) and bias-conflicting (bottom) samples from the BAR dataset.
  • Figure 5: Examples of bias-aligned (top) and bias-conflicting (bottom) samples from the BFFHQ dataset.
  • ...and 2 more figures
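
Figure 1 projects one class's features onto their first two principal components. A minimal NumPy sketch of that projection follows; the helper name `first_two_pcs` is hypothetical, and the synthetic clusters merely stand in for ResNet-18 features, so they are not meant to reproduce the figure.

```python
import numpy as np


def first_two_pcs(features):
    """Project (n_samples, n_dims) features onto their first two principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


# Hypothetical stand-in for one class's features: 95 "aligned", 5 "conflicting".
rng = np.random.default_rng(0)
aligned = rng.normal(0.0, 1.0, size=(95, 8))
conflicting = rng.normal(8.0, 1.0, size=(5, 8))
coords = first_two_pcs(np.vstack([aligned, conflicting]))
print(coords.shape)  # (100, 2)
```

Plotting `coords[:95]` and `coords[95:]` in two colors gives a scatter plot in the style of Figure 1.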