Diffusing DeBias: Synthetic Bias Amplification for Model Debiasing

Massimiliano Ciranni, Vito Paolo Pastore, Roberto Di Via, Enzo Tartaglione, Francesca Odone, Vittorio Murino

TL;DR

Diffusing DeBias (DDB) tackles the challenge of spurious biases in image classification by leveraging conditional diffusion models to synthesize bias-aligned data per class. A Bias Amplifier trained on this synthetic data provides reliable supervisory signals, which are then integrated into two debiasing recipes (two-step and end-to-end) to produce robust debiased classifiers. Across six biased datasets, DDB achieves state-of-the-art unsupervised debiasing, demonstrating strong generalization and resilience when biases are absent. The approach offers a versatile plug-in for existing debiasing methods, albeit with high diffusion-model training costs, and shows promise for scalable, bias-aware learning in real-world settings.

Abstract

The effectiveness of deep learning models in classification tasks is often limited by the quality and quantity of training data, particularly when the data are affected by strong spurious correlations between specific attributes and target labels. This results in a form of bias in the training data, which typically leads to unrecoverable weak generalization in prediction. This paper addresses the problem by leveraging bias amplification with generated synthetic data: we introduce Diffusing DeBias (DDB), a novel approach that acts as a plug-in for common methods of unsupervised model debiasing and exploits the inherent bias-learning tendency of diffusion models in data generation. Specifically, our approach adopts conditional diffusion models to generate synthetic bias-aligned images, which replace the original training set for learning an effective bias amplifier model that we subsequently incorporate into an end-to-end and a two-step unsupervised debiasing approach. By tackling the memorization of bias-conflicting training samples when learning auxiliary models, a fundamental issue typical of this type of technique, our proposed method outperforms the current state of the art on multiple benchmark datasets, demonstrating its potential as a versatile and effective tool for tackling bias in deep learning models. Code is available at https://github.com/Malga-Vision/DiffusingDeBias

Paper Structure

This paper contains 30 sections, 3 equations, 9 figures, 10 tables.

Figures (9)

  • Figure 1: Schematic representation of our DDB framework. The debiasing process consists of two key steps: (A) Diffusing the Bias, which uses a conditional diffusion model with classifier-free guidance to generate synthetic images that preserve the biases of the training dataset, and (B) a Bias Amplifier, which is first trained on this synthetic data and then used at inference time to extract supervisory bias signals from real images. These signals guide the training of a target debiased model through two debiasing recipes (i.e., 2-step and end-to-end methods).
  • Figure 2: Comparison of two debiasing strategies: Recipe I and Recipe II.
  • Figure 3: Examples of synthetic bias-aligned images across multiple datasets. Each grid shows synthetic images for specific classes across biased datasets, revealing how the model aligns with dataset-specific biases in different contexts.
  • Figure 4: Synthetic bias-aligned image generations for each class of the Waterbirds dataset. Each grid displays 100 synthetic images per class, highlighting model bias alignment.
  • Figure 5: Synthetic bias-aligned image generations for each class of the UrbanCars dataset. Each grid displays 100 synthetic images per class, highlighting model bias alignment.
  • ...and 4 more figures
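
The Figure 1 caption mentions classifier-free guidance for the conditional diffusion model. As a rough illustration of that general technique only, the sketch below shows the standard way a class-conditional and an unconditional noise prediction are blended at each sampling step. The function name `cfg_combine`, the plain-list representation of the noise predictions, and the `guidance_scale` parameter are assumptions made for this example; the guidance formulation and scale actually used by DDB are not stated in this section.

```python
from typing import List, Sequence


def cfg_combine(
    eps_cond: Sequence[float],
    eps_uncond: Sequence[float],
    guidance_scale: float,
) -> List[float]:
    """Blend conditional and unconditional noise predictions.

    Uses the common convention
        eps = eps_uncond + s * (eps_cond - eps_uncond),
    where s = 1 recovers the purely conditional prediction and s > 1
    pushes samples further toward the conditioning class.
    (Hypothetical helper, not taken from the DDB code base.)
    """
    if len(eps_cond) != len(eps_uncond):
        raise ValueError("noise predictions must have the same length")
    return [u + guidance_scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


# Toy example with two-element "noise predictions".
guided = cfg_combine([1.0, 2.0], [0.0, 1.0], guidance_scale=3.0)
```

This convention is equivalent to the form (1 + w) * eps_cond - w * eps_uncond with s = 1 + w. In a real sampler the two predictions would be tensors produced by the same denoising network, once with and once without the class label.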