Debiasify: Self-Distillation for Unsupervised Bias Mitigation

Nourhan Bayasi, Jamil Fayyad, Ghassan Hamarneh, Rafeef Garbi, Homayoun Najjaran

TL;DR

This work presents Debiasify, a novel self-distillation approach that requires no prior information about the nature of biases. It significantly outperforms previous unsupervised debiasing methods while achieving comparable or superior performance to supervised methods.

Abstract

Simplicity bias poses a significant challenge in neural networks, often leading models to favor simpler solutions and inadvertently learn decision rules influenced by spurious correlations. This results in biased models with diminished generalizability. While many current approaches depend on human supervision, obtaining annotations for various bias attributes is often impractical. To address this, we introduce Debiasify, a novel self-distillation approach that requires no prior knowledge about the nature of biases. Our method leverages a new distillation loss to transfer knowledge within the network in an unsupervised manner, from deeper layers containing complex, highly predictive features to shallower layers with simpler, attribute-conditioned features. This enables Debiasify to learn robust, debiased representations that generalize effectively across diverse biases and datasets, improving both worst-group performance and overall accuracy. Extensive experiments on computer vision and medical imaging benchmarks demonstrate the effectiveness of our approach, significantly outperforming previous unsupervised debiasing methods (e.g., a 10.13% improvement in worst-group accuracy for Wavy Hair classification in CelebA) and achieving comparable or superior performance to supervised approaches. Our code is publicly available at the following link: Debiasify.
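The abstract reports both worst-group and overall accuracy. As a minimal sketch of how these two metrics are commonly computed in the group-robustness literature, the snippet below treats each (target label, bias attribute) pair as a group and takes the minimum per-group accuracy. The function name `worst_group_accuracy` and the toy arrays are illustrative assumptions and are not taken from the paper or its code release.

```python
import numpy as np


def worst_group_accuracy(y_true, y_pred, bias_attr):
    """Return (worst-group accuracy, overall accuracy, per-group accuracies).

    A group is a (target label, bias attribute value) pair. This grouping is
    the common convention, assumed here rather than taken from the paper.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    bias_attr = np.asarray(bias_attr)
    correct = y_true == y_pred

    group_acc = {}
    for y in np.unique(y_true):
        for a in np.unique(bias_attr):
            mask = (y_true == y) & (bias_attr == a)
            if mask.any():  # skip empty groups
                group_acc[(int(y), int(a))] = float(correct[mask].mean())

    overall = float(correct.mean())
    worst = min(group_acc.values())
    return worst, overall, group_acc


# Toy data (hypothetical): the target is spuriously correlated with the bias
# attribute, and the model predicts the bias attribute instead of the target.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
bias = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

worst, overall, per_group = worst_group_accuracy(y_true, y_pred, bias)
print(worst, overall, per_group)
```

In this toy case the overall accuracy looks reasonable while the minority groups are always misclassified, which is the gap that worst-group accuracy is meant to expose.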

Paper Structure

This paper contains 15 sections, 3 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Debiasify leverages clustering in the feature space of a shallow layer in the network to identify attribute-conditioned groups (3 groups shown) for each class (e.g., Attractive), where images in each group are clustered based on common, non-target bias attributes (e.g., Female, Smiling).
  • Figure 2: Debiasify identifies attribute-conditioned groups $P_{a_{k},y}$ (one per attribute $a_k$, $k = 1, 2, 3$), found through clustering in the feature space of a shallow network layer. The goal is to bring their distributions closer to each other while aligning them with their class distribution $P_{y}$ in the deep layer using a novel self-distillation loss $\mathcal{L}_{AKD}$ (yellow arrows).
  • Figure 3: Results of Linear Decodability (LD): Panel (a) compares LD of the Male bias attribute from a frozen baseline network (blue) and our method (orange), both pretrained on different target attributes. Panel (b) shows LD of multiple bias attributes from networks pretrained on Blond Hair, comparing the baseline (blue) and our method (orange).
  • Figure 4: t-SNE plots of feature embeddings for the baseline model (left) and our model (right), both trained to classify Blond Hair. The plots display the distribution of samples with the target value Blond Hair = False. Blue and green represent female and male samples, respectively. Debiasify promotes a better mix of samples that share the same target but differ in bias attribute values, which reduces the bias.
  • Figure 5: Visualization of the class activation maps generated by GradCAM for the baseline and Debiasify (ours) on images from the CelebA (left), Waterbirds (middle), and Fitzpatrick (right) datasets.
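
The idea summarized in the Figure 2 caption, pulling each attribute-conditioned group distribution $P_{a_{k},y}$ toward the class distribution $P_{y}$ of the deep layer, can be illustrated with a simple KL-based alignment term. The sketch below is only one possible reading and is not the paper's $\mathcal{L}_{AKD}$, whose exact form is not given in this excerpt. It assumes that a hypothetical auxiliary classifier produces logits from the shallow layer, that group ids have already been obtained by clustering shallow features (for example with k-means), and that distributions are averaged softmax outputs. All function names are illustrative.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def kl(p, q, eps=1e-8):
    """KL(p || q) for two discrete distributions, clipped to avoid log(0)."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))


def group_alignment_loss(shallow_logits, deep_logits, labels, groups):
    """Average KL between each class's deep-layer distribution (teacher)
    and the shallow-layer distribution of every group within that class.

    ASSUMPTION: this stands in for the paper's self-distillation loss;
    the direction of the KL and the use of mean softmax outputs are
    choices made for this sketch only.
    """
    labels = np.asarray(labels)
    groups = np.asarray(groups)
    p_shallow = softmax(np.asarray(shallow_logits, dtype=float))
    p_deep = softmax(np.asarray(deep_logits, dtype=float))

    terms = []
    for y in np.unique(labels):
        in_class = labels == y
        p_y = p_deep[in_class].mean(axis=0)  # class distribution, deep layer
        for k in np.unique(groups[in_class]):
            in_group = in_class & (groups == k)
            p_ky = p_shallow[in_group].mean(axis=0)  # group distribution, shallow layer
            terms.append(kl(p_y, p_ky))
    return float(np.mean(terms))


# Toy example (hypothetical): one class, two clusters found in the shallow layer.
deep = np.array([[2.0, 0.0]] * 4)
shallow = np.array([[2.0, 0.0], [2.0, 0.0], [0.0, 2.0], [0.0, 2.0]])
labels = np.array([0, 0, 0, 0])
groups = np.array([0, 0, 1, 1])
print(group_alignment_loss(shallow, deep, labels, groups))
```

In the toy example, the first cluster already matches the deep-layer class distribution and contributes zero, while the second cluster disagrees with it and contributes a positive term. Minimizing such a term would push every group toward the same class-level distribution.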