
Fairness Without Labels: Pseudo-Balancing for Bias Mitigation in Face Gender Classification

Haohua Dong, Ana Manzano Rodríguez, Camille Guinaudeau, Shin'ichi Satoh

TL;DR

This paper tackles bias in face gender classification that arises from imbalanced training data and labels, in a setting where demographic attributes are not annotated. It introduces pseudo-balancing, a lightweight strategy that enforces demographic parity during pseudo-label selection in semi-supervised learning, using unlabeled data from a race-balanced source such as FairFace. Across two scenarios, the method improves overall accuracy and substantially reduces gender disparities on the All-Age-Faces benchmark, notably narrowing the gap in the East Asian subgroup, without requiring explicit demographic annotations. The work demonstrates that balanced unlabeled data is a practical resource for debiasing computer vision models, and it outlines limitations under severe data skew along with directions for future enhancements and broader application.
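To make "gender disparity per subgroup" concrete, the sketch below computes, for each race group, the absolute difference between male and female accuracy, plus the relative reduction of a gap between two models. This is an assumed definition for illustration; the paper's exact gap formula is not given in this summary. The 0 = female / 1 = male encoding and the function names are hypothetical, and race labels are used only at evaluation time, never for training.

```python
from collections import defaultdict


def gender_accuracy_gaps(y_true, y_pred, race):
    """Return {group: |acc(male) - acc(female)|}, including an "all" group.

    Assumed encoding (hypothetical): gender 0 = female, 1 = male.
    `race` holds one evaluation-only group label per sample.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, r in zip(y_true, y_pred, race):
        # Count each sample in its race group and in the pooled "all" group.
        for key in ((r, t), ("all", t)):
            total[key] += 1
            correct[key] += int(t == p)
    gaps = {}
    for group in {g for g, _ in total}:
        acc = {
            gender: correct[(group, gender)] / total[(group, gender)]
            for gender in (0, 1)
            if total[(group, gender)]
        }
        if len(acc) == 2:  # a gap is only defined if both genders are present
            gaps[group] = abs(acc[1] - acc[0])
    return gaps


def relative_reduction(gap_before, gap_after):
    """Fraction by which a gap shrank, e.g. 0.4417 for a 44.17% reduction."""
    return (gap_before - gap_after) / gap_before


gaps = gender_accuracy_gaps(
    y_true=[1, 1, 0, 0, 1, 0],
    y_pred=[1, 1, 1, 0, 1, 0],
    race=["East Asian"] * 4 + ["Black"] * 2,
)
print(gaps)
```

On this toy input (not data from the paper), the East Asian group has a 0.5 gap because half of its female samples are misclassified, while the Black group has none.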

Abstract

Face gender classification models often reflect and amplify demographic biases present in their training data, leading to uneven performance across gender and racial subgroups. We introduce pseudo-balancing, a simple and effective strategy for mitigating such biases in semi-supervised learning. Our method enforces demographic balance during pseudo-label selection, using only unlabeled images from a race-balanced dataset and without requiring access to ground-truth annotations. We evaluate pseudo-balancing under two conditions: (1) fine-tuning a biased gender classifier using unlabeled images from the FairFace dataset, and (2) stress-testing the method with intentionally imbalanced training data to simulate controlled bias scenarios. In both cases, models are evaluated on the All-Age-Faces (AAF) benchmark, which contains a predominantly East Asian population. Our results show that pseudo-balancing consistently improves fairness while preserving or enhancing accuracy. The method achieves 79.81% overall accuracy, a 6.53% improvement over the baseline, and reduces the gender accuracy gap by 44.17%. In the East Asian subgroup, where baseline disparities exceeded 49%, the gap is narrowed to just 5.01%. These findings suggest that, even in the absence of label supervision, access to a demographically balanced or moderately skewed unlabeled dataset can serve as a powerful resource for debiasing existing computer vision models.
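A minimal sketch of balanced pseudo-label selection follows, under stated assumptions: balance is enforced over the classifier's predicted gender classes (the summary does not specify the exact balancing criterion), candidates must pass a FixMatch-style confidence threshold (0.95 is FixMatch's default and only an assumption here), and every class is then truncated to the size of the rarest class. The function name `pseudo_balanced_selection` is hypothetical; this is not the authors' code. It also returns the selection rate, i.e. the fraction of unlabeled images kept.

```python
import numpy as np


def pseudo_balanced_selection(probs, threshold=0.95):
    """Select confident pseudo-labels with an equal count per predicted class.

    probs: (N, C) softmax outputs of the current classifier on unlabeled images.
    threshold: assumed confidence cut-off (FixMatch default), not from the paper.
    Returns (selected indices, pseudo-labels, selection rate).
    """
    probs = np.asarray(probs)
    conf = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    per_class = []
    for c in range(probs.shape[1]):
        # Confident candidates predicted as class c, most confident first.
        idx = np.flatnonzero((labels == c) & (conf >= threshold))
        per_class.append(idx[np.argsort(-conf[idx], kind="stable")])
    # Balancing step: keep only as many per class as the rarest class offers.
    k = min(len(idx) for idx in per_class)
    selected = np.concatenate([idx[:k] for idx in per_class]).astype(int)
    return selected, labels[selected], len(selected) / len(probs)


probs = [[0.99, 0.01], [0.97, 0.03], [0.96, 0.04],
         [0.02, 0.98], [0.60, 0.40], [0.30, 0.70]]
sel, pseudo, rate = pseudo_balanced_selection(probs)
print(sel, pseudo, rate)
```

Without the balancing step, this toy batch would contribute three class-0 pseudo-labels for every class-1 label and reinforce the skew; truncating to the rarest class trades selection rate for parity, which is the tension the accuracy vs. selection-rate plots examine.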

Paper Structure

This paper contains 18 sections, 1 equation, 8 figures, 4 tables.

Figures (8)

  • Figure 1: FairFace (FF) and All-Age-Faces (AAF) dataset examples.
  • Figure 2: Iterative self-training using pseudo-labeled data.
  • Figure 3: Scenario 1. Accuracy vs. selection rate for FF variants using FixMatch. Squares denote pseudo-balanced (PB) results and circles non-PB counterparts. Arrows indicate performance shifts. The best model achieves 82.4% accuracy at 93.8% selection rate (highlighted).
  • Figure 4: Scenario 2. Bias-specific performance analysis showing East Asian Female, Black and Male subsets using FixMatch. Squares denote pseudo-balanced results and circles show non-PB results ($\epsilon=0.6$: blue arrows; $\epsilon=0.9$: orange arrows). The model trained on the East Asian Female subset achieves 81.3% accuracy at 84.8% selection rate (highlighted).
  • Figure 5: Scenarios 1 and 2. FlexMatch adaptation results across all bias conditions. Squares denote PB results and circles non-PB results. The best model achieves 83.36% accuracy at 86.03% selection rate using FairFace dataset with PB.
  • ...and 3 more figures
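Figure 5 reports a FlexMatch adaptation. For reference, the sketch below shows the standard FlexMatch curriculum pseudo-labeling rule (class-wise thresholds scaled by each class's "learning effect"), not the authors' adaptation, whose details are not described here. One simplification is assumed: the learning effect is computed from a single snapshot of predictions rather than from per-sample predictions accumulated during training. The function name is hypothetical.

```python
import numpy as np


def flexmatch_thresholds(probs, num_classes, tau=0.95, warmup=True):
    """Class-wise thresholds T_c = tau * M(beta_c), with M(x) = x / (2 - x).

    probs: (N, C) softmax outputs on unlabeled images (single snapshot).
    """
    probs = np.asarray(probs)
    conf = probs.max(axis=1)
    pred = probs.argmax(axis=1)
    # sigma_c: number of samples confidently (>= tau) predicted as class c.
    sigma = np.bincount(pred[conf >= tau], minlength=num_classes).astype(float)
    denom = sigma.max()
    if warmup:
        # Threshold warm-up: unused samples also compete for the maximum,
        # so thresholds start low early in training.
        denom = max(denom, len(probs) - sigma.sum())
    beta = sigma / denom if denom > 0 else np.zeros(num_classes)
    # Non-linear mapping M(x) = x / (2 - x) from FlexMatch.
    return tau * beta / (2.0 - beta)


probs = [[0.99, 0.01], [0.60, 0.40], [0.50, 0.50], [0.30, 0.70]]
t_warm = flexmatch_thresholds(probs, num_classes=2)
t_cold = flexmatch_thresholds(probs, num_classes=2, warmup=False)
print(t_warm, t_cold)
```

A sample is then kept when its confidence reaches the threshold of its predicted class, e.g. `conf >= thresholds[pred]`. Classes the model has learned poorly get lower thresholds and thus more pseudo-labels, which complements explicit pseudo-balancing.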