SHaSaM: Submodular Hard Sample Mining for Fair Facial Attribute Recognition

Anay Majee, Rishabh Iyer

TL;DR

SHaSaM tackles fairness in facial attribute recognition by decoupling target predictions from sensitive attributes through a two-stage, submodular optimization framework. It first performs Submodular Hard Sample Mining (SHaSaM-MINE) to select balanced, diverse hard positives and hard negatives, and then applies Submodular Conditional Mutual Information (SHaSaM-LEARN) to learn embeddings that maximize overlap with target classes while minimizing leakage from sensitive attributes. The approach yields state-of-the-art Equalized Odds improvements and accuracy gains on CelebA and UTKFace, often with faster convergence in stage 1 and resilient performance across backbones (ResNet-18 and ViT). By framing fairness as a discrete-continuous optimization problem over sets, SHaSaM provides a principled, scalable mechanism to foster fair representations without compromising downstream task performance.

Abstract

Deep neural networks often inherit social and demographic biases from annotated data during training, leading to unfair predictions, especially in the presence of sensitive attributes such as race, age, and gender. Existing methods fall prey to the inherent data imbalance between attribute groups and inadvertently emphasize sensitive attributes, worsening both fairness and performance. To surmount these challenges, we propose SHaSaM (Submodular Hard Sample Mining), a novel combinatorial approach that models fairness-driven representation learning as a submodular hard-sample mining problem. Our two-stage approach comprises SHaSaM-MINE, which introduces a submodular subset selection strategy to mine hard positives and negatives, effectively mitigating data imbalance, and SHaSaM-LEARN, which introduces a family of combinatorial loss functions based on Submodular Conditional Mutual Information to maximize the decision boundary between target classes while minimizing the influence of sensitive attributes. This unified formulation restricts the model from learning features tied to sensitive attributes, significantly enhancing fairness without sacrificing performance. Experiments on CelebA and UTKFace demonstrate that SHaSaM achieves state-of-the-art results, with up to a 2.7-point improvement in model fairness (Equalized Odds) and a 3.5% gain in accuracy, in fewer epochs than existing methods.
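
The abstract reports fairness as Equalized Odds (EO). As a rough illustration only, the sketch below computes one common form of the EO gap: for each true label, the spread in positive-prediction rate across sensitive groups, averaged over labels. The function name `equalized_odds_gap` and the averaging convention are assumptions made here for illustration; the paper's exact EO variant may differ.

```python
import numpy as np

def equalized_odds_gap(y_true, y_pred, sensitive):
    """Average, over true labels, of the max-min gap in P(y_pred = 1)
    across sensitive groups. 0.0 means the rates match exactly.
    Hypothetical helper; the averaging convention is an assumption."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    sensitive = np.asarray(sensitive)

    gaps = []
    for y in np.unique(y_true):
        rates = []
        for s in np.unique(sensitive):
            mask = (y_true == y) & (sensitive == s)
            if mask.any():
                # Positive-prediction rate for this (label, group) cell.
                rates.append(np.mean(y_pred[mask] == 1))
        if len(rates) >= 2:
            gaps.append(max(rates) - min(rates))
    return float(np.mean(gaps)) if gaps else 0.0

# Toy example: binary target, binary sensitive attribute.
y_true    = [1, 1, 1, 1, 0, 0, 0, 0]
sensitive = [0, 0, 1, 1, 0, 0, 1, 1]
y_pred    = [1, 1, 1, 0, 0, 0, 0, 1]
print(equalized_odds_gap(y_true, y_pred, sensitive))
```

A lower value indicates that true-positive and false-positive rates are more closely matched across groups, which is the direction of improvement the abstract refers to.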

Paper Structure (36 sections, 29 equations, 10 figures, 7 tables, 2 algorithms)

Figures (10)

  • Figure 1: Illustration of SHaSaM. Given a target attribute set $T$ (gender) and a sensitive attribute set $S$ (eyeglasses), (a) SHaSaM-MINE selects hard positives $A_p^1$ and hard negatives $A_n^1$ for the anchor set $A_{11}$, and (b) SHaSaM-LEARN enforces a decision boundary between target attributes $T$ that is invariant to the sensitive attribute $S$.
  • Figure 2: Training strategy in SHaSaM-LEARN, which learns the parameters of $F(x, \theta)$ by minimizing a novel combinatorial objective $L_{\text{SHaSaM}}$ so that the learned features are invariant to sensitive attributes.
  • Figure 3: Results of SHaSaM on the UTKFace dataset, measuring (a) Equalized Odds and (b) Top-1 accuracy under varying inter-group imbalance ($\alpha$). The target and sensitive attributes are set to gender and ethnicity, respectively, following the setup in fscl.
  • Figure 4: Contrasting random and SHaSaM-MINE selection strategies on a synthetic two-cluster imbalanced dataset to identify (a) anchors, (b) hard positives, and (c) hard negatives, showing the effectiveness of SHaSaM in modeling the decision boundary between target attributes. Dataset generation and sample selection are performed under the same seed.
  • Figure 5: The set formulation in SHaSaM learns discriminative representations in fewer training epochs (in stage 1) on CelebA, with male and attractiveness as target and sensitive attributes.
  • ...and 5 more figures
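
To make the contrast in Figure 4 concrete, here is a minimal sketch of greedy submodular subset selection on a synthetic two-cluster imbalanced dataset. It uses the standard facility-location function as a stand-in; this is an assumption for illustration and not the SHaSaM-MINE objective, which the captions above do not specify. The names `rbf_similarity` and `greedy_facility_location`, the cluster sizes, and the kernel are all hypothetical.

```python
import numpy as np

def rbf_similarity(points, gamma=1.0):
    """Pairwise RBF similarity; entry [i, j] compares point i with point j."""
    sq = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def greedy_facility_location(similarity, budget):
    """Greedily maximize f(A) = sum_i max_{j in A} similarity[i, j].
    Stand-in objective (assumption), not the paper's mining function."""
    n = similarity.shape[0]
    chosen = np.zeros(n, dtype=bool)
    best_cover = np.zeros(n)  # current max similarity of each point to A
    selected = []
    for _ in range(min(budget, n)):
        # Marginal gain f(A + {j}) - f(A) for every candidate column j.
        gains = np.maximum(similarity, best_cover[:, None]).sum(axis=0) - best_cover.sum()
        gains[chosen] = -np.inf  # never re-select an element
        j = int(np.argmax(gains))
        selected.append(j)
        chosen[j] = True
        best_cover = np.maximum(best_cover, similarity[:, j])
    return selected

# Synthetic imbalanced data: 90 majority points, 10 minority points.
rng = np.random.default_rng(0)
majority = rng.normal(loc=0.0, scale=0.1, size=(90, 2))
minority = rng.normal(loc=5.0, scale=0.1, size=(10, 2))
points = np.vstack([majority, minority])
cluster = np.array([0] * 90 + [1] * 10)

picked = greedy_facility_location(rbf_similarity(points), budget=2)
print(sorted(cluster[picked].tolist()))
```

With a budget of two, the greedy picks land in different clusters, because a second majority point adds little coverage once the first is chosen. Uniform random sampling would instead draw both points from the majority cluster most of the time, which is the kind of imbalance effect the figure contrasts.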
