Some Optimizers are More Equal: Understanding the Role of Optimizers in Group Fairness

Mojtaba Kolahdouzi, Hatice Gunes, Ali Etemad

TL;DR

This work investigates how the choice of optimization algorithm affects group fairness in deep learning. By modeling optimization dynamics with stochastic differential equations, it shows that adaptive methods like RMSProp and Adam tend to converge to fairer minima than SGD, especially under severe data imbalance. The authors prove two theorems linking adaptive updates to reduced subgroup disparities and bounded one-step fairness gaps, and they validate these insights across CelebA, FairFace, and MS-COCO on tasks including facial expression recognition, gender classification, and multi-label classification. Across multiple backbones and fairness metrics (equalized odds, equal opportunity, demographic parity), adaptive optimizers achieve better fairness with comparable predictive accuracy, highlighting the practical impact of optimizer choice on fairness.
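To make the contrast between the two optimizer families concrete, the sketch below writes out the standard textbook update rules for SGD and RMSProp. It is not taken from the authors' repository; the function names, learning rates, decay factor `beta`, and the example gradient are all illustrative assumptions. It only shows that RMSProp rescales each coordinate by a running root-mean-square of its gradients, so coordinates with very different gradient magnitudes receive steps of similar size.

```python
import numpy as np

def sgd_step(w, grad, lr=0.1):
    """Plain SGD: the step is proportional to the raw gradient."""
    return w - lr * grad

def rmsprop_step(w, grad, v, lr=0.01, beta=0.9, eps=1e-8):
    """RMSProp: divide the gradient by a running RMS of past gradients.

    `v` is the running average of squared gradients (same shape as `w`).
    """
    v = beta * v + (1.0 - beta) * grad ** 2
    w = w - lr * grad / (np.sqrt(v) + eps)
    return w, v

# Hypothetical gradient whose two coordinates differ by a factor of 100.
w = np.array([0.5, 0.5])
grad = np.array([10.0, 0.1])

w_sgd = sgd_step(w, grad)
w_rms, v = rmsprop_step(w, grad, v=np.zeros_like(w))

sgd_delta = np.abs(w_sgd - w)   # step sizes inherit the 100x gradient gap
rms_delta = np.abs(w_rms - w)   # step sizes are nearly equal per coordinate
print("SGD step sizes:    ", sgd_delta)
print("RMSProp step sizes:", rms_delta)
```

Whether this per-coordinate normalization is exactly the mechanism behind the fairness results is a question for the paper's theorems; the snippet only illustrates the difference in update rules.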

Abstract

We study whether and how the choice of optimization algorithm can impact group fairness in deep neural networks. Through stochastic differential equation analysis of optimization dynamics in an analytically tractable setup, we demonstrate that the choice of optimization algorithm indeed influences fairness outcomes, particularly under severe imbalance. Furthermore, we show that when comparing two categories of optimizers, adaptive methods and stochastic methods, RMSProp (from the adaptive category) has a higher likelihood of converging to fairer minima than SGD (from the stochastic category). Building on this insight, we derive two new theoretical guarantees showing that, under appropriate conditions, RMSProp exhibits fairer parameter updates and improved fairness in a single optimization step compared to SGD. We then validate these findings through extensive experiments on three publicly available datasets, namely CelebA, FairFace, and MS-COCO, across tasks such as facial expression recognition, gender classification, and multi-label classification, using various backbones. Considering multiple fairness definitions, including equalized odds, equal opportunity, and demographic parity, adaptive optimizers like RMSProp and Adam consistently outperform SGD in terms of group fairness while maintaining comparable predictive accuracy. Our results highlight the role of adaptive updates as a crucial yet overlooked mechanism for promoting fair outcomes. We release the source code at: https://github.com/Mkolahdoozi/Some-Optimizers-Are-More-Equal.
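The three fairness definitions named in the abstract can be illustrated for a binary classifier and a binary sensitive attribute as follows. This is a minimal sketch using the common textbook form of each gap, not the authors' evaluation code: the function names and the toy arrays are hypothetical, and taking the maximum of the TPR and FPR gaps for equalized odds is one convention among several (sums and averages are also used).

```python
import numpy as np

def group_rates(y_true, y_pred, group, g):
    """Positive-prediction rate, TPR, and FPR for the subgroup `g`."""
    mask = group == g
    yt, yp = y_true[mask], y_pred[mask]
    pos_rate = yp.mean()          # P(y_hat = 1 | group = g)
    tpr = yp[yt == 1].mean()      # P(y_hat = 1 | y = 1, group = g)
    fpr = yp[yt == 0].mean()      # P(y_hat = 1 | y = 0, group = g)
    return pos_rate, tpr, fpr

def fairness_gaps(y_true, y_pred, group):
    """Absolute gaps between subgroups 0 and 1 (0 means perfectly fair)."""
    p0, tpr0, fpr0 = group_rates(y_true, y_pred, group, 0)
    p1, tpr1, fpr1 = group_rates(y_true, y_pred, group, 1)
    return {
        "demographic_parity": abs(p0 - p1),
        "equal_opportunity": abs(tpr0 - tpr1),
        # Assumption: worst-case gap over TPR and FPR.
        "equalized_odds": max(abs(tpr0 - tpr1), abs(fpr0 - fpr1)),
    }

# Hypothetical toy data: four samples per subgroup.
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0])
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])

gaps = fairness_gaps(y_true, y_pred, group)
print(gaps)
```

The sketch assumes every subgroup contains at least one positive and one negative example; otherwise the corresponding rate is undefined.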

Paper Structure

This paper contains 23 sections, 6 theorems, 51 equations, 13 figures, 7 tables.

Key Result

Lemma 1

Let $\mathcal{L}_0(w) = \frac{1}{2} (w - 1)^2$ and $\mathcal{L}_1(w) = \frac{1}{2} (w + 1)^2$ be the loss functions for subgroups 0 and 1, respectively. Define the population loss as $\mathcal{L}_{\text{pop}}(w) = 0.5 \mathcal{L}_0(w) + 0.5 \mathcal{L}_1(w).$ Under the demographic parity definition
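The lemma statement is cut off in this excerpt, so the sketch below does not attempt to reproduce its conclusion. It only evaluates the two subgroup losses and the population loss exactly as defined above. The `weight_0` parameter is a hypothetical generalization of the $0.5$ mixing weight, added here purely for illustration; setting the derivative $\text{weight}_0 (w - 1) + (1 - \text{weight}_0)(w + 1)$ to zero gives the minimizer $w^* = 2\,\text{weight}_0 - 1$.

```python
def loss_0(w):
    """Subgroup 0 loss from Lemma 1: minimized at w = 1."""
    return 0.5 * (w - 1.0) ** 2

def loss_1(w):
    """Subgroup 1 loss from Lemma 1: minimized at w = -1."""
    return 0.5 * (w + 1.0) ** 2

def loss_pop(w, weight_0=0.5):
    """Weighted population loss; weight_0 = 0.5 matches Lemma 1.

    Other values of weight_0 are a hypothetical generalization.
    """
    return weight_0 * loss_0(w) + (1.0 - weight_0) * loss_1(w)

def pop_minimizer(weight_0=0.5):
    """Closed-form minimizer of loss_pop: w* = 2 * weight_0 - 1."""
    return 2.0 * weight_0 - 1.0

# Balanced case from Lemma 1: the minimizer treats both subgroups equally.
w_bal = pop_minimizer(0.5)
print(w_bal, loss_0(w_bal), loss_1(w_bal))

# Hypothetical imbalanced weighting: the minimizer shifts toward subgroup 1.
w_imb = pop_minimizer(0.1)
print(w_imb, loss_0(w_imb), loss_1(w_imb))
```

In the balanced case the minimizer is $w^* = 0$, where both subgroup losses equal $0.5$. Under the hypothetical weighting of $0.1$, the minimizer moves to $w^* = -0.8$ and the subgroup losses diverge ($1.62$ versus $0.02$), which gives an intuition for why imbalance and fairness interact in this toy problem.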

Figures (13)

  • Figure 1: Percentage of 1000 runs converging within the fair neighborhood for SGD and RMSProp, under severe bias ($p_0=0.1$) and mild bias ($p_0=0.3$).
  • Figure 2: Fairness for ViT across different datasets, attributes (G: gender, A: age, R: race), and metrics.
  • Figure 3: Difference between the fairness of RMSProp and SGD at different male-to-all ratios on the CelebA dataset, with gender as the sensitive attribute.
  • Figure 4: Convergence rates of RMSProp ($\eta = 0.01$) and SGD ($\eta = 0.1$) to the fair neighbourhood, defined by a threshold of $0.2$.
  • Figure 5: Convergence rates of RMSProp ($\eta = 0.1$) and SGD ($\eta = 0.2$) to the fair neighbourhood, defined by a threshold of $0.2$.
  • ...and 8 more figures

Theorems & Definitions (13)

  • Lemma 1
  • proof
  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • Theorem 3
  • proof
  • Theorem 4
  • proof
  • ...and 3 more