Alpha and Prejudice: Improving $α$-sized Worst-case Fairness via Intrinsic Reweighting

Jing Li, Yinghua Yao, Yuangang Pan, Xuanqian Wang, Ivor W. Tsang, Xiuju Fu

TL;DR

A reweighting approach is proposed that assigns sample weights based on their intrinsic contributions to fairness, together with a stochastic learning algorithm that simplifies training without sacrificing performance.

Abstract

Worst-case fairness with off-the-shelf demographics achieves group parity by maximizing the model utility of the worst-off group. Nevertheless, demographic information is often unavailable in practical scenarios, which impedes the use of such a direct max-min formulation. Recent advances have reframed this learning problem by introducing a lower bound on the minimal partition ratio, denoted as $α$, as side information; this paper refers to the setting as "$α$-sized worst-case fairness". We first justify the practical significance of this setting by presenting evidence from the data privacy perspective, which has been overlooked by existing research. Without imposing specific requirements on loss functions, we propose reweighting the training samples based on their intrinsic importance to fairness. Given the global nature of the worst-case formulation, we further develop a stochastic learning scheme to simplify the training process without compromising model performance. Additionally, we provide a robust variant to handle potential outliers during model training. Our theoretical analysis and experimental observations reveal the connections between the proposed approaches and existing "fairness-through-reweighting" studies, with extensive experimental results on fairness benchmarks demonstrating the superiority of our methods.

Paper Structure

This paper contains 21 sections, 3 theorems, 20 equations, 6 figures, 3 tables, 1 algorithm.

Key Result

Proposition 1

Given a bound $\alpha$ for the minimal group proportion, we have $\mathcal{J}_{wg}(\theta) \le \mathcal{J}_{N\alpha}(\theta;\alpha) \le \mathcal{J}_{dr}(\theta;\alpha)$ for all $\theta \in \Theta$.
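The first inequality of Proposition 1 can be checked numerically with a minimal sketch. The definitions of the objectives are not reproduced on this page, so the sketch assumes that $\mathcal{J}_{wg}$ is the largest per-group mean loss and that $\mathcal{J}_{N\alpha}$ is the mean loss over the $\text{int}(N\alpha)$ highest-loss samples, mirroring the $\text{int}(b\times\alpha)$ count used in the Figure 4 caption. The function names, the synthetic losses, and the group labels are hypothetical, and $\mathcal{J}_{dr}$ is not covered.

```python
import random


def worst_group_loss(losses, groups):
    """Assumed J_wg: the largest per-group mean loss (requires group labels)."""
    by_group = {}
    for loss, group in zip(losses, groups):
        by_group.setdefault(group, []).append(loss)
    return max(sum(v) / len(v) for v in by_group.values())


def alpha_sized_worst_loss(losses, alpha):
    """Assumed J_{N alpha}: mean of the int(N * alpha) largest losses (no labels)."""
    k = max(1, int(len(losses) * alpha))
    top_k = sorted(losses, reverse=True)[:k]
    return sum(top_k) / k


# Synthetic data (hypothetical): three groups, group "a" has shifted losses.
rng = random.Random(0)
n = 1000
groups = rng.choices("abc", weights=[0.2, 0.3, 0.5], k=n)
losses = [rng.expovariate(1.0) + (0.5 if g == "a" else 0.0) for g in groups]

# alpha is a valid lower bound on the smallest group proportion.
alpha = min(groups.count(g) for g in set(groups)) / n

wg = worst_group_loss(losses, groups)
j_alpha = alpha_sized_worst_loss(losses, alpha)
print(f"alpha={alpha:.3f}  J_wg={wg:.3f}  J_Nalpha={j_alpha:.3f}")
```

Under these assumed definitions the bound holds because every group contains at least $\text{int}(N\alpha)$ samples, and the mean of the $k$ largest losses never increases as $k$ grows, so no group's mean loss can exceed the mean of the $\text{int}(N\alpha)$ largest losses.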

Figures (6)

  • Figure 1: Comparison of four weight-learning strategies. The weight of our IRW method is not solely determined by loss; therefore we use discrete scatter plots to represent the variance introduced by other factors, i.e., gradients.
  • Figure 2: Weight distribution of training samples for our IRW. For each dataset, we record the weights of training samples at the first training epoch and use kernel density estimation to plot their distribution. "Actual worst" denotes the group with the lowest accuracy on the test set; all other groups are pooled as "remaining".
  • Figure 3: The proportion of actual worst-off group among top $m\% (m=10,20,...,90)$ training samples selected by reassigned weights. We compare BPF and our IRW method on four datasets.
  • Figure 4: Per-sample weights versus loss values on a randomly selected batch at the end of the training process on four datasets. The number of $\alpha$-sized worst samples (red stars) is $\text{int}(b\times\alpha)$, where $b=128$ for (a)-(c) and $b=256$ for (d), and $\alpha$ is given in the header row of Table \ref{tbl:specific_attributes}. The non-worst positive-weighted samples (green crosses) and non-worst zero-weighted samples (blue points) are those whose gradients yield positive or zero $w_i$, respectively, at iteration $t$ (see Eq. \ref{eq:per_sample_weight}).
  • Figure 5: Comparison of classification errors on the UCI Adult dataset. The y-axis is truncated above 3 for better visualization.
  • ...and 1 more figure

Theorems & Definitions (7)

  • Proposition 1
  • Definition 1
  • Theorem 1
  • Theorem 2
  • proof
  • proof
  • proof