Fair Class-Incremental Learning using Sample Weighting

Jaeyoung Park, Minsu Kim, Steven Euijong Whang

TL;DR

This work tackles fairness in class-incremental learning by showing that unfair forgetting can occur when the current-task gradient direction opposes sensitive-group gradients. It introduces Fairness-aware Sample Weighting (FSW), which reweights current-task samples to steer the average gradient toward fairness while preserving accuracy. The method formulates linear programs (LPs) for three group-fairness notions, Equal Error Rate (EER), Equalized Odds (EO), and Demographic Parity (DP), and solves them efficiently using last-layer gradients. Empirical results across image, text, and tabular datasets show that FSW achieves better accuracy-fairness tradeoffs than state-of-the-art baselines and can further improve fairness when combined with post-processing techniques. The work advances practical fair continual learning by supporting multiple fairness notions within a single scalable, optimization-based framework.

Abstract

Model fairness is becoming important in class-incremental learning for Trustworthy AI. While accuracy has been a central focus in class-incremental learning, fairness has been relatively understudied. However, naively using all the samples of the current task for training results in unfair catastrophic forgetting for certain sensitive groups, including classes. We show theoretically that forgetting occurs if the average gradient vector of the current task data points in an "opposite direction" to the average gradient vector of a sensitive group, meaning that their inner product is negative. We then propose a fair class-incremental learning framework that adjusts the training weights of current task samples to change the direction of the average gradient vector, thereby reducing the forgetting of underperforming groups and achieving fairness. For various group fairness measures, we formulate optimization problems that minimize the overall losses of sensitive groups while minimizing the disparities among them. We also show that these problems can be solved with linear programming and propose an efficient Fairness-aware Sample Weighting (FSW) algorithm. Experiments show that FSW achieves better accuracy-fairness tradeoff results than state-of-the-art approaches on real datasets.
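The inner-product condition described in the abstract can be illustrated with a small toy example. The sketch below is not the FSW algorithm: the gradient vectors, learning rate, and sample weights are made-up numbers, the helper names are hypothetical, and the weights are hand-picked rather than obtained from a linear program. It only shows that down-weighting conflicting current-task samples can flip the sign of the inner product with a sensitive group's gradient, which lowers the first-order estimate of that group's loss.

```python
# Toy illustration only: gradients and weights below are made-up numbers,
# not values from the paper.

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def weighted_mean_gradient(sample_grads, weights):
    """Weighted average of per-sample gradient vectors."""
    total = sum(weights)
    dim = len(sample_grads[0])
    return [
        sum(w * g[j] for w, g in zip(weights, sample_grads)) / total
        for j in range(dim)
    ]


def approx_group_loss(prev_loss, group_grad, update_grad, lr):
    """First-order estimate of a group's loss after one gradient step."""
    return prev_loss - lr * dot(group_grad, update_grad)


# Hypothetical average gradient of a sensitive group (e.g., an old class).
group_grad = [1.0, 0.0]
# Hypothetical per-sample gradients of the current task.
task_grads = [[-1.0, 0.5], [-0.8, -0.2], [0.6, 0.9], [0.4, -0.7]]
prev_loss, lr = 1.0, 0.1

# Uniform weights: the usual training setup.
uniform = [1.0] * len(task_grads)
g_uniform = weighted_mean_gradient(task_grads, uniform)

# Hand-picked weights that down-weight the two conflicting samples.
# (FSW obtains weights from a linear program; this is not that solver.)
reweighted = [0.2, 0.2, 1.0, 1.0]
g_reweighted = weighted_mean_gradient(task_grads, reweighted)

loss_uniform = approx_group_loss(prev_loss, group_grad, g_uniform, lr)
loss_reweighted = approx_group_loss(prev_loss, group_grad, g_reweighted, lr)

print("inner product (uniform):   ", dot(group_grad, g_uniform))
print("inner product (reweighted):", dot(group_grad, g_reweighted))
print("estimated group loss:", loss_uniform, "->", loss_reweighted)
```

With uniform weights the inner product is negative, so the estimated group loss rises above its previous value; with the reweighted samples the inner product becomes positive and the estimate falls.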

Paper Structure

This paper contains 68 sections, 10 theorems, 8 equations, 30 figures, 28 tables, and 2 algorithms.

Key Result

Lemma 1

Let $G$ denote a sensitive group with features $X$ and true labels $y$. Let $f_{\theta}^{l-1}$ denote the previous model and $f_{\theta}$ the updated model after training on the current task $T_l$. Let $\ell$ be any differentiable loss function (e.g., cross-entropy loss), and $\eta$ be a … where $\tilde{\ell}(f_{\theta}, G)$ is the approximated average loss between model predictions …
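Lemma 1 concerns an approximation of a group's loss after a training step. As a hedged sketch of the standard reasoning behind such a statement (not the paper's exact formulation), assume that $\eta$ is a step size, that $\theta^{l-1}$ denotes the parameters of the previous model, and that one gradient step $\theta = \theta^{l-1} - \eta\, g_T$ is taken, where $g_T$ is the (possibly weighted) average gradient over the current task and $g_G = \nabla_{\theta}\, \ell(f_{\theta}^{l-1}, G)$. The symbols $g_T$, $g_G$, and $\theta^{l-1}$ are introduced here for illustration. A first-order Taylor expansion then gives:

```latex
\begin{align}
\tilde{\ell}(f_{\theta}, G)
  &= \ell(f_{\theta}^{l-1}, G)
     + \nabla_{\theta}\, \ell(f_{\theta}^{l-1}, G)^{\top} (\theta - \theta^{l-1}) \\
  &= \ell(f_{\theta}^{l-1}, G) - \eta\, g_G^{\top} g_T .
\end{align}
```

Under this approximation, the group's loss increases exactly when $g_G^{\top} g_T < 0$, which matches the "opposite direction" condition described in the abstract.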

Figures (30)

  • Figure 1: (a) A synthetic dataset for class-incremental learning. (b) Training on Class 2 results in unfair forgetting on Class 1 only. (c) The average gradient vector of Class 2, $g_2$, is more than $90^{\circ}$ apart from Class 1's $g_1$, which means the model is being trained in an opposite direction. Our method adjusts $g_2$ to $g_2^*$ through sample weighting to be closer to $g_1$, but not too far from the original $g_2$. (d) As a result, the unfair forgetting is mitigated while minimally sacrificing accuracy for Class 2.
  • Figure 2: Tradeoff results between accuracy and fairness on the MNIST and Biased MNIST datasets. FSW is positioned in the lower-right corner of the graph, indicating a better accuracy-fairness tradeoff than the other baselines.
  • Figure 3: Distribution of sample weights for EO in sequential tasks of the Biased MNIST dataset.
  • Figure 4: Comparison of EO disparity and cost function for EO during training on the Biased MNIST dataset. We train a model for 15 epochs per task.
  • Figure 5: t-SNE results for the MNIST, FMNIST, Biased MNIST, and DRUG datasets.
  • ...and 25 more figures

Theorems & Definitions (10)

  • Lemma 1
  • Proposition 1
  • Theorem 1
  • Theorem 2
  • Lemma 2: Restated from Lemma \ref{lem:CF}
  • Theorem 3: Restated from Theorem \ref{thm:CFcondition}
  • Theorem 4
  • Proposition 2
  • Lemma 3
  • Theorem 5: Restated from Theorem \ref{thm:OptimToLP}