On the Cause of Unfairness: A Training Sample Perspective

Yuanshun Yao, Yang Liu

TL;DR

This framework helps practitioners understand observed unfairness and mitigate it by repairing their training data, and it also supports other applications, e.g. detecting mislabeling, fixing imbalanced representations, and detecting fairness-targeted poisoning attacks.

Abstract

Identifying the causes of a model's unfairness is an important yet relatively unexplored task. We look into this problem through the lens of training data, the major source of unfairness. We ask the following question: how would the unfairness of a model change if its training samples (1) were collected from a different (e.g. demographic) group, (2) were labeled differently, or (3) had their features modified? In other words, we quantify the influence of training samples on unfairness by counterfactually changing samples based on predefined concepts, i.e. data attributes such as features, labels, and sensitive attributes. Our framework not only helps practitioners understand the observed unfairness and mitigate it by repairing their training data, but also leads to many other applications, e.g. detecting mislabeling, fixing imbalanced representations, and detecting fairness-targeted poisoning attacks.
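The three counterfactual questions in the abstract can be pictured as three ways of overriding one field of selected training samples. The following is a minimal sketch under stated assumptions: the `TrainingSet` container and the `override` helper are hypothetical names introduced for illustration and are not taken from the paper. The sketch only edits the chosen field; how a changed sensitive attribute would propagate to the features or labels is not modelled here.

```python
from dataclasses import dataclass, replace

import numpy as np


@dataclass(frozen=True)
class TrainingSet:
    """Hypothetical container for the three concepts named in the abstract."""
    X: np.ndarray  # features, shape (n, d)
    y: np.ndarray  # binary labels, shape (n,)
    a: np.ndarray  # binary sensitive attribute, shape (n,)


def override(data, idx, concept, new_values):
    """Return a copy of `data` whose `concept` field is replaced at rows `idx`."""
    if concept not in ("X", "y", "a"):
        raise ValueError(f"unknown concept: {concept!r}")
    changed = getattr(data, concept).copy()  # leave the original arrays untouched
    changed[idx] = new_values
    return replace(data, **{concept: changed})


data = TrainingSet(
    X=np.array([[0.2, 1.0], [0.9, 0.0], [0.5, 0.5]]),
    y=np.array([1, 0, 1]),
    a=np.array([0, 1, 1]),
)
idx = np.array([0, 2])  # samples to change counterfactually

group_cf = override(data, idx, "a", 1 - data.a[idx])       # (1) different group
label_cf = override(data, idx, "y", 1 - data.y[idx])       # (2) different label
feature_cf = override(data, idx, "X", np.zeros((2, 2)))    # (3) modified features
```

Retraining on each counterfactual set and comparing a fairness metric on held-out data would give a brute-force answer to the question that the paper's influence-based framework is designed to approximate.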

Paper Structure (30 sections, 5 theorems, 40 equations, 16 figures, 3 tables)


Key Result

Proposition 1

The first-order approximation of $\text{infl}(D_{val}, \mathcal{K}, \hat{\theta})$ takes the following form: where $H_{\hat{\theta}}$ is the Hessian matrix, i.e. $H_{\hat{\theta}} := \frac{1}{n}\nabla^2 \sum_{i=1}^n \ell(z^{tr}_i; \hat{\theta})$, and $\ell$ is the original loss function (e.g. cross-entropy loss in classification).
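To make the role of the Hessian concrete, here is a minimal sketch of a generic first-order influence estimate for a label override. It is not the expression from Proposition 1, which is not reproduced in this summary. Several choices are assumptions made for illustration: the squared loss with a ridge term `lam` (swapped in for cross-entropy because the first-order estimate is then exact for label changes and easy to check), the signed mean-score gap `fair_gap` as a differentiable fairness surrogate, the sign convention, the synthetic data, and all function names.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 200, 3, 1e-2  # lam: assumed ridge term, keeps the Hessian invertible

# Synthetic data: the sensitive attribute shifts feature 0, which drives the label.
a_tr = rng.integers(0, 2, size=n)
X_tr = rng.normal(size=(n, d))
X_tr[:, 0] += a_tr
y_tr = (X_tr[:, 0] + 0.5 * rng.normal(size=n) > 0.5).astype(float)

a_val = rng.integers(0, 2, size=100)
X_val = rng.normal(size=(100, d))
X_val[:, 0] += a_val


def fit(X, y):
    """Minimise (1/n) sum 0.5*(x^T theta - y)^2 + (lam/2)*||theta||^2."""
    H = X.T @ X / len(X) + lam * np.eye(X.shape[1])  # Hessian of the objective
    theta = np.linalg.solve(H, X.T @ y / len(X))
    return theta, H


def fair_gap(theta):
    """Signed gap in mean validation score between groups (assumed surrogate)."""
    scores = X_val @ theta
    return scores[a_val == 0].mean() - scores[a_val == 1].mean()


def grad_loss(X, y, theta):
    """Per-sample gradients of the squared loss, shape (len(X), d)."""
    return (X @ theta - y)[:, None] * X


theta_hat, H = fit(X_tr, y_tr)

# Counterfactual: flip the labels of the training samples in K.
K = np.arange(10)
y_cf = y_tr.copy()
y_cf[K] = 1.0 - y_cf[K]

# First-order estimate: -grad_fair^T H^{-1} (1/n) sum_K [grad l(z') - grad l(z)].
grad_fair = X_val[a_val == 0].mean(axis=0) - X_val[a_val == 1].mean(axis=0)
grad_shift = (grad_loss(X_tr[K], y_cf[K], theta_hat)
              - grad_loss(X_tr[K], y_tr[K], theta_hat)).sum(axis=0) / n
predicted = -grad_fair @ np.linalg.solve(H, grad_shift)

# Brute-force check: retrain on the counterfactual labels.
theta_cf, _ = fit(X_tr, y_cf)
actual = fair_gap(theta_cf) - fair_gap(theta_hat)
```

With a non-quadratic loss such as cross-entropy, the same estimate would only approximate the retrained change, and the quality of the approximation would depend on how many samples are overridden.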

Figures (16)

  • Figure 1: Our data dependency assumption. Yellow arrows represent data dependency links and red arrows represent the effect of overriding (i.e. counterfactually changing the value of a concept). In the training data, the concept variable $C$ can override the sensitive attribute $A$ (panel (a)), the features $X$ (panel (b)), or the label $Y$ (panel (c)). We train the model $\theta$ on $X$ and $Y$, and compute the fairness metric Fair on the validation dataset $D_{val}$.
  • Figure 2: CIF-based mitigation performance with fairness measure Demographic Parity (DP).
  • Figure 3: CIF-based mitigation performance with fairness measure Equality of Opportunity (EOP).
  • Figure 4: CIF-based mitigation performance with fairness measure Equality of Odds (EO).
  • Figure 5: Fairness-accuracy tradeoff of CIF-based mitigation on COMPAS. CIF-based mitigation is comparable to an in-processing mitigation method, and sometimes achieves better fairness at a similar level of accuracy.
  • ...and 11 more figures
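The three fairness measures named in Figures 2 to 4 can be computed from binary predictions, labels, and a binary sensitive attribute. The sketch below uses common textbook definitions, which may differ in detail from those used in the paper. Taking the maximum of the true-positive-rate and false-positive-rate gaps for Equality of Odds is one convention among several and is an assumption here, as are the function names.

```python
import numpy as np


def demographic_parity_gap(y_pred, a):
    """|P(pred=1 | A=0) - P(pred=1 | A=1)|."""
    y_pred, a = np.asarray(y_pred), np.asarray(a)
    return abs(y_pred[a == 0].mean() - y_pred[a == 1].mean())


def _rate_gap(y_true, y_pred, a, label):
    """Gap between groups in the positive-prediction rate among samples with `label`."""
    y_true, y_pred, a = np.asarray(y_true), np.asarray(y_pred), np.asarray(a)
    mask = y_true == label
    return abs(y_pred[mask & (a == 0)].mean() - y_pred[mask & (a == 1)].mean())


def equal_opportunity_gap(y_true, y_pred, a):
    """Gap in true-positive rate between the two groups."""
    return _rate_gap(y_true, y_pred, a, label=1)


def equalized_odds_gap(y_true, y_pred, a):
    """Larger of the true-positive-rate gap and the false-positive-rate gap."""
    tpr_gap = _rate_gap(y_true, y_pred, a, label=1)
    fpr_gap = _rate_gap(y_true, y_pred, a, label=0)
    return max(tpr_gap, fpr_gap)


# Toy example: four samples per group.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
a      = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

dp = demographic_parity_gap(y_pred, a)
eop = equal_opportunity_gap(y_true, y_pred, a)
eo = equalized_odds_gap(y_true, y_pred, a)
```

In this toy example group 0 receives positive predictions for three of four samples and group 1 for one of four, so all three gaps come out to 0.5.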

Theorems & Definitions (8)

  • Definition 1
  • Proposition 1
  • Definition 2: Concept Influence for Fairness (CIF)
  • Proposition 2
  • Theorem 1: feldman2020does
  • Proposition 3
  • Proposition 4
  • proof