On the Cause of Unfairness: A Training Sample Perspective

Yuanshun Yao; Yang Liu

TL;DR

We quantify how counterfactual changes to training samples influence a model's unfairness. The resulting framework helps practitioners understand observed unfairness and mitigate it by repairing their training data, and it also supports other applications, e.g. detecting mislabeling, fixing imbalanced representations, and detecting fairness-targeted poisoning attacks.

Abstract

Identifying the causes of a model's unfairness is an important yet relatively unexplored task. We look into this problem through the lens of training data, the major source of unfairness. We ask the following questions: How would the unfairness of a model change if its training samples (1) were collected from a different (e.g. demographic) group, (2) were labeled differently, or (3) had their features modified? In other words, we quantify the influence of training samples on unfairness by counterfactually changing samples based on predefined concepts, i.e. data attributes such as features, labels, and sensitive attributes. Our framework not only helps practitioners understand the observed unfairness and mitigate it by repairing their training data, but also leads to many other applications, e.g. detecting mislabeling, fixing imbalanced representations, and detecting fairness-targeted poisoning attacks.
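
To make the counterfactual question concrete, the following is a minimal brute-force sketch of case (2), relabeling. It retrains a model after overriding some training labels and reports how the demographic parity gap on validation data changes. It is not the paper's estimator: the ridge least-squares scorer, the 0.5 threshold, and the names `fit_ridge`, `dp_gap`, `override_labels`, and `retrain_effect` are all assumptions chosen for illustration.

```python
import numpy as np

def fit_ridge(X, y, lam=1e-3):
    """Toy scorer (an assumption): ridge least squares fitted to 0/1 labels."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def dp_gap(theta, X_val, a_val):
    """Demographic parity gap: absolute difference in positive-prediction
    rates between the two sensitive groups (a_val == 1 vs. a_val == 0)."""
    pred = (X_val @ theta > 0.5).astype(float)
    return abs(pred[a_val == 1].mean() - pred[a_val == 0].mean())

def override_labels(y, idx, new_labels):
    """Counterfactually override the label concept on the samples in idx.
    Returns a copy; the original label array is left untouched."""
    y_cf = y.copy()
    y_cf[idx] = new_labels
    return y_cf

def retrain_effect(X, y, y_cf, X_val, a_val):
    """DP gap of the original model minus the DP gap after retraining on the
    counterfactual labels. Positive means the override reduced the gap."""
    base = dp_gap(fit_ridge(X, y), X_val, a_val)
    counterfactual = dp_gap(fit_ridge(X, y_cf), X_val, a_val)
    return base - counterfactual
```

Retraining once per candidate change is exact for this toy model but scales poorly with the number of candidate changes, which is the usual motivation for approximating the effect with influence functions instead.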

Paper Structure (30 sections, 5 theorems, 40 equations, 16 figures, 3 tables)

Introduction
Influence of Training Concepts
Fairness Influence Function
Concepts in Training Data
Concept Influence for Fairness (CIF)
Method
Generating Counterfactual Samples
Computing Influence
Experiments
Mitigation Performance
Additional Applications of CIF
Related Work
Conclusions and Limitations
Proposition \ref{eq:fairinfl}: Derivation of Fairness Function on Group Fairness
Proposition \ref{eq:fairinfl-cs}: Derivation of Fairness Function for Counterfactual Samples
...and 15 more sections

Key Result

Proposition 1

The first-order approximation of $\text{infl}(D_{val}, \mathcal{K}, \hat{\theta})$ takes the following form: where $H_{\hat{\theta}}$ is the Hessian matrix, i.e. $H_{\hat{\theta}}:=\frac{1}{n}\nabla^2 \sum_{i=1}^n \ell(z^{tr}_i; \hat{\theta})$, and $\ell$ is the original loss function (e.g. the cross-entropy loss in classification).
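
As a rough illustration of how a Hessian-based first-order approximation of this kind can be computed, the sketch below uses the standard influence-function recipe for removing a group of training samples from a logistic regression model. Several parts are assumptions rather than the paper's definitions: the squared demographic parity gap as a smooth fairness loss, the L2 damping term `lam` (added so the Hessian is invertible), the sign and scaling convention (the returned value estimates the change in the fairness loss after removal), and the names `fit_logreg`, `fair_loss_grad`, and `group_influence`.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def fit_logreg(X, y, lam=1e-2, iters=50):
    """Newton's method on mean cross-entropy + (lam / 2) * ||theta||^2."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(iters):
        p = sigmoid(X @ theta)
        grad = X.T @ (p - y) / n + lam * theta
        H = (X * (p * (1 - p))[:, None]).T @ X / n + lam * np.eye(d)
        theta -= np.linalg.solve(H, grad)
    return theta

def fair_loss_grad(theta, X_val, a_val):
    """Gradient of an assumed smooth fairness loss: the squared gap between
    the mean predicted probabilities of the two sensitive groups."""
    p = sigmoid(X_val @ theta)
    w = p * (1 - p)
    m1, m0 = a_val == 1, a_val == 0
    gap = p[m1].mean() - p[m0].mean()
    grad_gap = ((X_val[m1] * w[m1][:, None]).mean(axis=0)
                - (X_val[m0] * w[m0][:, None]).mean(axis=0))
    return 2.0 * gap * grad_gap

def group_influence(theta, X, y, K, X_val, a_val, lam=1e-2):
    """First-order estimate of the change in the fairness loss if the
    training samples indexed by K were removed and the model retrained."""
    n, d = X.shape
    p = sigmoid(X @ theta)
    # Hessian of the mean training loss at theta, plus the damping term.
    H = (X * (p * (1 - p))[:, None]).T @ X / n + lam * np.eye(d)
    K = np.asarray(K, dtype=int)
    g_K = X[K].T @ (p[K] - y[K])  # sum of per-sample loss gradients over K
    g_fair = fair_loss_grad(theta, X_val, a_val)
    return float(g_fair @ np.linalg.solve(H, g_K) / n)
```

Because the estimate is linear in the summed gradients, the influence of a union of disjoint groups equals the sum of their individual influences. That linearity is what makes ranking many candidate groups cheap compared with retraining, at the cost of accuracy when the removed group is large.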

Figures (16)

Figure 1: Our data dependency assumption. Yellow arrows represent the data dependency link and red arrows represent the effect of overriding (i.e. counterfactually changing the value of a concept). In training data, the concept variable $C$ can override sensitive attribute $A$ (i.e. Figure (a)), features $X$ (i.e. Figure (b)), and label $Y$ (i.e. Figure (c)). We train the model $\theta$ on $X$ and $Y$, and compute the validation fairness metric Fair on the validation dataset $D_{val}$.
Figure 2: CIF-based mitigation performance with fairness measure Demographic Parity (DP).
Figure 3: CIF-based mitigation performance with fairness measure Equality of Opportunity (EOP).
Figure 4: CIF-based mitigation performance with fairness measure Equality of Odds (EO).
Figure 5: Fairness-accuracy tradeoff of CIF-based mitigation on COMPAS. CIF-based mitigation is comparable to in-processing mitigation and sometimes achieves better fairness at a similar level of accuracy.
...and 11 more figures

Theorems & Definitions (8)

Definition 1
Proposition 1
Definition 2: Concept Influence for Fairness (CIF)
Proposition 2
Theorem 1: feldman2020does
Proposition 3
Proposition 4
proof