How to be fair? A study of label and selection bias

Marco Favier; Toon Calders; Sam Pinxteren; Jonathan Meyer

TL;DR

This paper establishes relationships between the type of bias and the effectiveness of a mitigation technique, where mitigation techniques are categorized by the bias measure they optimize. It illustrates this principle for label and selection bias on the one hand, and for demographic parity and "We're All Equal" on the other.

Abstract

It is widely accepted that biased data leads to biased and thus potentially unfair models. Therefore, several measures for bias in data and model predictions have been proposed, as well as bias mitigation techniques whose aim is to learn models that are fair by design. Despite the myriad of mitigation techniques developed in the past decade, however, it is still poorly understood under what circumstances which methods work. Recently, Wick et al. showed, with experiments on synthetic data, that there exist situations in which bias mitigation techniques lead to more accurate models when measured on unbiased data. Nevertheless, in the absence of a thorough mathematical analysis, it remains unclear which techniques are effective under what circumstances. We propose to address this problem by establishing relationships between the type of bias and the effectiveness of a mitigation technique, where we categorize the mitigation techniques by the bias measure they optimize. In this paper we illustrate this principle for label and selection bias on the one hand, and demographic parity and "We're All Equal" on the other hand. Our theoretical analysis allows us to explain the results of Wick et al., and we also show that there are situations where minimizing fairness measures does not result in the fairest possible distribution.

Paper Structure

This paper contains 22 sections, 12 theorems, 66 equations, and 3 figures.

Introduction
Related Work
Measuring fairness.
Bias mitigation techniques.
Understanding bias and its relation to fairness.
Notation
Fair World Framework
Statistical Parity
We're All Equal
Bias
Label Bias
Selection Bias
Worldview and Bias Combinations
Statistical Parity and Label Bias
Statistical Parity and Selection Bias
...and 7 more sections

Key Result

Theorem 5.1

Under label bias, the relation between the conditional distributions $P(y_1\mid x, a)$ and $P_D(y_1\mid x, a)$ is linear; that is, for some $(\alpha_a,\beta_a,\gamma_a) \in \mathbb{R}^3\smallsetminus\{(0,0,0)\}$,

$$\alpha_a P(y_1\mid x, a) + \beta_a P_D(y_1\mid x, a) + \gamma_a = 0 \quad \text{for all } x,$$

while $P_D(x,a) = P(x,a)$.
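
To make the linear relation concrete, the sketch below assumes one simple form of label bias that the summary does not spell out: a group-dependent label flip, where a true positive in group $a$ is recorded as negative with a hypothetical probability flip_pos and a true negative is recorded as positive with a hypothetical probability flip_neg. Under that assumption $P_D(y_1\mid x,a) = \text{flip\_neg} + (1 - \text{flip\_pos} - \text{flip\_neg})\,P(y_1\mid x,a)$, which gives one valid choice of $(\alpha_a,\beta_a,\gamma_a)$. Flipping labels leaves $(x,a)$ untouched, which is why $P_D(x,a) = P(x,a)$. The function names are illustrative only, not the authors' code.

```python
from fractions import Fraction


def biased_positive_rate(p_true, flip_pos, flip_neg):
    """P_D(y1 | x, a) under an ASSUMED group-dependent label-flipping model.

    p_true   : P(y1 | x, a) in the fair world
    flip_pos : hypothetical probability that a true positive is recorded as negative
    flip_neg : hypothetical probability that a true negative is recorded as positive
    """
    return p_true * (1 - flip_pos) + (1 - p_true) * flip_neg


def linear_coefficients(flip_pos, flip_neg):
    """One triple (alpha, beta, gamma) with alpha*P + beta*P_D + gamma == 0.

    beta = -1, so the triple is never (0, 0, 0).
    """
    alpha = 1 - flip_pos - flip_neg
    beta = -1
    gamma = flip_neg
    return alpha, beta, gamma


# Exact rational arithmetic, so the identity can be checked with ==.
flip_pos, flip_neg = Fraction(3, 10), Fraction(1, 10)
alpha, beta, gamma = linear_coefficients(flip_pos, flip_neg)
for p_true in [Fraction(k, 10) for k in range(11)]:
    p_obs = biased_positive_rate(p_true, flip_pos, flip_neg)
    # The same coefficients work for every value of P(y1 | x, a), i.e. for every x.
    assert alpha * p_true + beta * p_obs + gamma == 0
```

For example, `biased_positive_rate(Fraction(1, 2), Fraction(3, 10), Fraction(1, 10))` evaluates to `Fraction(2, 5)`: half of the positives survive with probability 7/10 and half of the negatives are flipped with probability 1/10.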

Figures (3)

Figure 1: The Fair World Framework on which our theory is based. We assume a fair world in which a fairness criterion holds. The observed data, however, is obtained through a biased process. Fairness-aware techniques learn models on the biased data while simultaneously optimizing a fairness measure. This results in a model that is "fair by design."
Figure 2: Results from Wick et al. on the fairness-accuracy trade-off.
Figure 3: Experimental results
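
The first two steps of the Fair World Framework in Figure 1 can be simulated in a few lines. This sketch makes several assumptions not taken from the paper: the fairness criterion is demographic parity, the fair world gives both groups a positive rate of 0.5, and the biased process is a hypothetical label bias that records positives in group a=1 as negatives with probability 0.3. The learning step is omitted. The sketch only shows that data drawn from a fair world can violate the criterion once it passes through a biased process.

```python
import random


def sample_fair_world(n, rng):
    """Draw (a, y) pairs from a toy fair world where demographic parity holds:
    both groups share the same positive rate (0.5, an arbitrary choice)."""
    return [(rng.randint(0, 1), int(rng.random() < 0.5)) for _ in range(n)]


def apply_label_bias(data, flip_prob, rng):
    """Hypothetical biased process: positives in group a=1 are recorded as
    negatives with probability flip_prob; all other records are kept as-is."""
    biased = []
    for a, y in data:
        if a == 1 and y == 1 and rng.random() < flip_prob:
            y = 0
        biased.append((a, y))
    return biased


def parity_gap(data):
    """Demographic parity difference P(y=1 | a=0) - P(y=1 | a=1)."""
    rates = []
    for group in (0, 1):
        labels = [y for a, y in data if a == group]
        rates.append(sum(labels) / len(labels))
    return rates[0] - rates[1]


rng = random.Random(0)  # fixed seed for reproducibility
fair = sample_fair_world(100_000, rng)
observed = apply_label_bias(fair, flip_prob=0.3, rng=rng)
print(f"fair world gap: {parity_gap(fair):+.3f}")
print(f"observed gap:   {parity_gap(observed):+.3f}")
```

Under these assumptions the fair-world gap should be close to 0, while the observed gap should be close to 0.5 - 0.5 * 0.7 = 0.15. A fairness-aware learner trained on the observed data would then be pushing against this induced gap.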

Theorems & Definitions (26)

Theorem 5.1
proof
Theorem 5.2
proof
Theorem 6.1
proof
Corollary 6.1
Theorem 6.2
proof
Example 6.1
...and 16 more
