Demographic parity in regression and classification within the unawareness framework

Vincent Divol, Solenne Gaucher

TL;DR

Nestedness of the classifiers' decision sets is shown to be both necessary and sufficient for a form of equivalence between optimal fair cost-sensitive classification and optimal fair regression.

Abstract

This paper explores the theoretical foundations of fair regression under the constraint of demographic parity within the unawareness framework, where disparate treatment is prohibited, extending existing results where such treatment is permitted. Specifically, we aim to characterize the optimal fair regression function when minimizing the quadratic loss. Our results reveal that this function is given by the solution to a barycenter problem with optimal transport costs. Additionally, we study the connection between optimal fair cost-sensitive classification and optimal fair regression. We demonstrate that nestedness of the decision sets of the classifiers is both necessary and sufficient to establish a form of equivalence between classification and regression. Under this nestedness assumption, the optimal classifiers can be derived by applying thresholds to the optimal fair regression function; conversely, the optimal fair regression function is characterized by the family of cost-sensitive classifiers.

Paper Structure

This paper contains 29 sections, 24 theorems, 129 equations, and 2 figures.

Key Result

Theorem 1

Assume that for all $s \in \mathcal{S}$, the distribution $\nu_{s}$ of $\eta(X,S)$ for $S=s$ has no atoms, and let $p_s = \mathbb{P}(S = s)$. Then,

$$\min_{f \text{ fair}} \mathbb{E}\left[(\eta(X,S) - f(X,S))^2\right] = \min_{\nu} \sum_{s \in \mathcal{S}} p_s \mathcal{W}_2^2(\nu_s, \nu),$$

where $\mathcal{W}_2^2(\nu_s, \nu)$ is the squared Wasserstein distance between $\nu_s$ and $\nu$. Moreover, if $f^*$ and $\nu$ solve the left-hand side and the right-hand side problems respectively, then $\nu$ is equal to the distribution of $f^*(X,S)$ ...
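
The barycenter characterization in Theorem 1 can be illustrated numerically in the awareness setting, where the predictor may use $S$ (this is the earlier setting that the paper extends, not the unawareness framework itself). The sketch below relies on a standard one-dimensional fact: the quantile function of the $\mathcal{W}_2$ barycenter is the $p_s$-weighted average of the group-wise quantile functions, so each score is sent to its within-group CDF level and then through the averaged quantile function. The function name `fair_barycenter_predictions`, the mid-rank CDF estimate, and the synthetic Gaussian scores are all assumptions made for illustration; none of them come from the paper.

```python
import numpy as np


def fair_barycenter_predictions(scores, groups):
    """Map scores eta(x, s) to the empirical 1-D W2 barycenter of the group-wise
    score distributions (awareness setting, plug-in estimate; illustrative only)."""
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    labels, counts = np.unique(groups, return_counts=True)
    weights = counts / counts.sum()  # empirical p_s

    fair = np.empty_like(scores)
    for s in labels:
        mask = groups == s
        group_scores = scores[mask]
        # Within-group CDF level of each score, using mid-ranks in (0, 1).
        ranks = np.argsort(np.argsort(group_scores))
        levels = (ranks + 0.5) / group_scores.size
        # Barycenter quantile function: weighted average of group quantile functions.
        fair[mask] = sum(
            w * np.quantile(scores[groups == t], levels)
            for w, t in zip(weights, labels)
        )
    return fair


# Hypothetical data: two groups whose scores differ by a mean shift.
rng = np.random.default_rng(0)
groups = rng.integers(0, 2, size=2000)
scores = rng.normal(loc=np.where(groups == 0, 0.0, 2.0), scale=1.0)
fair = fair_barycenter_predictions(scores, groups)
```

After the transformation, the two groups share approximately the same distribution of predictions, while the ordering of individuals within each group is unchanged.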

Figures (2)

  • Figure 1: The measure $\boldsymbol{\mu_+}$ is displayed in red and the measure $\boldsymbol{\mu_-}$ is displayed in blue. By definition of $\kappa^+(y)$ and $\kappa^-(y)$, the red region and the blue region to the right of the two dotted lines have equal masses. The region in between the two lines contains no mass.
  • Figure 2: Left: example of a nested problem. The distributions $\boldsymbol{\mu_+}$ and $\boldsymbol{\mu_-}$ are depicted in red and blue, corresponding to the distributions given in Example 1. The acceptance regions for $g_y^{\kappa(y)}$ and $g_{y'}^{\kappa'(y)}$ are such that the masses of $\boldsymbol{\mu_+}$ and $\boldsymbol{\mu_-}$ to the right of the decision boundaries are equal. One can observe that these two regions are nested. Right: example of a non-nested problem. The distributions $\boldsymbol{\mu_+}$ and $\boldsymbol{\mu_-}$ are the ones described in Example 2. The region in pink is rejected for $y=-3$ but accepted for $y=0$, contradicting the nestedness assumption.

Theorems & Definitions (46)

  • Definition 1: Demographic parity
  • Definition 2: Fair regression
  • Definition 3: Fair classification
  • Theorem 1: Chzhen et al. (2020); Le Gouic et al. (2020)
  • Theorem 2: Informal
  • Theorem 3: Informal
  • Definition 4: Order preservation in regression - unawareness framework
  • Proposition 1
  • proof
  • Lemma 1
  • ...and 36 more