Learning Representational Disparities

Pavan Ravishankar, Rushabh Shah, Daniel B. Neill

TL;DR

This work tackles downstream disparities arising from biased human decisions in decision-support systems by learning representational disparities that differentiate the inputs seen by observed and desired decision-makers, aiming to minimize $A = |\Pr(Y=1|S=1) - \Pr(Y=1|S=0)|$. It introduces Learning Representational Disparities (LRD), a shallow, interpretable neural network with representational disparity nodes that capture input differences and a multi-objective loss $L = aA + bB + cC + dD$ with $c \gg a,b$ and $d \gg a,b$ to emphasize outcome modeling and decision fidelity. Theoretical guarantees (Theorems 4.1–4.3) show interpretable, often globally optimal disparity weights that fully mitigate disparity under simplifying assumptions, with extensions to more complex inputs. Empirically, LRD outperforms LFR on German Credit, Adult, and Heritage Health datasets in reducing downstream disparity while preserving or improving accuracy, demonstrating practical potential for nudges to steer human decisions toward fairer outcomes.
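As a rough illustration of the quantities named above, the following Python sketch estimates the disparity $A = |\Pr(Y=1|S=1) - \Pr(Y=1|S=0)|$ from binary samples and combines four loss terms into $L = aA + bB + cC + dD$. The function names `outcome_disparity` and `total_loss`, the toy data, and the numeric values of $B$, $C$, $D$ and of the weights are assumptions made for illustration; the summary does not specify how $B$, $C$, and $D$ are computed, so they are passed in as plain numbers here.

```python
def outcome_disparity(y, s):
    """Estimate |Pr(Y=1 | S=1) - Pr(Y=1 | S=0)| from binary samples.

    y: list of 0/1 outcomes; s: list of 0/1 sensitive-attribute values.
    """
    group1 = [yi for yi, si in zip(y, s) if si == 1]
    group0 = [yi for yi, si in zip(y, s) if si == 0]
    if not group1 or not group0:
        raise ValueError("both groups S=0 and S=1 must be non-empty")
    rate1 = sum(group1) / len(group1)  # empirical Pr(Y=1 | S=1)
    rate0 = sum(group0) / len(group0)  # empirical Pr(Y=1 | S=0)
    return abs(rate1 - rate0)


def total_loss(A, B, C, D, a, b, c, d):
    """Weighted sum L = a*A + b*B + c*C + d*D (terms supplied by the caller)."""
    return a * A + b * B + c * C + d * D


# Toy data (hypothetical): 3 of 4 positive outcomes for S=1, 1 of 4 for S=0.
y = [1, 1, 1, 0, 1, 0, 0, 0]
s = [1, 1, 1, 1, 0, 0, 0, 0]

A = outcome_disparity(y, s)
# Placeholder values for B, C, D; weights chosen so that c, d >> a, b.
L = total_loss(A, B=1.0, C=0.0, D=0.0, a=0.9, b=0.1, c=100.0, d=100.0)
print(A, L)
```

Choosing $c$ and $d$ much larger than $a$ and $b$ means that any deviation in the $C$ or $D$ terms dominates the sum, so the trade-off between $A$ and $B$ only matters once those two terms are near their optimum.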

Abstract

We propose a fair machine learning algorithm to model interpretable differences between observed and desired human decision-making, with the latter aimed at reducing disparity in a downstream outcome impacted by the human decision. Prior work learns fair representations without considering the outcome in the decision-making process. We model the outcome disparities as arising due to the different representations of the input seen by the observed and desired decision-maker, which we term representational disparities. Our goal is to learn interpretable representational disparities which could potentially be corrected by specific nudges to the human decision, mitigating disparities in the downstream outcome; we frame this as a multi-objective optimization problem using a neural network. Under reasonable simplifying assumptions, we prove that our neural network model of the representational disparity learns interpretable weights that fully mitigate the outcome disparity. We validate objectives and interpret results using real-world German Credit, Adult, and Heritage Health datasets.

Paper Structure

This paper contains 12 sections, 1 theorem, 60 equations, 6 figures, 6 tables.

Key Result

Theorem 4.1

Assume the data generating process and neural network architecture in Figures 1 and 2 and assumptions (A1)-(A6) above. Here the decision $H$ depends only on $S$, and the outcome $Y$ depends only on $H$. Let $\alpha = \Pr(Y=1 \mid H=1) - \Pr(Y=1 \mid H=0)$, with $\alpha \ne 0$. Here, $R'$ is the representational disparity node; $O(s) = \Pr_\text{obs}(H=1 \mid S=s)$, $RD$ ...
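
The role of $\alpha$ can be seen from a short worked step. This is a sketch that uses only the two structural assumptions stated in the theorem ($H$ depends only on $S$, and $Y$ depends only on $H$); it is not taken from the paper's proof. Conditioning on $H$ and using that $Y$ is independent of $S$ given $H$:

$$
\begin{aligned}
\Pr(Y=1 \mid S=s) &= \Pr(Y=1 \mid H=1)\,\Pr(H=1 \mid S=s) + \Pr(Y=1 \mid H=0)\,\bigl(1 - \Pr(H=1 \mid S=s)\bigr) \\
&= \Pr(Y=1 \mid H=0) + \alpha\,\Pr(H=1 \mid S=s), \\
\Pr(Y=1 \mid S=1) - \Pr(Y=1 \mid S=0) &= \alpha\,\bigl(\Pr(H=1 \mid S=1) - \Pr(H=1 \mid S=0)\bigr).
\end{aligned}
$$

For the observed decision-maker the right-hand side is $\alpha\,(O(1) - O(0))$. Since $\alpha \ne 0$, under these assumptions the outcome disparity vanishes exactly when the decision probabilities are equal across the two values of $S$.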

Figures (6)

  • Figure 1: Data Generation Process
  • Figure 2: Architecture (left) with nodes used by the observed (middle) and desired human (right)
  • Figure 3: Regions divided based on the signs of $w$, $w_{SR'}$, and $\text{bias}_{R'}$.
  • Figure 4: Components of the multi-objective loss function as a function of $a$, the relative weight of the disparity in fairness loss as compared to the regularization loss. Note $b=1-a$, $c\gg a$, and $d\gg a$ for all experiments. (a) Loss $A$ vs $a$; (b) Loss $B$ vs $a$; (c) Loss $C$ vs $a$; (d) Loss $D$ vs $a$. Note the small scale of the $y$-axis in (c) and (d); we see that $C\approx C_{opt}$ and $D\approx D_{opt}$ for all values of $a$.
  • Figure 5: Losses $L1$ and $L2$ (top), Case 1 (bottom left), 2 (bottom middle), and 3 (bottom right). Loss $L1$ is shown in red, and Loss $L2$ is shown in blue. The $x$-axis is the regularization loss $B_\mathbf{w}=|w_3|+|w_{SR_3}|+|\text{bias}_{R_3}|$ for representational disparity node $R_3$, with corresponding shift in logits $B_\mathbf{w}^2/4$. The $y$-axis is total loss $aA_\mathbf{w} + bB_\mathbf{w}$ for $a=0.9$ and $b=0.1$.
  • ...and 1 more figure

Theorems & Definitions (1)

  • Theorem 4.1