Learning Representational Disparities
Pavan Ravishankar, Rushabh Shah, Daniel B. Neill
TL;DR
This work tackles downstream disparities arising from biased human decisions in decision-support systems by learning representational disparities that differentiate the inputs seen by observed and desired decision-makers, aiming to minimize $A = |\Pr(Y=1|S=1) - \Pr(Y=1|S=0)|$. It introduces Learning Representational Disparities (LRD), a shallow, interpretable neural network with representational disparity nodes that capture input differences and a multi-objective loss $L = aA + bB + cC + dD$ with $c \gg a,b$ and $d \gg a,b$ to emphasize outcome modeling and decision fidelity. Theoretical guarantees (Theorems 4.1–4.3) show interpretable, often globally optimal disparity weights that fully mitigate disparity under simplifying assumptions, with extensions to more complex inputs. Empirically, LRD outperforms LFR on German Credit, Adult, and Heritage Health datasets in reducing downstream disparity while preserving or improving accuracy, demonstrating practical potential for nudges to steer human decisions toward fairer outcomes.
Abstract
We propose a fair machine learning algorithm to model interpretable differences between observed and desired human decision-making, with the latter aimed at reducing disparity in a downstream outcome impacted by the human decision. Prior work learns fair representations without considering the outcome in the decision-making process. We model the outcome disparities as arising due to the different representations of the input seen by the observed and desired decision-maker, which we term representational disparities. Our goal is to learn interpretable representational disparities which could potentially be corrected by specific nudges to the human decision, mitigating disparities in the downstream outcome; we frame this as a multi-objective optimization problem using a neural network. Under reasonable simplifying assumptions, we prove that our neural network model of the representational disparity learns interpretable weights that fully mitigate the outcome disparity. We validate objectives and interpret results using real-world German Credit, Adult, and Heritage Health datasets.
