Long-Term Fairness with Unknown Dynamics

Tongxin Yin; Reilly Raab; Mingyan Liu; Yang Liu

TL;DR

This paper formalizes long-term fairness in the context of online reinforcement learning to accommodate dynamical control objectives, such as driving equity inherent in the state of a population, that cannot be incorporated into static formulations of fairness.

Abstract

While machine learning can myopically reinforce social inequalities, it may also be used to dynamically seek equitable outcomes. In this paper, we formalize long-term fairness in the context of online reinforcement learning. This formulation can accommodate dynamical control objectives, such as driving equity inherent in the state of a population, that cannot be incorporated into static formulations of fairness. We demonstrate that this framing allows an algorithm to adapt to unknown dynamics by sacrificing short-term incentives to drive a classifier-population system towards more desirable equilibria. For the proposed setting, we develop an algorithm that adapts recent work in online learning. We prove that this algorithm achieves simultaneous probabilistic bounds on cumulative loss and cumulative violations of fairness (as statistical regularities between demographic groups). We compare our proposed algorithm to the repeated retraining of myopic classifiers, as a baseline, and to a deep reinforcement learning algorithm that lacks safety guarantees. Our experiments model human populations according to evolutionary game theory and integrate real-world datasets.
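Bounding cumulative loss and cumulative fairness violations at the same time is often done in constrained online learning with a primal-dual loop. The learner best-responds to a Lagrangian with multiplier $\nu$, and $\nu$ is raised by projected dual ascent whenever the fairness constraint is violated. The sketch below shows only that generic mechanism, under stated assumptions:

- It uses a fixed, hypothetical menu of (loss, disparity) actions and a disparity budget `budget`.
- It is not L-UCBFair, which must also learn unknown dynamics.
- The names `dual_update` and `primal_dual`, the step size, and the cap `nu_max` are all assumptions.
- "Cumulative violation" here is the signed sum of `disparity - budget`.

```python
from typing import List, Tuple

def dual_update(nu: float, violation: float, step_size: float, nu_max: float) -> float:
    """Projected dual ascent: raise nu when the constraint is violated,
    lower it otherwise, and keep it inside [0, nu_max]."""
    return min(max(nu + step_size * violation, 0.0), nu_max)

def primal_dual(actions: List[Tuple[float, float]], budget: float, steps: int,
                step_size: float = 0.5, nu_max: float = 10.0) -> Tuple[float, float]:
    """Toy online loop over a fixed menu of hypothetical (loss, disparity) actions.

    Each round picks the action minimizing the Lagrangian
        loss + nu * (disparity - budget),
    then updates nu. Returns (cumulative loss, signed cumulative violation)."""
    nu, cum_loss, cum_violation = 0.0, 0.0, 0.0
    for _ in range(steps):
        # Primal step: best response to the current multiplier.
        loss, disparity = min(actions, key=lambda a: a[0] + nu * (a[1] - budget))
        violation = disparity - budget
        cum_loss += loss
        cum_violation += violation
        # Dual step: move nu in the direction of the observed violation.
        nu = dual_update(nu, violation, step_size, nu_max)
    return cum_loss, cum_violation
```

Take the menu `[(0.1, 0.5), (0.4, 0.05)]` with `budget=0.1`. Always choosing the low-loss action would accumulate a violation of 0.4 per step. The multiplier instead makes the loop alternate between the two actions, which keeps the signed cumulative violation small while paying some extra loss.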

Paper Structure

This paper contains 42 sections, 19 theorems, 58 equations, 15 figures, 1 table, and 1 algorithm.

Introduction
Related Work
Problem Formulation
State, Action, and Policy
Dynamics
Reward and Utility
Value and Quality Functions
"Long-term fairness" via reinforcement learning
The Online Setting
Algorithms and Analysis
L-UCBFair
Episodic MDP
Episodic Regret
The Lagrangian
Explicit Construction
...and 27 more sections

Key Result

Lemma 3.0 (Boundedness of $\nu^{*}$)

For $\bar{\pi}$ and $\gamma > 0$ satisfying Slater's condition (stated as an assumption in the paper), …
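Only the hypothesis of the lemma survives here. For orientation, the standard argument behind bounds of this kind in constrained MDPs is sketched below. It uses the following assumptions, so the paper's exact statement, sign conventions, and constants may differ:

- a reward-maximization convention;
- a constraint written as $V_g^{\pi}(s_1) \ge 0$;
- per-step rewards in $[0,1]$ and horizon $H$;
- strong duality, which holds for constrained MDPs under Slater's condition when randomized policies are allowed.

```latex
\begin{aligned}
&\text{Slater: } \exists\,\bar{\pi},\ \gamma > 0 \text{ such that } V_g^{\bar{\pi}}(s_1) \ge \gamma. \\
&\text{Dual function: } D(\nu) = \max_{\pi}\; V_r^{\pi}(s_1) + \nu\, V_g^{\pi}(s_1),
  \qquad \nu^{*} \in \arg\min_{\nu \ge 0} D(\nu). \\
&\text{Strong duality: } V_r^{\pi^{*}}(s_1) = D(\nu^{*})
  \ge V_r^{\bar{\pi}}(s_1) + \nu^{*}\, V_g^{\bar{\pi}}(s_1)
  \ge V_r^{\bar{\pi}}(s_1) + \nu^{*}\gamma. \\
&\text{Hence } 0 \le \nu^{*} \le \frac{V_r^{\pi^{*}}(s_1) - V_r^{\bar{\pi}}(s_1)}{\gamma} \le \frac{H}{\gamma}.
\end{aligned}
```

The last inequality uses only $0 \le V_r^{\pi}(s_1) \le H$. A bound of this form is what allows an algorithm to project its dual variable onto a known interval.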

Figures (15)

Figure 1: The greedy baseline algorithm (left) and L-UCBFair (right) are tasked to maximize the fraction of true-positive classifications ($\mathscr{L}=1-\texttt{tp}$; see the loss-function equation), subject to demographic parity ($\mathscr{D}=\texttt{DP}$; see the demographic-parity equation). The greedy algorithm uses $\lambda=0.5$ in its greedy objective, while L-UCBFair is trained for 2,000 steps on episodes of length 100 before this "phase portrait" is generated. We depict the expected dynamics (averaged over 20 policy iterations for each state) of the classifier-population system, parameterized by the time-evolving qualification rate in each group (group 1 on the horizontal axis, group 2 on the vertical axis). The two groups are of equal size and are identically modeled by $X \sim \mathcal{N}(Y, 1)$, a unit-variance normal feature centered on the label $Y$. States in the left plot are attracted to universal non-qualification, $\Pr(Y=1)=0$, while states in the right plot converge to universal qualification. The lower plot shows the average loss over pairs of randomly sampled episodes.
Figure 2: Using a modelled population initialized with the "Adult" dataset and reweighted for equal group representation (see the data-synthesis section), L-UCBFair (left) and R-TD3 (right) are tasked, as in Figure 1, to maximize the fraction of true-positive classifications ($\mathscr{L}=1-\texttt{tp}$; see the loss-function equation), subject to demographic parity ($\mathscr{D}=\texttt{DP}$; see the demographic-parity equation). L-UCBFair performs almost indistinguishably from its run on the synthetic dataset (Figure 1), while R-TD3 learns qualitatively similar behavior but with more aggressive short-term violations of the fairness constraint.
Figure 3: L-UCBFair training loss (left) and disparity (right) in the Figure 1 setting, shown as a 20-step sliding mean and standard deviation.
Figure 4: Phase portraits for L-UCBFair (left) and R-TD3 (right) interacting with the synthetic distribution $X \sim \mathcal{N}(Y, 1)$ with groups of equal size. Both algorithms use $\mathscr{L}=1-\texttt{tp}-\texttt{tn}$ (i.e., zero-one loss) and $\mathscr{D}=\texttt{QR}$ (qualification-rate disparity). Shading: qualification-rate disparity at the next time step.
Figure 5: The interaction of an algorithmic classifier and a reactive population. Given state $s_{\tau}$, the classifier uses policy $\pi$ to select action $a_{\tau}$. The population, in state $s_{\tau}$, reacts to $a_{\tau}$ and transitions to state $s_{\tau + 1}$; the process then repeats.
...and 10 more figures
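The quantities in the Figure 1 and Figure 5 captions can be made concrete with a minimal sketch. It rests on the following assumptions:

- Each group $g$ has qualification rate $q_g=\Pr(Y=1)$ and feature $X\sim\mathcal{N}(Y,1)$.
- The action is a pair of per-group thresholds (accept iff $X \ge t_g$).
- $\texttt{tp}$ is the fraction of the population that is both qualified and accepted.
- The greedy objective is $(1-\texttt{tp}) + \lambda\cdot\texttt{DP}$, where DP is the absolute gap in acceptance rates.

The function names are hypothetical. The paper's exact greedy objective is not reproduced on this page, and the population dynamics are left as a user-supplied function rather than the paper's evolutionary-game model.

```python
import math
from typing import Callable, List, Tuple

State = Tuple[float, float]    # (q1, q2): qualification rate Pr(Y=1) in each group
Action = Tuple[float, float]   # (t1, t2): per-group acceptance thresholds on X

def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def group_rates(q: float, t: float) -> Tuple[float, float]:
    """Return (tp, acceptance rate) for one group under 'accept iff X >= t',
    assuming X ~ N(Y, 1) and Pr(Y=1) = q. Here tp = Pr(Y=1 and accepted)."""
    tpr = 1.0 - normal_cdf(t - 1.0)   # Pr(X >= t | Y = 1)
    fpr = 1.0 - normal_cdf(t)         # Pr(X >= t | Y = 0)
    return q * tpr, q * tpr + (1.0 - q) * fpr

def greedy_objective(state: State, action: Action, lam: float = 0.5) -> float:
    """Assumed myopic objective: (1 - tp) + lam * DP gap, for equal group sizes."""
    (q1, q2), (t1, t2) = state, action
    tp1, acc1 = group_rates(q1, t1)
    tp2, acc2 = group_rates(q2, t2)
    loss = 1.0 - 0.5 * (tp1 + tp2)    # population-level tp averages the two groups
    dp_gap = abs(acc1 - acc2)         # demographic-parity violation
    return loss + lam * dp_gap

def greedy_action(state: State, lam: float = 0.5, grid_size: int = 81) -> Action:
    """Brute-force the myopic minimizer over thresholds on a grid in [-3, 5]."""
    grid = [-3.0 + 8.0 * i / (grid_size - 1) for i in range(grid_size)]
    return min(((t1, t2) for t1 in grid for t2 in grid),
               key=lambda a: greedy_objective(state, a, lam))

def run_episode(state: State, policy: Callable[[State], Action],
                dynamics: Callable[[State, Action], State],
                horizon: int = 100) -> List[State]:
    """Figure-5-style loop: observe s, act a = policy(s), population moves to s'."""
    trajectory = [state]
    for _ in range(horizon):
        action = policy(state)
        state = dynamics(state, action)
        trajectory.append(state)
    return trajectory
```

For example, `run_episode((0.5, 0.5), greedy_action, my_dynamics, horizon=10)` returns the visited states, where `my_dynamics(state, action)` is a hypothetical model of how qualification rates respond. With equal qualification rates and the $1-\texttt{tp}$ loss, the greedy search returns the lowest threshold on the grid for both groups, i.e., it accepts nearly everyone. What that does to qualification rates over time depends entirely on the dynamics passed in.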

Theorems & Definitions (20)

Lemma 3.0: Boundedness of $\nu^{*}$
Definition 3.1
Lemma 3.2
Theorem 3.3
Theorem 3.5: Boundedness
Theorem 4.1: Threshold Bayes-optimality
Theorem A.1: Boundedness
Lemma A.0: Boundedness of (T_1)
Lemma A.0: Boundedness of (T_2)
Lemma A.0
...and 10 more
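Theorem 4.1 is listed here only by title. As a hedged illustration of threshold Bayes-optimality in the synthetic setting used in the figures, assume a single group with prior $q=\Pr(Y=1)$, feature $X\sim\mathcal{N}(Y,1)$, and zero-one loss.

- The likelihood ratio $\varphi(x-1)/\varphi(x)=e^{x-1/2}$ is increasing in $x$.
- Predicting $Y=1$ iff $q\,\varphi(x-1) \ge (1-q)\,\varphi(x)$ is therefore the threshold rule $x \ge \tfrac12 + \ln\frac{1-q}{q}$.

The sketch below checks this numerically against a brute-force grid. It illustrates the general idea and is not the paper's theorem statement; the function names are hypothetical.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bayes_threshold(q: float) -> float:
    """Zero-one-loss Bayes-optimal threshold for X ~ N(Y, 1), Pr(Y=1) = q, 0 < q < 1."""
    return 0.5 + math.log((1.0 - q) / q)

def zero_one_risk(q: float, t: float) -> float:
    """Error rate of 'predict 1 iff X >= t': missed positives plus false positives."""
    false_negative = q * normal_cdf(t - 1.0)            # Y = 1 but X < t
    false_positive = (1.0 - q) * (1.0 - normal_cdf(t))  # Y = 0 but X >= t
    return false_negative + false_positive

def grid_threshold(q: float, lo: float = -5.0, hi: float = 6.0, step: float = 0.001) -> float:
    """Brute-force minimizer of the zero-one risk over a fine threshold grid."""
    n = int(round((hi - lo) / step))
    grid = [lo + step * i for i in range(n + 1)]
    return min(grid, key=lambda t: zero_one_risk(q, t))
```

Comparing `bayes_threshold(q)` with `grid_threshold(q)` for a few priors (e.g., 0.2, 0.5, 0.8) should show agreement up to the grid resolution. As $q$ grows, the threshold moves left, so more individuals are accepted.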
