What Hides behind Unfairness? Exploring Dynamics Fairness in Reinforcement Learning

Zhihong Deng; Jing Jiang; Guodong Long; Chengqi Zhang

TL;DR

The paper tackles long-term fairness in reinforcement learning with sensitive attributes by introducing a causal framework that decomposes inequality into components driven by environmental dynamics, by decision-making, and by history. It defines dynamics fairness, through an augmented dynamics model, as the absence of direct causal effects of the sensitive attribute on the next state and the reward, and provides identification formulas for estimating these counterfactual effects from data. A model-based RL algorithm, InsightFair, uses an ensemble of dynamics models and a fairness-aware planning objective to detect and compensate for environment-induced disparities while balancing fairness against task performance. Theoretical decomposition results and experiments on Allocation-v0 and Lending-v0 show how to explain, detect, and reduce long-term inequality in RL, and how to attribute fairness failures to environmental mechanisms rather than to decisions. The framework targets RL deployments in sensitive, dynamic domains where long-run equity is critical.

Abstract

In sequential decision-making problems involving sensitive attributes like race and gender, reinforcement learning (RL) agents must carefully consider long-term fairness while maximizing returns. Recent works have proposed many different types of fairness notions, but how unfairness arises in RL problems remains unclear. In this paper, we address this gap in the literature by investigating the sources of inequality through a causal lens. We first analyse the causal relationships governing the data generation process and decompose the effect of sensitive attributes on long-term well-being into distinct components. We then introduce a novel notion called dynamics fairness, which explicitly captures the inequality stemming from environmental dynamics, distinguishing it from those induced by decision-making or inherited from the past. This notion requires evaluating the expected changes in the next state and the reward induced by changing the value of the sensitive attribute while holding everything else constant. To quantitatively evaluate this counterfactual concept, we derive identification formulas that allow us to obtain reliable estimations from data. Extensive experiments demonstrate the effectiveness of the proposed techniques in explaining, detecting, and reducing inequality in reinforcement learning. We publicly release code at https://github.com/familyld/InsightFair.

Paper Structure (19 sections, 5 theorems, 20 equations, 9 figures, 1 table, 1 algorithm)

Introduction
Preliminaries
Analysing inequality in reinforcement learning
Problem Formulation
Decomposition of Inequality
Dynamics Fairness
Algorithm for Achieving Long-Term Fairness
Experiments
Explaining Inequality
Detecting Violation of Dynamics Fairness
Promoting Long-Term Fairness
Related Work
Conclusion
Proofs
Algorithm for Achieving Long-Term Fairness
...and 4 more sections

Key Result

Lemma 1

The well-being gap $\text{TE}_{z_0,z_1}(G_t)$ can be decomposed into the sum of reward gaps at each time step, where $\text{TE}_{z_0,z_1}(R_{t}) \coloneqq \mathbb{E}[R_t(z_1)] - \mathbb{E}[R_t(z_0)]$ is the reward gap at step $t$.
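
As an illustration of why such a decomposition holds, the following sketch writes it out under assumptions that are not stated in this summary: the return is taken to be a discounted sum $G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k}$ with $\gamma \in [0, 1)$ and bounded rewards, and the well-being gap is taken to be defined analogously to the reward gap. The indexing and discounting convention used in the paper itself may differ.

```latex
% Assumed: G_t = \sum_{k \ge 0} \gamma^k R_{t+k}, bounded rewards, 0 \le \gamma < 1
\begin{align}
\text{TE}_{z_0,z_1}(G_t)
  &= \mathbb{E}[G_t(z_1)] - \mathbb{E}[G_t(z_0)] \\
  &= \sum_{k=0}^{\infty} \gamma^k \left( \mathbb{E}[R_{t+k}(z_1)] - \mathbb{E}[R_{t+k}(z_0)] \right) \\
  &= \sum_{k=0}^{\infty} \gamma^k \, \text{TE}_{z_0,z_1}(R_{t+k})
\end{align}
```

The second line uses linearity of expectation; exchanging the infinite sum and the expectation is justified under the boundedness assumption above.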

Figures (9)

Figure 1: Dynamics fairness can help us gain a better understanding of the outcomes of different fairness criteria. This figure shows a sequential decision-making problem with two demographic groups: red represents the advantaged group and blue represents the disadvantaged group. We present six subfigures, highlighting different fairness criteria (indicated by columns) alongside fair or unfair environmental dynamics (indicated by rows). Each subfigure consists of a line graph showing the evolution of group-wise states over time, accompanied by a histogram illustrating the future returns. Compared to the disadvantaged group, the advantaged group may occupy a better initial state (e.g., socio-economic status, qualification profile, expertise level, or market competitiveness), or may find it easier to receive high rewards or reach better states. Solid green circles represent consistent decisions for both groups, while empty red circles denote tailored decisions based on sensitive attributes.
Figure 2: A causal diagram for sequential decision-making problems with a sensitive attribute $Z$. For clarity, the edges from the sensitive attribute to states, decisions, and rewards are represented by the connection between purple arrows and shaded areas.
Figure 3: (Left) A local causal diagram in which the directed edge $Z \rightarrow S_t$ carries the direct effect of $Z$ on $S_t$ and the indirect effect mediated by past states and actions (the history). We can combine them into one when focusing on analysing the effect of $Z$ on $R_t$ (or $S_{t+1}$). (Right) A graphical representation of natural direct and indirect effects, where the contrast between the two quantities is highlighted in red and blue.
Figure 4: Visualization of total effect, natural direct effect, and natural indirect effect under different model parameters. The purple curve represents $\text{TE}_{z_0,z_1}(R)$, while the red and blue curves depict $\text{NDE}_{z_0,z_1}(R)$ and $-\text{NIE}_{z_1,z_0}(R)$, respectively. The two figures illustrate scenarios where either the indirect effect dominates (top) or the direct effect dominates (bottom), while the shaded areas show that the sum of $\text{NDE}_{z_0,z_1}(R)$ and $-\text{NIE}_{z_1,z_0}(R)$ matches the total effect in both scenarios, as guaranteed by Theorem 1.
Figure 5: Results of evaluating dynamics fairness using the proposed identification formula. Each tile represents the estimated natural direct effects, $\text{NDE}_{z_0,z_1}(R)$ or $\text{NDE}_{z_0,z_1}(S')$, under specific parameter configurations indicated by the row and column. Lighter colors signify values closer to zero (satisfying dynamics fairness), while darker colors represent larger absolute values (violating dynamics fairness). Red denotes that the second demographic group is advantaged, while blue denotes that it is disadvantaged.
...and 4 more figures
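
The identity described in the Figure 4 caption, that $\text{NDE}_{z_0,z_1}(R) - \text{NIE}_{z_1,z_0}(R)$ equals $\text{TE}_{z_0,z_1}(R)$, can be checked numerically on a small example. The sketch below is only an illustration and not the paper's code: the single-step model, the probability table `p_s_given_z`, the reward table `reward_mean`, and all function names are hypothetical, and the nested counterfactual $\mathbb{E}[R(z, S(z'))]$ is computed with the standard mediation formula, which assumes no unobserved confounding.

```python
import numpy as np

# Hypothetical toy model (not from the paper): binary sensitive attribute Z,
# a three-valued state S acting as mediator, and a scalar reward R.
# p_s_given_z[z, s] = P(S = s | Z = z)
p_s_given_z = np.array([[0.6, 0.3, 0.1],
                        [0.2, 0.3, 0.5]])
# reward_mean[z, s] = E[R | Z = z, S = s]
reward_mean = np.array([[0.0, 1.0, 2.0],
                        [0.5, 1.5, 2.5]])


def nested_mean(z_direct, z_mediator):
    """E[R(z_direct, S(z_mediator))] via the mediation formula.

    Assumes no unobserved confounding, so the counterfactual is
    sum_s E[R | z_direct, s] * P(s | z_mediator).
    """
    return float(reward_mean[z_direct] @ p_s_given_z[z_mediator])


def total_effect(z0, z1):
    """TE_{z0,z1}(R) = E[R(z1)] - E[R(z0)]."""
    return nested_mean(z1, z1) - nested_mean(z0, z0)


def natural_direct_effect(z0, z1):
    """NDE_{z0,z1}(R): switch Z on the direct path, keep S as under z0."""
    return nested_mean(z1, z0) - nested_mean(z0, z0)


def natural_indirect_effect(z0, z1):
    """NIE_{z0,z1}(R): keep Z at z0 on the direct path, let S respond to z1."""
    return nested_mean(z0, z1) - nested_mean(z0, z0)


te = total_effect(0, 1)
nde = natural_direct_effect(0, 1)
nie_reversed = natural_indirect_effect(1, 0)

print(f"TE = {te:.3f}, NDE = {nde:.3f}, -NIE(reversed) = {-nie_reversed:.3f}")
```

In this toy model the direct effect is nonzero, so a criterion that requires the natural direct effect on the reward to vanish would flag it; setting both rows of `reward_mean` equal removes the direct path and drives `nde` to zero while the indirect effect through `S` remains.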

Theorems & Definitions (13)

Definition 1: Well-being Gap
Lemma 1
Definition 2: Natural Direct and Indirect Effect
Theorem 1: Causal Decomposition of Well-being Gap
Lemma 2
Definition 3: Dynamics Fairness
Theorem 2: Criterion for Dynamics Fairness Violation
Theorem 3: Identification of Dynamics Fairness
Proof of Lemma 1
Proof of Theorem 1
...and 3 more
