What Hides behind Unfairness? Exploring Dynamics Fairness in Reinforcement Learning
Zhihong Deng, Jing Jiang, Guodong Long, Chengqi Zhang
TL;DR
The paper tackles long-term fairness in reinforcement learning with sensitive attributes by introducing a causal framework that decomposes inequality into three components: one driven by environmental dynamics, one driven by decision-making, and one inherited from history. It defines dynamics fairness as the absence of direct causal effects of the sensitive attribute on the next state and the reward, formalized through an augmented dynamics model, and provides identification formulas to estimate these counterfactual effects from data. A model-based RL algorithm, InsightFair, uses ensemble dynamics and a fairness-aware planning objective to detect and compensate for environment-induced disparities while balancing fairness against task performance. Through theoretical decomposition results and experiments on Allocation-v0 and Lending-v0, the work demonstrates how to explain, detect, and reduce long-term inequality in RL, and how to attribute fairness failures to environmental mechanisms rather than to decisions. The framework is relevant to deploying RL in sensitive, dynamic domains where long-run equity is critical.
Abstract
In sequential decision-making problems involving sensitive attributes like race and gender, reinforcement learning (RL) agents must carefully consider long-term fairness while maximizing returns. Recent works have proposed many fairness notions, but how unfairness arises in RL problems remains unclear. In this paper, we address this gap in the literature by investigating the sources of inequality through a causal lens. We first analyse the causal relationships governing the data generation process and decompose the effect of sensitive attributes on long-term well-being into distinct components. We then introduce a novel notion called dynamics fairness, which explicitly captures the inequality stemming from environmental dynamics and distinguishes it from inequality induced by decision-making or inherited from the past. This notion requires evaluating the expected changes in the next state and the reward induced by changing the value of the sensitive attribute while holding everything else constant. To quantitatively evaluate this counterfactual concept, we derive identification formulas that allow us to obtain reliable estimates from data. Extensive experiments demonstrate the effectiveness of the proposed techniques in explaining, detecting, and reducing inequality in reinforcement learning. We publicly release code at https://github.com/familyld/InsightFair.
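As a rough illustration of the quantity behind dynamics fairness, the sketch below estimates how the expected next state and reward change when only the sensitive attribute is switched, with the state and action held fixed. It is a minimal sketch, not the paper's identification formulas or the InsightFair implementation: it assumes a dynamics model that can be queried directly with the attribute set by hand, and the names `toy_dynamics` and `dynamics_gap`, the binary attribute `z`, and all numeric constants are hypothetical.

```python
import random


def toy_dynamics(state, action, z, rng):
    """Hypothetical environment in which group z=1 is disadvantaged."""
    gain = 1.0 if z == 0 else 0.5        # assumed group-dependent transition
    noise = rng.gauss(0.0, 0.1)
    next_state = state + gain * action + noise
    reward = next_state - 0.2 * z        # assumed group-dependent reward
    return next_state, reward


def dynamics_gap(dynamics, state, action, n_samples=10_000, seed=0):
    """Monte Carlo estimate of E[s', r | s, a, z=1] - E[s', r | s, a, z=0].

    The state and action are held fixed; only the sensitive attribute
    changes. Both groups reuse the same seed (common random numbers),
    so the noise cancels in the difference.
    """
    totals = {0: [0.0, 0.0], 1: [0.0, 0.0]}
    for z in (0, 1):
        rng = random.Random(seed)
        for _ in range(n_samples):
            next_state, reward = dynamics(state, action, z, rng)
            totals[z][0] += next_state
            totals[z][1] += reward
    state_gap = (totals[1][0] - totals[0][0]) / n_samples
    reward_gap = (totals[1][1] - totals[0][1]) / n_samples
    return state_gap, reward_gap


state_gap, reward_gap = dynamics_gap(toy_dynamics, state=1.0, action=1.0)
print(f"next-state gap: {state_gap:.3f}, reward gap: {reward_gap:.3f}")
```

In this toy setting both gaps are nonzero, so the environment itself treats the two groups differently regardless of the policy. A dynamics model whose outputs do not depend on `z` would give gaps of zero. Estimating the same quantity from logged data, where the attribute cannot be set by hand, is what the identification formulas in the paper are for.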
