Striking a Balance in Fairness for Dynamic Systems Through Reinforcement Learning
Yaowei Hu, Jacob Lear, Lu Zhang
TL;DR
The paper tackles fairness in dynamic, sequential decision-making by modeling the system as an MDP and distinguishing short-term fairness from long-term fairness, two requirements that can diverge. It proposes Fair PPO (F-PPO), a framework that combines a pre-processing action-massaging step, which enforces short-term fairness, with an in-processing advantage-regularization term based on the 1-Wasserstein distance, which promotes long-term fairness within PPO. The key contributions are a formalization of state-based long-term fairness, a concrete algorithm integrating both fairness notions, and three simulation case studies (bank loans, attention allocation, epidemic control) in which F-PPO balances short-term fairness, long-term fairness, and policy utility better than the baselines. The work offers a practical methodology for deploying fair reinforcement learning in dynamic systems where decisions continuously shape future distributions and outcomes.
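To make the 1-Wasserstein distance mentioned above concrete, the following is a minimal sketch of the empirical distance between two one-dimensional samples of equal size. It is not the paper's implementation: the function name `wasserstein_1d`, the equal-size restriction, and the sample values (standing in for a scalar state feature observed in two groups) are all assumptions made for illustration.

```python
def wasserstein_1d(x, y):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples.

    For equal-size samples with uniform weights, the optimal transport plan
    matches the i-th smallest value of x to the i-th smallest value of y,
    so the distance is the mean absolute difference of the sorted samples.
    """
    xs = sorted(float(v) for v in x)
    ys = sorted(float(v) for v in y)
    if not xs or len(xs) != len(ys):
        raise ValueError("expects two non-empty samples of equal length")
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)


# Hypothetical values of a scalar state feature for two groups.
group_a = [0.2, 0.4, 0.6, 0.8]
group_b = [0.3, 0.5, 0.7, 0.9]

# A larger gap means the two groups' state distributions are further apart.
gap = wasserstein_1d(group_a, group_b)
print(f"1-Wasserstein gap between groups: {gap:.3f}")
```

A distance of zero means the two samples have identical empirical distributions. How such a quantity enters the advantage-regularization term is specified by the paper itself and is not reproduced here.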
Abstract
While significant advancements have been made in the field of fair machine learning, the majority of studies focus on scenarios where the decision model operates on a static population. In this paper, we study fairness in dynamic systems where sequential decisions are made. Each decision may shift the underlying distribution of features or user behavior. We model the dynamic system through a Markov Decision Process (MDP). By acknowledging that traditional fairness notions and long-term fairness are distinct requirements that may not necessarily align with one another, we propose an algorithmic framework to integrate various fairness considerations with reinforcement learning using both pre-processing and in-processing approaches. Three case studies show that our method can strike a balance between traditional fairness notions, long-term fairness, and utility.
