Adapting Static Fairness to Sequential Decision-Making: Bias Mitigation Strategies towards Equal Long-term Benefit Rate
Yuancheng Xu, Chenghao Deng, Yanchao Sun, Ruijie Zheng, Xiyao Wang, Jieyu Zhao, Furong Huang
TL;DR
This work introduces Equal Long-term Benefit Rate (ELBERT), a ratio-after-aggregation fairness notion for sequential decision-making modeled via a Supply-Demand MDP (SD-MDP). A group's long-term well-being is $\frac{\eta_g^S(\pi)}{\eta_g^D(\pi)}$, and the bias of a policy is $b(\pi)=\max_g\frac{\eta_g^S(\pi)}{\eta_g^D(\pi)}-\min_g\frac{\eta_g^S(\pi)}{\eta_g^D(\pi)}$. To optimize under fairness, the paper derives a fairness-aware policy gradient that reduces to standard policy gradients, enabling ELBERT-PO with PPO updates. For multi-group settings, it introduces a soft-bias surrogate $b^{\text{soft}}(\pi)$ with temperature $\beta$ and proves $b(\pi) \le b^{\text{soft}}(\pi) \le b(\pi)+\frac{2\log M}{\beta}$. The method is validated in lending, infectious disease control, and attention allocation environments, where it substantially reduces bias while keeping utility high. The work also discusses extensions to demand-regularized objectives and broader implications for ethical AI in sequential tasks.
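To make the bias and soft-bias quantities concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that $M$ is the number of groups and that the soft bias is built from log-sum-exp smoothings of the max and min, a common choice that is consistent with the stated bound; the TL;DR does not spell out the exact form. The function names and the supply and demand numbers are hypothetical.

```python
import math


def benefit_rates(supply, demand):
    """Per-group long-term benefit rate: aggregated supply over aggregated demand."""
    return [s / d for s, d in zip(supply, demand)]


def bias(rates):
    """Hard bias b(pi): gap between the best-off and worst-off group."""
    return max(rates) - min(rates)


def _log_sum_exp(xs):
    """Numerically stable log(sum(exp(x))) for a list of floats."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def soft_bias(rates, beta):
    """Assumed soft bias: log-sum-exp soft-max minus log-sum-exp soft-min.

    With M groups, soft-max lies in [max, max + log(M)/beta] and soft-min
    lies in [min - log(M)/beta, min], so the result lies in
    [b, b + 2*log(M)/beta].
    """
    soft_max = _log_sum_exp([beta * r for r in rates]) / beta
    soft_min = -_log_sum_exp([-beta * r for r in rates]) / beta
    return soft_max - soft_min


# Hypothetical aggregated supply and demand for M = 3 groups.
supply = [30.0, 12.0, 45.0]
demand = [100.0, 60.0, 90.0]
beta = 20.0

rates = benefit_rates(supply, demand)
b = bias(rates)
b_soft = soft_bias(rates, beta)
upper = b + 2 * math.log(len(rates)) / beta
print(f"bias={b:.4f} soft_bias={b_soft:.4f} upper_bound={upper:.4f}")
```

Under this assumed form, a larger $\beta$ tightens the gap between $b^{\text{soft}}(\pi)$ and $b(\pi)$, and the surrogate stays differentiable in every group's rate, which is what makes it usable with gradient-based updates when there are more than two groups.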
Abstract
Decisions made by machine learning models can have lasting impacts, making long-term fairness a critical consideration. It has been observed that ignoring long-term effects and directly applying fairness criteria designed for static settings can actually worsen bias over time. To address biases in sequential decision-making, we introduce a long-term fairness concept named Equal Long-term Benefit Rate (ELBERT). This concept is integrated into a Markov Decision Process (MDP) to account for the future effects of actions on long-term fairness, thus providing a unified framework for fair sequential decision-making problems. ELBERT addresses the temporal discrimination issues found in previous long-term fairness notions. Additionally, we demonstrate that the policy gradient of the Long-term Benefit Rate can be analytically simplified to standard policy gradients. This simplification makes conventional policy optimization methods viable for reducing bias, leading to our bias mitigation approach ELBERT-PO. Extensive experiments across diverse sequential decision-making environments consistently show that ELBERT-PO significantly reduces bias while maintaining high utility. Code is available at https://github.com/umd-huang-lab/ELBERT.
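The claim that the gradient of a benefit rate reduces to standard policy gradients can be seen from the quotient rule. The sketch below assumes a policy $\pi_\theta$ parameterized by $\theta$ (notation not used in the abstract) and assumes that $\eta_g^S$ and $\eta_g^D$ are expected cumulative supply and demand, so that each can be treated like an ordinary return; it is an illustration, not the paper's full derivation.

```latex
\nabla_\theta \frac{\eta_g^S(\pi_\theta)}{\eta_g^D(\pi_\theta)}
  = \frac{1}{\eta_g^D(\pi_\theta)} \, \nabla_\theta \eta_g^S(\pi_\theta)
  - \frac{\eta_g^S(\pi_\theta)}{\eta_g^D(\pi_\theta)^2} \, \nabla_\theta \eta_g^D(\pi_\theta)
```

Under these assumptions, both $\nabla_\theta \eta_g^S$ and $\nabla_\theta \eta_g^D$ are gradients of expected cumulative signals, so each can be estimated with a standard policy gradient estimator, and the ratio's gradient is a weighted combination of the two.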
