Long-Term Fairness in Sequential Multi-Agent Selection with Positive Reinforcement
Bhagyashree Puranik, Ozgur Guldogan, Upamanyu Madhow, Ramtin Pedarsani
TL;DR
The paper addresses how multiple decision-makers selecting from a shared applicant pool can affect long-term fairness through positive reinforcement. It introduces the Multi-agent Fair-Greedy (MFG) policy, which blends score maximization with a long-term fairness target $\alpha$, proves convergence under identical score distributions, and analyzes robustness to variations in pool evolution, including role-model reinforcement. It shows that decentralized MFG achieves long-run fairness in simple settings but can trigger negative reinforcement under more complex dynamics, while centralized coordination via CMFG can mitigate these issues. The work provides experiments on synthetic and semi-synthetic data that illustrate these dynamics and highlights practical implications for designing fair longitudinal policies in multi-institution settings.
Abstract
While much of the rapidly growing literature on fair decision-making focuses on metrics for one-shot decisions, recent work has raised the intriguing possibility of designing sequential decision-making to positively impact long-term social fairness. In selection processes such as college admissions or hiring, biasing slightly towards applicants from under-represented groups is hypothesized to provide positive feedback that increases the pool of under-represented applicants in future selection rounds, thus enhancing fairness in the long term. In this paper, we examine this hypothesis and its consequences in a setting in which multiple agents select from a common pool of applicants. We propose the Multi-agent Fair-Greedy policy, which balances greedy score maximization and fairness. Under this policy, we prove that the resource pool and the admissions converge to a long-term fairness target set by the agents when the score distributions across the groups in the population are identical. We provide empirical evidence of the existence of equilibria under non-identical score distributions through synthetic and adapted real-world datasets. We then sound a cautionary note for more complex applicant pool evolution models, under which uncoordinated behavior by the agents can cause negative reinforcement, leading to a reduction in the fraction of under-represented applicants. Our results indicate that, while positive reinforcement is a promising mechanism for long-term fairness, policies must be designed carefully to be robust to variations in the evolution model, with a number of open issues that remain to be explored by algorithm designers, social scientists, and policymakers.
