Long-Term Fairness in Sequential Multi-Agent Selection with Positive Reinforcement

Bhagyashree Puranik; Ozgur Guldogan; Upamanyu Madhow; Ramtin Pedarsani

TL;DR

The paper studies how multiple decision-makers selecting from a shared applicant pool can affect long-term fairness through positive reinforcement. It introduces the Multi-agent Fair-Greedy (MFG) policy, which balances score maximization against a long-term fairness target $\alpha$, proves convergence under identical score distributions, and analyzes robustness to variations in pool evolution, including role-model reinforcement. It shows that decentralized MFG achieves long-run fairness under simpler pool evolution models but can trigger negative feedback under more complex dynamics, while centralized coordination via the Centralized Multi-agent Fair-Greedy (CMFG) policy can mitigate these issues. Experiments on synthetic and semi-synthetic data illustrate these dynamics, and the paper highlights practical implications for designing fair longitudinal policies in multi-institution settings.

Abstract

While much of the rapidly growing literature on fair decision-making focuses on metrics for one-shot decisions, recent work has raised the intriguing possibility of designing sequential decision-making to positively impact long-term social fairness. In selection processes such as college admissions or hiring, biasing slightly towards applicants from under-represented groups is hypothesized to provide positive feedback that increases the pool of under-represented applicants in future selection rounds, thus enhancing fairness in the long term. In this paper, we examine this hypothesis and its consequences in a setting in which multiple agents select from a common pool of applicants. We propose the Multi-agent Fair-Greedy (MFG) policy, which balances greedy score maximization and fairness. Under this policy, we prove that the resource pool and the admissions converge to a long-term fairness target set by the agents when the score distributions across the groups in the population are identical. We provide empirical evidence of the existence of equilibria under non-identical score distributions through synthetic and adapted real-world datasets. We then sound a cautionary note for more complex applicant pool evolution models, under which uncoordinated behavior by the agents can cause negative reinforcement, leading to a reduction in the fraction of under-represented applicants. Our results indicate that, while positive reinforcement is a promising mechanism for long-term fairness, policies must be designed carefully to be robust to variations in the evolution model, with a number of open issues that remain to be explored by algorithm designers, social scientists, and policymakers.
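To make the positive-reinforcement hypothesis concrete, here is a minimal single-pool sketch. It assumes a quadratic surrogate objective rather than the paper's exact reward: because the score-optimal admitted fraction equals the pool fraction $s_t$ under identical score distributions (Theorem 1 gives $a^{1}_{S,t} = s_t$), each institution $k$ here minimizes $(a - s_t)^2 + \lambda_k (a - \alpha)^2$. The pool update $s_{t+1} = (1-\mu)s_t + \mu \bar{a}_t$, the rate `mu`, and the helper names `fair_greedy_fraction` and `simulate` are hypothetical illustrations, not the paper's evolution model.

```python
from typing import List


def fair_greedy_fraction(s: float, alpha: float, lam: float) -> float:
    """Admitted fraction of under-represented applicants for one institution.

    Hypothetical surrogate: minimize (a - s)**2 + lam * (a - alpha)**2, where
    (a - s)**2 stands in for the score loss (the score-optimal fraction equals
    the pool fraction s under identical score distributions) and
    lam * (a - alpha)**2 stands in for the fairness loss.
    """
    a = (s + lam * alpha) / (1.0 + lam)  # closed-form minimizer
    return min(1.0, max(0.0, a))          # project onto [0, 1]


def simulate(s0: float, alpha: float, lams: List[float],
             mu: float = 0.3, rounds: int = 200) -> List[float]:
    """Iterate a hypothetical positive-reinforcement pool update:
    s_{t+1} = (1 - mu) * s_t + mu * (mean admitted fraction over institutions).
    """
    s = s0
    history = [s]
    for _ in range(rounds):
        actions = [fair_greedy_fraction(s, alpha, lam) for lam in lams]
        s = (1.0 - mu) * s + mu * sum(actions) / len(actions)
        history.append(s)
    return history


if __name__ == "__main__":
    # Start with 20% under-represented applicants and a 50% fairness target.
    for lams in ([0.25] * 3, [0.75] * 3, [0.9, 0.5, 0.1]):
        print(lams, round(simulate(0.2, 0.5, lams)[-1], 4))
```

Under these assumptions, $s_{t+1} - \alpha = (1 - \mu + \mu c)(s_t - \alpha)$, where $c$ is the average of $1/(1+\lambda_k)$, so the pool converges monotonically to $\alpha$ whenever some $\lambda_k > 0$. The values of $\lambda_k$ affect only the speed, which is consistent with the $\lambda$-independence reported in Figure 2.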

Paper Structure

This paper contains 17 sections, 5 theorems, 61 equations, and 9 figures.

Introduction
Related Work
Modeling Multi-Agent Decision-Making
Variations on the Pool Evolution Model
Centralized Multi-agent Fair-Greedy Policy
Experimental Evaluation
Multi-agent framework evaluated on synthetic data
Multi-agent framework evaluated on semi-synthetic dataset
Conclusion
Optimizing the score-based reward under MFG policy
Technical Lemmas
Proof details for applicant pool convergence under MFG policy
Negative feedback in role model reinforcement
Additional experimental results
Identical score distributions
...and 2 more sections

Key Result

Theorem 1

Under Assumptions asm:large and asm:identical (the latter requiring identical score distributions across groups), the score-based reward function $R_{k}(s_t, a_t^k)$ is concave, and the reward-optimal action for institution $k$ is given by […], with $a^{1}_{S,t} = s_t$ for institution 1, where $[\cdot]_{\mathcal{A}_{t}^k}$ denotes the projection onto the feasible action space $\mathcal{A}_{t}^k$.
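Theorem 1 expresses the reward-optimal action through a projection $[\cdot]_{\mathcal{A}_{t}^k}$. When the action is a single admitted fraction and the feasible set is an interval $[l, u]$, Euclidean projection reduces to clipping: $[x]_{[l,u]} = \min(u, \max(l, x))$. The sketch below assumes a hypothetical feasible set built from the applicants left after earlier institutions have selected; the paper's actual $\mathcal{A}_{t}^k$ may differ, and `feasible_interval` and `project` are illustrative names only.

```python
from typing import Tuple


def feasible_interval(n_under: int, n_major: int, capacity: int) -> Tuple[float, float]:
    """Hypothetical feasible range for the under-represented admit fraction.

    Assumes institution k fills `capacity` seats from the n_under + n_major
    applicants left after earlier institutions have selected.
    """
    if capacity <= 0 or capacity > n_under + n_major:
        raise ValueError("capacity must be positive and at most the remaining pool size")
    lo = max(0.0, 1.0 - n_major / capacity)  # forced under-rep admits if the majority group runs out
    hi = min(1.0, n_under / capacity)        # cannot admit more under-rep applicants than remain
    return lo, hi


def project(x: float, interval: Tuple[float, float]) -> float:
    """Euclidean projection of a scalar onto a closed interval, i.e. clipping."""
    lo, hi = interval
    return min(hi, max(lo, x))


if __name__ == "__main__":
    s_t = 0.3  # unconstrained target, e.g. a^1_{S,t} = s_t for institution 1
    print(project(s_t, feasible_interval(300, 700, 100)))  # full pool: constraint inactive
    print(project(s_t, feasible_interval(10, 200, 50)))    # depleted pool: clipped to n_under / capacity
```

In this toy setup the constraint only binds once earlier institutions have depleted one group, which is one way sequential selection from a shared pool can push later institutions away from their unconstrained targets.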

Figures (9)

Figure 1: (a) With pure positive reinforcement, the MFG policy reaches long-term fairness when score distributions are identical for both groups. (b) With order-based positive reinforcement and $\beta=0.8$, the MFG policy attains long-term fairness more quickly. (c) Under weighted positive reinforcement, the MFG policy converges in this setting when institution weights are equal.
Figure 2: Under identical score distributions, the MFG policy reaches the long-term fairness target independently of $\lambda$.
Figure 3: The score percentile of the lowest-scoring admitted applicant, for each group and each institution, when (i) $\lambda=0.75$ for all institutions and (ii) $\lambda$ decreases across the three institutions. The evolution of the mean parameter and admission proportions for the decreasing-$\lambda$ case is also shown.
Figure 4: Under role model reinforcement, the MFG policy creates a negative feedback loop, shown together with the evolution of the proportion of role models for each institution under MFG. The CMFG policy could potentially alleviate this negative feedback, shown together with the corresponding evolution of role-model proportions under CMFG.
Figure 5: (a) Convergence of the mean parameter under the MFG policy with pure positive reinforcement is affected by the fairness loss coefficient $\lambda$ when score distributions are distinct. (b) The MFG policy reaches an equilibrium with pure positive reinforcement when score distributions are different. (c) CMFG also reaches an equilibrium, albeit with different decisions.
...and 4 more figures
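Figure 3 reports, per group and institution, the score percentile of the lowest-scoring admitted applicant. One way to compute that quantity is sketched below. The percentile convention (share of the group's applicants scoring strictly below the cutoff), the normal score distribution, and the greedy top-100 admission rule in the demo are all assumptions, and `cutoff_percentile` is a hypothetical helper.

```python
import numpy as np


def cutoff_percentile(group_scores, admitted_mask) -> float:
    """Percentile (0-100) of the lowest admitted score within its own group.

    Assumed convention: percentage of the group's applicants whose score is
    strictly below the lowest admitted score. Returns NaN if nobody is admitted.
    """
    scores = np.asarray(group_scores, dtype=float)
    admitted = np.asarray(admitted_mask, dtype=bool)
    if not admitted.any():
        return float("nan")
    cutoff = scores[admitted].min()
    return float(100.0 * np.mean(scores < cutoff))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two groups with identical (hypothetical) N(0, 1) score distributions.
    scores = {"under": rng.normal(size=300), "major": rng.normal(size=700)}
    pooled = np.concatenate([scores["under"], scores["major"]])
    threshold = np.sort(pooled)[-100]  # greedy: admit the top 100 scores overall
    for group, s in scores.items():
        print(group, round(cutoff_percentile(s, s >= threshold), 1))
```

With a single greedy threshold and identical distributions, both groups' cutoff percentiles should land near the same value; differing percentiles across groups or institutions indicate that a policy is departing from pure score ranking.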

Theorems & Definitions (14)

Theorem 1
Remark
Remark
Lemma 1
Theorem 2
Remark
Proposition 1
Remark
Proof
Lemma 2
...and 4 more