Simultaneously Achieving Group Exposure Fairness and Within-Group Meritocracy in Stochastic Bandits

Subham Pokhriyal; Shweta Jain; Ganesh Ghalme; Swapnil Dhamal; Sujit Gujar

TL;DR

This work addresses fairness in stochastic multi-armed bandits with arms organized into groups by introducing Bi-Level Fairness, which combines Group Exposure Fairness (GEF) with Meritocratic Fairness within groups. It proposes BF-UCB, a UCB-based algorithm that guarantees anytime GEF and convergence to within-group fair policies, and provides a regret decomposition into group-exposure and within-group learning components, achieving a sublinear $O(\sqrt{T})$ regret. Theoretical guarantees are complemented by simulations showing BF-UCB yields balanced group exposure and fair within-group allocations with only modest losses in total reward compared to standard UCB. The approach enables fair resource distribution across groups while preserving learning efficiency, making it applicable to crowdsourcing, screening, and allocation tasks with grouped agents.

Abstract

Existing approaches to fairness in stochastic multi-armed bandits (MAB) primarily focus on exposure guarantees for individual arms. When arms are naturally grouped by certain attribute(s), we propose Bi-Level Fairness, which considers two levels of fairness. At the first level, Bi-Level Fairness guarantees a certain minimum exposure to each group. To address the unbalanced allocation of pulls to individual arms within a group, we consider meritocratic fairness at the second level, which ensures that each arm is pulled according to its merit within the group. Our work shows that we can adapt a UCB-based algorithm to achieve Bi-Level Fairness by (i) providing anytime Group Exposure Fairness guarantees and (ii) ensuring individual-level Meritocratic Fairness within each group. We first show that one can decompose the regret bound into two components: (a) regret due to anytime group exposure fairness and (b) regret due to meritocratic fairness within each group. Our proposed algorithm, BF-UCB, balances these two regrets optimally to achieve an upper bound of $O(\sqrt{T})$ on regret, where $T$ is the stopping time. Using simulated experiments, we further show that BF-UCB achieves sub-linear regret, provides better group and individual exposure guarantees than existing algorithms, and does not incur a significant drop in reward relative to the UCB algorithm, which imposes no fairness constraint.
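To make the two levels concrete, here is a minimal Python sketch of a bi-level selection loop. It is not the paper's BF-UCB pseudocode, which this page does not reproduce. The function name `bilevel_ucb_sketch`, the deficit rule used for the group level, the Bernoulli rewards, and the use of a capped UCB index as a stand-in for "merit" are all assumptions made for illustration.

```python
import math
import random


def bilevel_ucb_sketch(means, beta, horizon, seed=0):
    """Illustrative two-level loop (hypothetical, not the paper's BF-UCB).

    means: dict group -> list of Bernoulli means (simulation only)
    beta:  dict group -> minimum exposure fraction for that group
    Returns the sequence of pulled groups and per-arm pull counts.
    """
    rng = random.Random(seed)
    pulls = {g: [0] * len(mu) for g, mu in means.items()}
    sums = {g: [0.0] * len(mu) for g, mu in means.items()}
    group_pulls = {g: 0 for g in means}
    history = []

    def index(g, i, t):
        # Standard UCB1-style index; unexplored arms get +infinity.
        n = pulls[g][i]
        if n == 0:
            return float("inf")
        return sums[g][i] / n + math.sqrt(2 * math.log(t) / n)

    for t in range(1, horizon + 1):
        # Level 1 (assumed rule): if some group is running a positive
        # exposure deficit, serve the most deficient group first.
        deficit = {g: beta[g] * (t - 1) - group_pulls[g] for g in means}
        g = max(deficit, key=deficit.get)
        if deficit[g] <= 0:
            # No group is behind: pick the group holding the best index.
            g = max(means, key=lambda h: max(
                index(h, i, t) for i in range(len(means[h]))))

        # Level 2 (assumed merit proxy): explore unpulled arms first, then
        # sample an arm with probability proportional to its capped index.
        unexplored = [i for i, n in enumerate(pulls[g]) if n == 0]
        if unexplored:
            i = unexplored[0]
        else:
            weights = [min(index(g, j, t), 1.0) for j in range(len(means[g]))]
            i = rng.choices(range(len(weights)), weights=weights)[0]

        reward = 1.0 if rng.random() < means[g][i] else 0.0
        pulls[g][i] += 1
        sums[g][i] += reward
        group_pulls[g] += 1
        history.append(g)

    return history, pulls


if __name__ == "__main__":
    means = {"g1": [0.9, 0.5], "g2": [0.2, 0.1]}
    beta = {"g1": 0.25, "g2": 0.25}
    history, pulls = bilevel_ucb_sketch(means, beta, horizon=400, seed=1)
    print({g: sum(v) for g, v in pulls.items()})
```

The design choice worth noting is the separation of concerns: the group level decides only which group is served in a round, and the within-group level decides only how that round is shared among the group's arms. This mirrors the two regret components described in the abstract, although the specific rules here are placeholders.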

Paper Structure (28 sections, 17 theorems, 9 equations, 3 figures, 3 algorithms)

Introduction
Related Work
Model and Preliminaries
Group Exposure Fairness
Meritocratic Fairness within Groups
Bi-Level Fairness
BF-UCB: Proposed Algorithm
Theoretical Results
Bi-Level Fairness Guarantees of BF-UCB
Regret Decomposition Theorem
Regret of BF-UCB
Experiments
Baselines
UCB
Meritocratic Fair Algorithm (MF)
...and 13 more sections

Key Result

Theorem 2

Algorithm alg:group-exposurefair satisfies anytime GEF guarantees, i.e., $\lfloor \beta_g t \rfloor \le N_{g,t}$ for all $t \ge 1$ and for all groups $g \in G$. We have $\beta_g>0$ and, for any $\beta$-Bi-Level Fairness algorithm, $\beta_g \in (0,\frac{1}{m}]$ for all $g\in[m]$ and $\sum_{g\in m}\bet
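The anytime condition $\lfloor \beta_g t \rfloor \le N_{g,t}$ can be checked directly on any recorded sequence of group pulls. The sketch below is an illustrative checker, not code from the paper; the helper name `first_gef_violation` and the example sequences are hypothetical. It uses `Fraction` so that the floor is computed exactly.

```python
from fractions import Fraction
from math import floor


def first_gef_violation(group_pulls, beta):
    """Return the first (t, g) with floor(beta[g] * t) > N_{g,t}, else None.

    group_pulls: sequence of group ids, one per round (round t is 1-indexed)
    beta:        dict group -> required exposure fraction (Fraction or int)
    Every id in group_pulls is assumed to be a key of beta.
    """
    counts = {g: 0 for g in beta}
    for t, g in enumerate(group_pulls, start=1):
        counts[g] += 1
        for h, b in beta.items():
            # Anytime GEF must hold after every single round, not only at T.
            if floor(b * t) > counts[h]:
                return (t, h)
    return None


beta = {"A": Fraction(1, 4), "B": Fraction(1, 4)}
round_robin = ["A", "B"] * 4   # alternates between the two groups
greedy = ["A"] * 8             # never pulls group B

print(first_gef_violation(round_robin, beta))  # None
print(first_gef_violation(greedy, beta))       # (4, 'B')
```

The greedy sequence fails at round 4 because $\lfloor \tfrac{1}{4} \cdot 4 \rfloor = 1$ while group B has received no pulls, whereas the alternating sequence never falls behind either floor.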

Figures (3)

Figure 1: For the BF-UCB algorithm: Comparison of Reward Regret over time for different values of $\beta$
Figure 2: For the BF-UCB algorithm: Comparison of Meritocratic Fairness Regret over time across the different groups
Figure 3: Comparison of BF-UCB, GEF and MF on different performance measures for the setting involving a large number of arms

Theorems & Definitions (22)

Definition 1: patil2021achieving
Definition 2: $\beta$-Group Exposure Fairness
Definition 3: Meritocratic Fairness
Definition 4: $\beta$-Bi-Level Fairness
Definition 5
Theorem 2
Theorem 3
Theorem 4: Regret Decomposition Theorem
Lemma 4
Lemma 4
...and 12 more
