Stochastic Bandits for Egalitarian Assignment

Eugene Lim; Vincent Y. F. Tan; Harold Soh

TL;DR

A UCB-based policy, EgalUCB, is designed and analyzed for EgalMAB: upper bounds on its cumulative regret are established, together with an almost-matching policy-independent impossibility result.

Abstract

We study EgalMAB, an egalitarian assignment problem in the context of stochastic multi-armed bandits. In EgalMAB, an agent is tasked with assigning a set of users to arms. At each time step, the agent must assign exactly one arm to each user such that no two users are assigned to the same arm. Subsequently, each user obtains a reward drawn from the unknown reward distribution associated with its assigned arm. The agent's objective is to maximize the minimum expected cumulative reward among all users over a fixed horizon. This problem has applications in areas such as fairness in job and resource allocations, among others. We design and analyze a UCB-based policy EgalUCB and establish upper bounds on the cumulative regret. In complement, we establish an almost-matching policy-independent impossibility result.
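To make the objective concrete, the following minimal sketch compares two hand-written assignment rules under the egalitarian criterion. It is illustrative only: the arm means, the function name `expected_min_cumulative_reward`, and the rules `fixed` and `rotating` are assumptions introduced here and do not come from the paper. The sketch also uses the true means to compute expectations directly, which a learning agent could not do.

```python
def expected_min_cumulative_reward(means, num_users, horizon, assign):
    """Return min over users of the expected cumulative reward.

    means:  hypothetical true arm means (unknown to the agent in EgalMAB).
    assign: callable t -> list of arm indices, one per user.
    """
    totals = [0.0] * num_users
    for t in range(horizon):
        arms = assign(t)
        # Exactly one arm per user, and no arm shared between two users.
        if len(arms) != num_users or len(set(arms)) != num_users:
            raise ValueError("assignment must give each user a distinct arm")
        for user, arm in enumerate(arms):
            totals[user] += means[arm]
    return min(totals)


def fixed(t):
    # User u always keeps arm u (illustrative rule).
    return [0, 1, 2]


def rotating(t):
    # The same three arms, rotated across users over time (illustrative rule).
    return [(u + t) % 3 for u in range(3)]


means = [0.9, 0.8, 0.7, 0.2]  # assumed values, for illustration only
worst_fixed = expected_min_cumulative_reward(means, 3, 300, fixed)
worst_rotating = expected_min_cumulative_reward(means, 3, 300, rotating)
```

With these hypothetical means, the fixed rule leaves one user on the 0.7 arm at every step, whereas rotating the same three arms gives every user an average of 0.8 per step, so the minimum expected cumulative reward is higher under rotation.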

Paper Structure

This paper contains 25 sections, 22 theorems, 84 equations, 7 figures, 1 table, 2 algorithms.

Introduction
Related Works
MAB with Multiple Plays.
MAB with Multiple Users.
Fairness in MAB.
EgalMAB Problem
Environment.
Agent Policy.
Egalitarian Objective.
EgalUCB Policy
Main Results
Regret Analysis
Problem-Dependent Upper Bound
Problem-Independent Upper Bound
Policy-Independent Lower Bound
...and 10 more sections

Key Result

Theorem 1

Let $(\nu,T,U)$ be a $1$-subgaussian EgalMAB. After running EgalUCB for $T$ time steps, we have

Figures (7)

Figure 1: An illustration with $K=8$ arms and $U=7$ users. After time step $T$, each user $u\in[U]$ has some expected cumulative reward $\mathbb{E}[S_{u,T}]$. The agent's objective is to maximize $\min_{u\in[U]} \mathbb{E}[S_{u,T}]$, which is the minimum expected cumulative reward across all users.
Figure 2: A trace of EgalUCB with $K=5$ arms and $U=3$ users. When $b=2$, the three arms with the highest $\mathrm{UCB}_{a,b}$ values are $a\in\{2,4,5\}$, which are then assigned in a round-robin fashion across time steps $t\in\{4,5,6\}$. As $b$ increases, the estimates of $\mu_{a}$ for all arms $a$ improve. By $b=T/3$ for large $T$, the three arms with the highest $\mathrm{UCB}_{a,b}$ values are most likely $a\in\{1,2,3\}$, which are then assigned over time steps $t\in\{T-2,T-1,T\}$.
Figure 3: Expected regret incurred by EgalUCB over $T=150000$ time steps on simulated data with $K=10$. Each line corresponds to a different $U$. The lighter region around each line represents the range between the minimum and maximum expected regrets observed over a total of 30 independent runs.
Figure 4: Expected regret incurred by EgalUCB over $T=2^{18}$ time steps on Bernoulli bandits with $K=2^{10}$ arms.
Figure 5: Expected regret incurred by EgalUCB over $T=126000$ time steps on Bernoulli bandits with $K=20$ arms.
...and 2 more figures
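The block structure described in the Figure 2 caption (pick the $U$ arms with the highest UCB values, then rotate them across users for $U$ time steps) can be sketched as follows. This is a rough illustration under stated assumptions, not the paper's algorithm: the index form (empirical mean plus $\sqrt{2\log(t+1)/n}$), the treatment of unplayed arms as having infinite index, the Gaussian rewards, and the names `ucb_index` and `egal_ucb_sketch` are all introduced here for illustration.

```python
import math
import random


def ucb_index(reward_sum, count, t):
    # Assumed UCB1-style index; the paper's exact index may differ.
    if count == 0:
        return math.inf  # assumption: unplayed arms are tried first
    return reward_sum / count + math.sqrt(2.0 * math.log(t + 1) / count)


def egal_ucb_sketch(means, num_users, horizon, seed=0):
    """Blockwise top-U selection with round-robin rotation (illustrative)."""
    num_arms = len(means)
    if num_arms < num_users:
        raise ValueError("need at least as many arms as users")
    rng = random.Random(seed)
    counts = [0] * num_arms
    sums = [0.0] * num_arms
    user_totals = [0.0] * num_users
    schedule = []  # schedule[t][u] is the arm given to user u at step t
    t = 0
    while t < horizon:
        # Start of a block: choose the U arms with the highest index.
        chosen = sorted(
            range(num_arms),
            key=lambda a: ucb_index(sums[a], counts[a], t),
            reverse=True,
        )[:num_users]
        # Rotate the chosen arms so each user plays each of them once.
        for shift in range(num_users):
            if t >= horizon:
                break
            arms = [chosen[(u + shift) % num_users] for u in range(num_users)]
            for user, arm in enumerate(arms):
                # Unit-variance Gaussian rewards: an assumption for the demo.
                reward = rng.gauss(means[arm], 1.0)
                counts[arm] += 1
                sums[arm] += reward
                user_totals[user] += reward
            schedule.append(arms)
            t += 1
    return schedule, user_totals


schedule, totals = egal_ucb_sketch([0.9, 0.8, 0.7, 0.2, 0.1], 3, 30)
```

Because every user plays each selected arm exactly once within a full block, all users collect rewards from the same arms in that block, and any difference between their cumulative rewards comes from reward noise alone.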

Theorems & Definitions (35)

Theorem 1: Problem-Dependent Upper Bound
Theorem 2: Problem-Independent Upper Bound
Theorem 3: Policy-Independent Lower Bound
Lemma 1
Lemma 2
Lemma 3
Lemma 4
Lemma 5
Proof of Theorem 1 (Problem-Dependent Upper Bound)
Lemma 6
...and 25 more
