Stochastic Bandits for Egalitarian Assignment

Eugene Lim, Vincent Y. F. Tan, Harold Soh

TL;DR

A UCB-based policy, EgalUCB, is designed and analyzed for EgalMAB; upper bounds on its cumulative regret are established, together with an almost-matching policy-independent impossibility result.

Abstract

We study EgalMAB, an egalitarian assignment problem in the context of stochastic multi-armed bandits. In EgalMAB, an agent is tasked with assigning a set of users to arms. At each time step, the agent must assign exactly one arm to each user such that no two users are assigned to the same arm. Subsequently, each user obtains a reward drawn from the unknown reward distribution associated with its assigned arm. The agent's objective is to maximize the minimum expected cumulative reward among all users over a fixed horizon. This problem has applications in areas such as fairness in job and resource allocations, among others. We design and analyze a UCB-based policy EgalUCB and establish upper bounds on the cumulative regret. In complement, we establish an almost-matching policy-independent impossibility result.
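
To make the objective concrete, the following is a minimal sketch, not taken from the paper, that evaluates the minimum expected cumulative reward across users for a given assignment schedule. The arm means `mu`, the two schedules, and the helper name `expected_cumulative_rewards` are hypothetical choices made for illustration, and the arm means are assumed known here purely to compare schedules.

```python
def expected_cumulative_rewards(mu, schedule):
    """Return each user's expected cumulative reward.

    mu[a] is the mean reward of arm a; schedule[t][u] is the arm given to
    user u at time step t. Both are illustrative inputs, not from the paper.
    """
    num_users = len(schedule[0])
    totals = [0.0] * num_users
    for row in schedule:
        # EgalMAB constraint: no two users share an arm in the same step.
        if len(set(row)) != len(row):
            raise ValueError("two users were assigned the same arm")
        for u, arm in enumerate(row):
            totals[u] += mu[arm]
    return totals


mu = [0.9, 0.6, 0.3]  # hypothetical arm means, K = 3
U, T = 2, 4           # two users, four time steps

# Schedule 1: user 0 always gets arm 0, user 1 always gets arm 1.
fixed = [[0, 1] for _ in range(T)]
# Schedule 2: the same two arms, rotated between the users every step.
rotated = [[(u + t) % U for u in range(U)] for t in range(T)]

# The egalitarian objective is the minimum over users.
fixed_value = min(expected_cumulative_rewards(mu, fixed))
rotated_value = min(expected_cumulative_rewards(mu, rotated))
```

In this toy instance both schedules use the same two arms, but rotating them raises the worst-off user's expected total, which is the quantity the agent is asked to maximize.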

Paper Structure

This paper contains 25 sections, 22 theorems, 84 equations, 7 figures, 1 table, and 2 algorithms.

Key Result

Theorem 1

Let $(\nu,T,U)$ be a $1$-subgaussian EgalMAB. After running EgalUCB for $T$ time steps, we have

Figures (7)

  • Figure 1: An illustration with $K=8$ arms and $U=7$ users. After time step $T$, each user $u\in[U]$ has some expected cumulative reward $\mathbb{E}[S_{u,T}]$. The agent's objective is to maximize $\min_{u\in[U]} \mathbb{E}[S_{u,T}]$, which is the minimum expected cumulative reward across all users.
  • Figure 2: A trace of EgalUCB with $K=5$ arms and $U=3$ users. When $b=2$, the three arms with the highest $\mathrm{UCB}_{a,b}$ values are $a\in\{2,4,5\}$, which are then assigned in a round-robin fashion across time steps $t\in\{4,5,6\}$. As $b$ increases, the estimates of $\mu_{a}$ improve for all arms $a$. By $b=T/3$ for large $T$, the three arms with the highest $\mathrm{UCB}_{a,b}$ values are most likely $a\in\{1,2,3\}$, which are then assigned over time steps $t\in\{T-2,T-1,T\}$.
  • Figure 3: Expected regret incurred by EgalUCB over $T=150000$ time steps on simulated data with $K=10$. Each line corresponds to a different $U$. The lighter region around each line represents the range between the minimum and maximum expected regrets observed over a total of 30 independent runs.
  • Figure 4: Expected regret incurred by EgalUCB over $T=2^{18}$ time steps on Bernoulli bandits with $K=2^{10}$ arms.
  • Figure 5: Expected regret incurred by EgalUCB over $T=126000$ time steps on Bernoulli bandits with $K=20$ arms.
  • ...and 2 more figures
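
The block structure described in the Figure 2 caption can be sketched roughly as follows. This is an illustrative approximation, not the paper's algorithm listing: the confidence radius $\sqrt{2\log T / n_a}$ is a standard UCB choice assumed here, the rewards are simulated as unit-variance Gaussians, $T$ is assumed to be a multiple of $U$, and the function name `egal_ucb_sketch` and its parameters are hypothetical.

```python
import math
import random


def egal_ucb_sketch(mu, num_users, horizon, seed=0):
    """Block-wise UCB with round-robin assignment (illustrative only).

    mu: true arm means, used only to simulate rewards.
    Returns (pull counts per arm, realized cumulative reward per user).
    """
    num_arms = len(mu)
    if num_arms < num_users:
        raise ValueError("need at least as many arms as users")
    rng = random.Random(seed)
    counts = [0] * num_arms
    sums = [0.0] * num_arms
    user_totals = [0.0] * num_users

    def ucb(arm):
        # Unpulled arms get an infinite index so they are tried first.
        if counts[arm] == 0:
            return math.inf
        mean = sums[arm] / counts[arm]
        # Assumed radius; the paper's exact index may differ.
        return mean + math.sqrt(2 * math.log(horizon) / counts[arm])

    for _ in range(horizon // num_users):  # one block = num_users steps
        # Indices are computed once per block, before any update.
        top = sorted(range(num_arms), key=ucb, reverse=True)[:num_users]
        for shift in range(num_users):      # time steps inside the block
            for u in range(num_users):      # rotate the selected arms
                arm = top[(u + shift) % num_users]
                reward = rng.gauss(mu[arm], 1.0)
                counts[arm] += 1
                sums[arm] += reward
                user_totals[u] += reward
    return counts, user_totals


counts, user_totals = egal_ucb_sketch([0.9, 0.8, 0.7, 0.2, 0.1], 3, 3000)
```

Within a block every user receives each selected arm exactly once, so the users' expected rewards for that block are equal; this mirrors the round-robin assignment over time steps $t\in\{4,5,6\}$ in the trace.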

Theorems & Definitions (35)

  • Theorem 1: Problem-Dependent Upper Bound
  • Theorem 2: Problem-Independent Upper Bound
  • Theorem 3: Policy-Independent Lower Bound
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • Proof of Theorem 1
  • Lemma 6
  • ...and 25 more