Meet Me at the Arm: The Cooperative Multi-Armed Bandits Problem with Shareable Arms

Xinyi Hu, Aldo Pacchiano

Abstract

We study the decentralized multi-player multi-armed bandits (MMAB) problem under a no-sensing setting, where each player receives only their own reward and obtains no information about collisions. Each arm has an unknown capacity, and if the number of players pulling an arm exceeds its capacity, all players involved receive zero reward. This setting generalizes the classical unit-capacity model and introduces new challenges in coordination and capacity discovery under severe feedback limitations. We propose A-CAPELLA (Algorithm for Capacity-Aware Parallel Elimination for Learning and Allocation), a decentralized learning algorithm that achieves logarithmic regret in this generalized regime via protocol-driven coordination.
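
The reward model above can be made concrete by simulating a single time step. This is a minimal sketch under assumptions not stated in the abstract. Rewards within capacity are assumed to be independent Bernoulli draws with per-arm means. The names `step_rewards`, `choices`, `capacities`, and `means` are hypothetical.

```python
import random
from collections import Counter


def step_rewards(choices, capacities, means, rng):
    """Return the reward each player observes in one time step.

    choices[p]    : arm pulled by player p (hypothetical encoding)
    capacities[k] : capacity of arm k (unknown to the players in the problem)
    means[k]      : mean of arm k; Bernoulli rewards are an assumption here
    """
    load = Counter(choices)  # number of players pulling each arm
    rewards = []
    for arm in choices:
        if load[arm] > capacities[arm]:
            # Over capacity: every player on this arm receives zero.
            rewards.append(0)
        else:
            # Within capacity: independent Bernoulli draw (assumed model).
            rewards.append(1 if rng.random() < means[arm] else 0)
    return rewards


rng = random.Random(0)
capacities = [2, 1, 3]
means = [0.9, 0.5, 0.7]
# Three players all pull arm 0, whose capacity is 2, so all three get zero.
print(step_rewards([0, 0, 0], capacities, means, rng))  # [0, 0, 0]
```

Each player sees only its own entry of the returned list. A reward of 0 therefore looks the same whether it came from exceeding an arm's capacity or from an unlucky draw. This is the no-sensing difficulty the abstract describes.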

Paper Structure

This paper contains 44 sections, 11 theorems, 102 equations, 6 figures, 2 tables, and 6 algorithms.

Key Result

Proposition 3.4

The good event $\mathcal{E}_{\text{good}}$ occurs when all confidence intervals defined in the confidence-bound equation (eq:confidence_bound) hold simultaneously for every player $p \in [M]$, every arm $\nu \in [K]$, and every time step $t \in \mathbb{N}$.

Figures (6)

  • Figure 1: Illustration of a 2-Grouped Round Robin Scheduling Strategy for 3 Players and 5 Arms.
  • Figure 2: Illustration of the Coordination Protocol in Phases 1 and 2.
  • Figure 3: Comparison of A-CAPELLA, Selfish UCB family, and EXP3. Results are averaged over 100 independent runs; error bars indicate one standard deviation across seeds.
  • Figure 4: Illustration of a simple Round Robin scheduling strategy with 3 players and 5 arms. The x-axis denotes time steps and the y-axis indicates arm indices. Each player cycles through the arms in order: if a player pulls arm $i$ at time $t$, they pull arm $i+1$ at time $t+1$. A round completes when every player has pulled all arms once.
  • Figure 5: Regret comparison across algorithms. Error bars represent the standard deviation over 100 random seeds.
  • ...and 1 more figure
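
The simple Round Robin schedule in Figure 4's caption is concrete enough to sketch. The Python below is a minimal illustration, not the paper's pseudocode. It assumes arms are indexed 0 to K-1 and that "arm i+1" wraps back to arm 0 after the last arm. It also assumes player p starts on arm p, a hypothetical staggered offset, since the caption does not say where each player starts. The function name `simple_round_robin` is also hypothetical.

```python
def simple_round_robin(num_players, num_arms, num_steps):
    """Arm pulled by each player at each time step.

    Assumes player p starts on arm p mod K (hypothetical offset) and
    advances by one arm per step, wrapping from the last arm back to 0.
    schedule[t][p] is the arm pulled by player p at time t.
    """
    return [
        [(p + t) % num_arms for p in range(num_players)]
        for t in range(num_steps)
    ]


# 3 players and 5 arms, as in Figure 4: one full round takes 5 steps.
for t, row in enumerate(simple_round_robin(3, 5, 5)):
    print(t, row)
```

Under the staggered-start assumption with at most K players, the players occupy distinct arms at every step. After K steps each player has pulled every arm exactly once, which matches the caption's definition of a completed round.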

Theorems & Definitions (23)

  • Definition 3.2: Simple Round Robin Scheduling
  • Definition 3.3: $\psi$-Grouped Round Robin Scheduling
  • Proposition 3.4: The Good Event $\mathcal{E}_{\text{good}}$
  • Theorem 4.1: Simplified
  • Definition 4.1: Duration Function $\omega(a, t)$
  • Lemma 4.1
  • Theorem 5.1: Regret Bound for A-CAPELLA
  • Lemma A.1
  • Proof
  • Lemma A.2
  • ...and 13 more