
Collaborative Multi-Agent Heterogeneous Multi-Armed Bandits

Ronshee Chawla, Daniel Vial, Sanjay Shakkottai, R. Srikant

TL;DR

This work extends decentralized multi-armed bandit theory to a setting where $N$ agents learn $M$ heterogeneous bandits in a fully distributed fashion. It introduces two collaboration regimes, context unaware and partially context aware, and develops algorithms based on GosInE (Gossiping Insert-Eliminate) that propagate best-arm estimates through gossip-based communication. The authors derive per-agent and group regret upper bounds and prove lower bounds on the group regret showing that these algorithms are near-optimal. They also show that sharing best-arm information reduces regret, especially when each agent knows the $r-1$ other agents learning the same bandit. The results give insight into how communication structure and local cooperation affect collective learning efficiency in decentralized systems.
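
The gossip-based propagation of best arms can be illustrated with a minimal Python sketch of one insert-eliminate update, loosely modeled on the GosInE idea rather than taken from the paper. It assumes that each agent plays an active set made of a fixed sticky set plus a few non-sticky arms, recommends its most-played arm at the end of a phase, and, on receiving a recommendation it does not already play, evicts its least-played non-sticky arm. The function names `recommend` and `insert_eliminate` are hypothetical.

```python
from collections import Counter


def recommend(play_counts: Counter) -> int:
    """Return the arm played most often in the last phase (ties: first seen)."""
    return max(play_counts, key=play_counts.get)


def insert_eliminate(active_set, sticky_set, recommended_arm, play_counts):
    """One hypothetical insert-eliminate step for a single agent.

    active_set:      arms the agent currently plays (sticky + non-sticky)
    sticky_set:      arms that are never evicted
    recommended_arm: best-arm estimate received from a gossip partner
    play_counts:     Counter of plays per arm during the last phase
    """
    new_set = set(active_set)
    if recommended_arm in new_set:
        return new_set  # nothing new to insert
    non_sticky = [a for a in new_set if a not in sticky_set]
    if non_sticky:
        # Evict the non-sticky arm that was played least in the last phase.
        worst = min(non_sticky, key=lambda a: play_counts[a])
        new_set.remove(worst)
    new_set.add(recommended_arm)
    return new_set


# Usage: arm 5 is the least-played non-sticky arm, so it makes room for arm 3.
counts = Counter({0: 40, 1: 30, 5: 2, 7: 9})
print(recommend(counts))                                          # 0
print(sorted(insert_eliminate({0, 1, 5, 7}, {0, 1}, 3, counts)))  # [0, 1, 3, 7]
```

Keeping the sticky arms fixed means a bad recommendation can only displace a non-sticky arm, so an agent never loses the arms it committed to at the start.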

Abstract

The study of collaborative multi-agent bandits has attracted significant attention recently. In light of this, we initiate the study of a new collaborative setting, consisting of $N$ agents such that each agent is learning one of $M$ stochastic multi-armed bandits to minimize their group cumulative regret. We develop decentralized algorithms which facilitate collaboration between the agents under two scenarios. We characterize the performance of these algorithms by deriving the per-agent cumulative regret and group regret upper bounds. We also prove lower bounds for the group regret in this setting, which demonstrate the near-optimal behavior of the proposed algorithms.
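
Group regret, as used in the abstract, can be computed as the sum of the agents' individual cumulative regrets. The sketch below assumes the standard pseudo-regret definition (the gap between the best arm mean of an agent's own bandit and the mean of each arm it pulls); the paper's formal definition may differ in details, and all names here are hypothetical.

```python
from typing import List, Sequence


def per_agent_regret(arm_means: Sequence[float], pulls: Sequence[int]) -> float:
    """Cumulative pseudo-regret: sum over rounds of (best mean - mean of pulled arm)."""
    best = max(arm_means)
    return sum(best - arm_means[a] for a in pulls)


def group_regret(bandits: List[List[float]], assignment: List[int],
                 pulls_per_agent: List[List[int]]) -> float:
    """Sum of per-agent regrets.

    bandits[m]         -- arm means of bandit m
    assignment[i]      -- index of the bandit agent i learns
    pulls_per_agent[i] -- arms agent i pulled, one per round
    """
    return sum(
        per_agent_regret(bandits[assignment[i]], pulls)
        for i, pulls in enumerate(pulls_per_agent)
    )


bandits = [[0.9, 0.5], [0.2, 0.7]]
assignment = [0, 1]  # agent 0 learns bandit 0, agent 1 learns bandit 1
pulls = [[0, 1, 0], [0, 1]]
print(round(group_regret(bandits, assignment, pulls), 6))  # 0.9
```

Each agent is charged against the best arm of its own bandit, which is what makes the setting heterogeneous: the same arm index can be optimal for one group and suboptimal for another.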

Paper Structure (28 sections, 24 theorems, 79 equations, 2 figures, 4 algorithms)

Key Result

Theorem 1

Consider a system of $N \geq 2$ agents connected by a complete graph (for each $i \in [N]$, $G(i,n) = (N-1)^{-1}$ for all $n \neq i$) and learning one of the $M \geq 2$ bandits with $K \geq 2$ arms, satisfying the sticky-set assumption. Let the UCB parameter $\alpha > 10$ and the phase parameter … where $\tau^{*} = 2\max\{2, \max_{m \in [M]}\tau_{m}^{*}\}$ and $\tau_{m}^{*} = \inf\{j \in \ldots\}$ …
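
Theorem 1 fixes the communication graph to be complete, so each agent contacts any other agent with probability $(N-1)^{-1}$, and it uses a UCB parameter $\alpha$. The sketch below builds that gossip matrix, samples a gossip partner from it, and computes a UCB index of the assumed form $\hat{\mu} + \sqrt{\alpha \ln t / n}$. The exact index used in the paper is not given in this summary, so treat the index as an illustration; the function names are hypothetical.

```python
import math
import random


def complete_graph_gossip(N: int):
    """Gossip matrix of Theorem 1: G[i][n] = 1/(N-1) for n != i, 0 on the diagonal."""
    return [[0.0 if n == i else 1.0 / (N - 1) for n in range(N)] for i in range(N)]


def sample_partner(G, i, rng=random):
    """Draw agent i's gossip partner n with probability G[i][n]."""
    return rng.choices(range(len(G)), weights=G[i], k=1)[0]


def ucb_index(mean_est: float, n_pulls: int, t: int, alpha: float) -> float:
    """Assumed UCB index: empirical mean plus sqrt(alpha * ln t / n_pulls)."""
    if n_pulls == 0:
        return math.inf  # unplayed arms are tried first
    return mean_est + math.sqrt(alpha * math.log(t) / n_pulls)


G = complete_graph_gossip(4)
rng = random.Random(0)
partner = sample_partner(G, 1, rng)  # some agent in {0, 2, 3}
print(ucb_index(0.5, 0, 10, 15))     # inf
```

A larger $\alpha$ widens the confidence bonus and so forces more exploration; the zero diagonal of $G$ guarantees an agent never gossips with itself.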

Figures (2)

  • Figure 1: Two settings, with $(K, M, N, r) = (20, 5, 25, 5)$ and $(30, 6, 36, 6)$. Arm means are in $[0, 1)$ and the UCB parameter is $\alpha=15$.
  • Figure 2: Two settings, with $(K, M, N, r) = (20, 5, 25, 5)$ and $(30, 6, 36, 6)$. Arm means are in $[2, 4)$ and the UCB parameter is $\alpha=30$.
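
The experimental settings in Figures 1 and 2 satisfy $N = M r$ ($25 = 5 \cdot 5$ and $36 = 6 \cdot 6$), which is consistent with $M$ groups of $r$ agents, each group learning one bandit. A hypothetical instance generator under that reading is sketched below; drawing the arm means uniformly from the stated interval is an assumption, since the captions only give the range.

```python
import random


def make_instance(K: int, M: int, N: int, r: int, low: float, high: float, seed: int = 0):
    """Hypothetical instance generator matching the figure parameters.

    Assumes N == M * r (M groups of r agents) and arm means drawn
    i.i.d. uniformly from [low, high).
    """
    if N != M * r:
        raise ValueError("expected N == M * r")
    rng = random.Random(seed)
    # rng.random() lies in [0, 1), so each mean lies in [low, high).
    bandits = [[low + (high - low) * rng.random() for _ in range(K)] for _ in range(M)]
    assignment = [i // r for i in range(N)]  # agents 0..r-1 learn bandit 0, and so on
    return bandits, assignment


bandits, assignment = make_instance(20, 5, 25, 5, 0.0, 1.0)
print(len(bandits), len(bandits[0]), len(assignment))  # 5 20 25
```

Under this layout, each agent's $r-1$ peers in the partially context aware setting are exactly the other agents with the same `assignment` value.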

Theorems & Definitions (38)

  • Theorem 1
  • Corollary 2
  • Corollary 3
  • Theorem 4
  • Corollary 5
  • Corollary 6
  • Theorem 7
  • Theorem 8
  • Proposition 1
  • Proof
  • ...and 28 more