QuACK: A Multipurpose Queuing Algorithm for Cooperative $k$-Armed Bandits

Benjamin Howson, Sarah Filippi, Ciara Pike-Burke

TL;DR

This work provides a black-box reduction that extends any single-agent bandit algorithm to the multi-agent setting, and proves that the reduction carries the single-agent algorithm's regret guarantees over to that setting.

Abstract

We study the cooperative stochastic $k$-armed bandit problem, where a network of $m$ agents collaborates to find the optimal action. In contrast to most prior work on this problem, which focuses on extending a specific algorithm to the multi-agent setting, we provide a black-box reduction that allows us to extend any single-agent bandit algorithm to the multi-agent setting. Under mild assumptions on the bandit environment, we prove that our reduction transfers the regret guarantees of the single-agent algorithm to the multi-agent setting. These guarantees are tight in subgaussian environments, in that using a near-minimax-optimal single-player algorithm is near-minimax-optimal in the multi-player setting, up to an additive graph-dependent quantity. Our reduction and theoretical results are also general and apply to many different bandit settings. By plugging in appropriate single-player algorithms, we can easily develop provably efficient algorithms for many multi-player settings, such as heavy-tailed bandits, duelling bandits and bandits with local differential privacy, among others. Experimentally, our approach is competitive with or outperforms specialised multi-agent algorithms.
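The abstract describes the reduction only through its interface: the single-agent algorithm is used as a black box. As a rough illustration of how a queue can let such a black box absorb rewards observed by other agents while still only ever receiving rewards for arms it asked for, here is a hypothetical sketch. The names `UCB1`, `QueuedAgent`, `select_arm`, `update`, `receive` and `step`, and the rule "drain a queued reward for the requested arm before pulling", are all assumptions made for illustration; this summary does not state QuACK's actual queuing rule, so the code should not be read as the paper's algorithm.

```python
import math
import random
from collections import deque


class UCB1:
    """Single-agent UCB1; the wrapper below uses it only via select_arm()/update()."""

    def __init__(self, k):
        self.counts = [0] * k
        self.sums = [0.0] * k
        self.t = 0  # total number of rewards consumed

    def select_arm(self):
        # Try every arm once before using confidence bounds.
        for arm, c in enumerate(self.counts):
            if c == 0:
                return arm
        return max(
            range(len(self.counts)),
            key=lambda a: self.sums[a] / self.counts[a]
            + math.sqrt(2 * math.log(self.t) / self.counts[a]),
        )

    def update(self, arm, reward):
        self.t += 1
        self.counts[arm] += 1
        self.sums[arm] += reward


class QueuedAgent:
    """HYPOTHETICAL queue-based wrapper around any select_arm()/update() learner.
    Not claimed to be QuACK's exact rule."""

    def __init__(self, base, k):
        self.base = base
        self.queues = [deque() for _ in range(k)]  # per-arm rewards shared by neighbours

    def receive(self, arm, reward):
        self.queues[arm].append(reward)

    def step(self, pull):
        # While the learner requests an arm with a queued reward, feed it that reward.
        arm = self.base.select_arm()
        while self.queues[arm]:
            self.base.update(arm, self.queues[arm].popleft())
            arm = self.base.select_arm()
        # Otherwise actually pull the requested arm in the environment.
        reward = pull(arm)
        self.base.update(arm, reward)
        return arm, reward


# Usage: two agents on a complete graph, each broadcasting every observation.
random.seed(0)
means = [0.2, 0.5, 0.8]  # illustrative Bernoulli arm means


def pull(arm):
    return 1.0 if random.random() < means[arm] else 0.0


agents = [QueuedAgent(UCB1(3), 3) for _ in range(2)]
for _ in range(500):
    for i, agent in enumerate(agents):
        arm, reward = agent.step(pull)
        for j, other in enumerate(agents):
            if j != i:
                other.receive(arm, reward)
```

The point of the queue in this sketch is that the wrapped learner is never handed a reward for an arm it did not select, so it can stay unmodified; shared observations simply let it skip some real pulls.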

Paper Structure

This paper contains 36 sections, 7 theorems, 65 equations, 7 figures, and 4 algorithms.

Key Result

Lemma 1

Under the paper's assumption on the bandit environment, QuACK guarantees that, for all $n$:

Figures (7)

  • Figure 1: Grid Graph and its Shortest Path Tree.
  • Figure 2: Group Regret for a Network of $196$ Agents.
  • Figure 3: Group Regret for Cycle Graphs.
  • Figure 4: Group Regret for Grid Graphs.
  • Figure 5: Group Regret for Star Graphs.
  • ...and 2 more figures
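Figures 2 to 5 report group regret. A minimal sketch of how such a curve can be computed, assuming the common cooperative-bandit definition (pseudo-regret summed over all $m$ agents, $R(n) = \sum_{v=1}^{m}\sum_{t=1}^{n} (\mu^\star - \mu_{A_{v,t}})$) and that every agent plays the same number of rounds. The paper's exact definition is not reproduced in this summary, and the function names are hypothetical.

```python
import math
from itertools import accumulate


def group_regret(means, actions):
    """Total pseudo-regret; actions[v][t] is the arm agent v played in round t."""
    best = max(means)
    return math.fsum(best - means[a] for agent_actions in actions for a in agent_actions)


def group_regret_curve(means, actions):
    """Cumulative group regret after each round (assumes equal horizons for all agents)."""
    best = max(means)
    n = len(actions[0])
    per_round = [math.fsum(best - means[agent[t]] for agent in actions) for t in range(n)]
    return list(accumulate(per_round))


# Two agents, three arms, three rounds.
curve = group_regret_curve([0.2, 0.5, 0.8], [[0, 2, 2], [1, 1, 2]])
```

Summing across agents before accumulating gives one curve per network, which is the quantity plotted against the number of rounds for the different graph families.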

Theorems & Definitions (11)

  • Lemma 1
  • Proof
  • Lemma 2
  • Proof
  • Theorem 1
  • Proof
  • Corollary 1
  • Proof
  • Corollary 2
  • Corollary 3
  • ...and 1 more