Near-Optimal Regret for Distributed Adversarial Bandits: A Black-Box Approach

Hao Qiu, Mengxiao Zhang, Nicolò Cesa-Bianchi

TL;DR

This work addresses distributed adversarial bandits with gossip communication, establishing near-optimal minimax regret bounds that separately capture communication and bandit information costs. The authors introduce a block-based learning scheme that decouples learning from communication via a black-box reduction to delayed feedback, so that delayed-feedback guarantees transfer to the distributed setting. They prove a minimax regret of $\tilde{\Theta}(\sqrt{(\rho^{-1/2}+K/N)T})$ for $K$-armed bandits and extend the approach to distributed linear bandits with regret $\tilde{O}(\sqrt{(\rho^{-1/2}+1/N)dT})$, all achievable with gossip-only communication. The framework is complemented by adaptive bounds (first-order, also called small-loss, and best-of-both-worlds) and a matching lower bound that decomposes the difficulty into a communication term $\rho^{-1/4}\sqrt{T}$ and a bandit term $\sqrt{KT/N}$. The linear-bandit extension uses a volumetric spanner to keep communication at $O(d)$ per agent and per round, with a corresponding lower bound, highlighting the near-optimal trade-offs between network topology, exploration, and information exchange in distributed adversarial learning.

Abstract

We study distributed adversarial bandits, where $N$ agents cooperate to minimize the global average loss while observing only their own local losses. We show that the minimax regret for this problem is $\tilde{\Theta}(\sqrt{(\rho^{-1/2}+K/N)T})$, where $T$ is the horizon, $K$ is the number of actions, and $\rho$ is the spectral gap of the communication matrix. Our algorithm, based on a novel black-box reduction to bandits with delayed feedback, requires agents to communicate only through gossip. It achieves an upper bound that significantly improves over the previous best bound $\tilde{O}(\rho^{-1/3}(KT)^{2/3})$ of Yi and Vojnovic (2023). We complement this result with a matching lower bound, showing that the problem's difficulty decomposes into a communication cost $\rho^{-1/4}\sqrt{T}$ and a bandit cost $\sqrt{KT/N}$. We further demonstrate the versatility of our approach by deriving first-order and best-of-both-worlds bounds in the distributed adversarial setting. Finally, we extend our framework to distributed linear bandits in $\mathbb{R}^d$, obtaining a regret bound of $\tilde{O}(\sqrt{(\rho^{-1/2}+1/N)dT})$, achieved with only $O(d)$ communication cost per agent and per round via a volumetric spanner.
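To get a feel for the rates quoted above, the following minimal sketch evaluates them numerically. It is only an illustration: constants and logarithmic factors hidden by the tilde notation are dropped, and the function names and the example values of $\rho$, $K$, $N$, $T$ are assumptions made here, not taken from the paper. It also checks that the square of the new rate equals the sum of the squared communication and bandit costs, which is the sense in which the bound decomposes.

```python
import math

# Rates from the abstract, with constants and log factors dropped.
# All function names below are hypothetical, chosen for this sketch.

def new_rate(rho, K, N, T):
    """sqrt((rho^{-1/2} + K/N) * T), the minimax rate stated in the abstract."""
    return math.sqrt((rho ** -0.5 + K / N) * T)

def previous_rate(rho, K, T):
    """rho^{-1/3} * (K*T)^{2/3}, the earlier bound of Yi and Vojnovic (2023)."""
    return rho ** (-1.0 / 3.0) * (K * T) ** (2.0 / 3.0)

def communication_cost(rho, T):
    """rho^{-1/4} * sqrt(T), the communication term of the lower bound."""
    return rho ** -0.25 * math.sqrt(T)

def bandit_cost(K, N, T):
    """sqrt(K*T/N), the bandit term of the lower bound."""
    return math.sqrt(K * T / N)

if __name__ == "__main__":
    # Example values are arbitrary assumptions for illustration.
    rho, K, N, T = 0.01, 10, 20, 10**6
    print("new rate:          ", new_rate(rho, K, N, T))
    print("previous rate:     ", previous_rate(rho, K, T))
    print("communication cost:", communication_cost(rho, T))
    print("bandit cost:       ", bandit_cost(K, N, T))
```

Because $\sqrt{a+b}$ lies between $\max(\sqrt{a},\sqrt{b})$ and $\sqrt{a}+\sqrt{b}$, the new rate is within a constant factor of the sum of the two costs, so whichever term is larger dominates the regret.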

Paper Structure

This paper contains 53 sections, 25 theorems, 197 equations, 5 algorithms.

Key Result

Lemma 0

If all agents $i \in V$ run the black-box algorithm (alg: black-box) with gossip matrix $W$ and parameters $\kappa, B$ defined in eqn: block, then […], where $\overline{\boldsymbol{z}}_{\tau} \triangleq \frac{1}{N} \sum_{i=1}^N \sum_{t \in \mathcal{T}_{\tau-1}} \widehat{\boldsymbol{\ell}}_t(i)$ is defined in eqn: barz for all $\tau \in [T/B]$.
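The lemma concerns agents that run gossip with matrix $W$ and relates their local quantities to the network average $\overline{\boldsymbol{z}}_{\tau}$. The sketch below illustrates only the generic mechanism of gossip averaging, not the paper's algorithm or its bound: it assumes a symmetric, doubly stochastic $W$ on a ring with lazy self-weights (a hypothetical choice made here), and shows that repeated local mixing $x \leftarrow Wx$ preserves the average and drives every agent's value toward it. All names are illustrative.

```python
# Plain gossip averaging on a ring; pure Python, no external dependencies.
# The ring topology and the weights are assumptions made for this sketch.

def ring_gossip_matrix(n, self_weight=0.5):
    """Symmetric doubly stochastic matrix: each agent mixes with its two ring neighbours."""
    side = (1.0 - self_weight) / 2.0
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][i] += self_weight
        W[i][(i - 1) % n] += side
        W[i][(i + 1) % n] += side
    return W

def gossip_step(W, x):
    """One communication round: agent i replaces x[i] by a weighted average of its neighbours."""
    n = len(x)
    return [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]

def gossip(W, x, rounds):
    """Run the given number of gossip rounds and return the agents' final values."""
    for _ in range(rounds):
        x = gossip_step(W, x)
    return x

def max_deviation(x):
    """Largest distance between any agent's value and the network average."""
    avg = sum(x) / len(x)
    return max(abs(v - avg) for v in x)

if __name__ == "__main__":
    W = ring_gossip_matrix(8)
    x0 = [float(i) for i in range(8)]  # each agent starts with a different local value
    for rounds in (0, 10, 50, 200):
        print(rounds, max_deviation(gossip(W, x0, rounds)))
```

The deviation shrinks geometrically at a rate governed by the second-largest eigenvalue modulus of $W$, which is why poorly connected networks (small spectral gap) need more rounds per block before local estimates are close to the average.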

Theorems & Definitions (40)

  • Lemma 0
  • Theorem 1
  • Lemma 1
  • Lemma 1
  • Theorem 2
  • Theorem 3
  • Definition 5.1: Volumetric Spanner (Hazan and Karnin, 2016)
  • Proposition 5.2 (Bhaskara et al., 2023)
  • Theorem 4
  • Theorem 5
  • ...and 30 more