Near-Optimal Regret for Distributed Adversarial Bandits: A Black-Box Approach

Hao Qiu; Mengxiao Zhang; Nicolò Cesa-Bianchi

TL;DR

This work addresses distributed adversarial bandits with gossip communication, establishing near-optimal minimax regret bounds that separately capture communication and bandit information costs. The authors introduce a block-based learning scheme that decouples learning from communication via a black-box reduction to delayed feedback, so that delayed-feedback guarantees transfer to the distributed setting. They prove a minimax regret of $\tilde{\Theta}(\sqrt{(\rho^{-1/2}+K/N)T})$ for $K$-armed bandits and extend the approach to distributed linear bandits with regret $\tilde{\Theta}(\sqrt{(\rho^{-1/2}+1/N)dT})$, all achievable with gossip-only communication. The framework is complemented by adaptive bounds (small-loss and best-of-both-worlds) and a matching lower bound that decomposes the difficulty into a communication term $\rho^{-1/4}\sqrt{T}$ and a bandit term $\sqrt{KT/N}$. A further extension to high-dimensional linear bandits uses a volumetric spanner to keep communication at $O(d)$ per round, with a corresponding lower bound, highlighting the near-optimal trade-offs between network topology, exploration, and information exchange in distributed adversarial learning.

Abstract

We study distributed adversarial bandits, where $N$ agents cooperate to minimize the global average loss while observing only their own local losses. We show that the minimax regret for this problem is $\tilde{\Theta}(\sqrt{(\rho^{-1/2}+K/N)T})$, where $T$ is the horizon, $K$ is the number of actions, and $\rho$ is the spectral gap of the communication matrix. Our algorithm, based on a novel black-box reduction to bandits with delayed feedback, requires agents to communicate only through gossip. It achieves an upper bound that significantly improves over the previous best bound $\tilde{O}(\rho^{-1/3}(KT)^{2/3})$ of Yi and Vojnovic (2023). We complement this result with a matching lower bound, showing that the problem's difficulty decomposes into a communication cost $\rho^{-1/4}\sqrt{T}$ and a bandit cost $\sqrt{KT/N}$. We further demonstrate the versatility of our approach by deriving first-order and best-of-both-worlds bounds in the distributed adversarial setting. Finally, we extend our framework to distributed linear bandits in $\mathbb{R}^d$, obtaining a regret bound of $\tilde{O}(\sqrt{(\rho^{-1/2}+1/N)dT})$, achieved with only $O(d)$ communication cost per agent and per round via a volumetric spanner.
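To make the rates in the abstract concrete, the following is a minimal numerical sketch that evaluates them with all constants and logarithmic factors dropped. The function names and the example parameter values are illustrative assumptions and do not come from the paper. It also checks the decomposition stated above: the squared communication term $\rho^{-1/4}\sqrt{T}$ and the squared bandit term $\sqrt{KT/N}$ add up to the squared minimax rate, so the rate lies within a factor $\sqrt{2}$ of their sum.

```python
import math


def new_rate(T, K, N, rho):
    """sqrt((rho^{-1/2} + K/N) * T), ignoring constants and log factors."""
    return math.sqrt((rho ** -0.5 + K / N) * T)


def previous_rate(T, K, rho):
    """rho^{-1/3} * (K*T)^{2/3}, the earlier bound quoted in the abstract."""
    return rho ** (-1.0 / 3.0) * (K * T) ** (2.0 / 3.0)


def communication_term(T, rho):
    """rho^{-1/4} * sqrt(T): the cost attributed to the network."""
    return rho ** -0.25 * math.sqrt(T)


def bandit_term(T, K, N):
    """sqrt(K*T/N): the cost attributed to bandit feedback."""
    return math.sqrt(K * T / N)


if __name__ == "__main__":
    # Hypothetical problem sizes, chosen only for illustration.
    T, K, N, rho = 10**6, 10, 20, 0.01
    print("new rate:          ", new_rate(T, K, N, rho))
    print("previous rate:     ", previous_rate(T, K, rho))
    print("communication term:", communication_term(T, rho))
    print("bandit term:       ", bandit_term(T, K, N))
```

Since these are rates without constants, the comparison only indicates the scaling in $T$, $K$, $N$, and $\rho$, not actual regret values.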

Paper Structure (53 sections, 25 theorems, 197 equations, 5 algorithms)

Introduction
Contributions and technical challenges.
Related works
Distributed online convex optimization.
Distributed $K$-armed bandits.
Preliminaries
Distributed $K$-armed bandits.
Distributed linear bandits.
Gossip protocol.
Distributed K-armed Bandits: A Black-Box Reduction to Delayed Feedback
Proof Sketch of thm:mainmab.
Adaptive Bounds for Distributed K-armed Bandits
Small-Loss Bound
Best of Both Worlds
Distributed Adversarial Linear Bandits
...and 38 more sections

Key Result

Lemma 0

If all agents $i \in V$ run alg:black-box with gossip matrix $W$ and parameters $\kappa, B$ defined in eqn:block, then […] where $\overline{\boldsymbol{z}}_{\tau} \triangleq \frac{1}{N} \sum_{i=1}^N \sum_{t \in \mathcal{T}_{\tau-1}} \widehat{\boldsymbol{\ell}}_t(i)$ is defined in eqn:barz for all $\tau \in [T/B]$.
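The quantity $\overline{\boldsymbol{z}}_{\tau}$ is a network-wide average of per-agent vectors, which agents can only approximate by exchanging messages with neighbors. The sketch below illustrates plain gossip averaging on a ring, under assumptions that are not taken from the paper: the gossip matrix puts weight 1/3 on an agent and on each of its two neighbors, and the helper names `ring_gossip_matrix`, `gossip_round`, and `max_deviation` are hypothetical. The paper's algorithm may use a different (for example accelerated) gossip scheme; this only shows the basic mechanism by which repeated multiplication by a doubly stochastic $W$ drives every agent's vector toward the average.

```python
def ring_gossip_matrix(n):
    """Symmetric, doubly stochastic matrix for a ring of n agents (assumed topology)."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] += 1.0 / 3.0
    return W


def gossip_round(W, X):
    """One gossip step: agent i replaces its vector by a W-weighted mix of all vectors.

    With the ring matrix above, only the agent itself and its two neighbors
    carry nonzero weight, so this models local communication.
    """
    n, k = len(X), len(X[0])
    return [
        [sum(W[i][j] * X[j][c] for j in range(n)) for c in range(k)]
        for i in range(n)
    ]


def max_deviation(X):
    """Largest coordinate-wise gap between any agent's vector and the exact average."""
    n, k = len(X), len(X[0])
    avg = [sum(X[i][c] for i in range(n)) / n for c in range(k)]
    return max(abs(X[i][c] - avg[c]) for i in range(n) for c in range(k))


if __name__ == "__main__":
    n_agents = 5
    W = ring_gossip_matrix(n_agents)
    # Each agent starts with its own local 2-dimensional vector (toy data).
    X = [[float(i), float(2 * i)] for i in range(n_agents)]
    for r in range(1, 31):
        X = gossip_round(W, X)
        if r % 10 == 0:
            print(r, max_deviation(X))
```

Because $W$ is doubly stochastic, each round preserves the network average while shrinking the disagreement between agents; how fast it shrinks depends on the spectral properties of $W$, which is where the dependence on $\rho$ in the regret bounds originates.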

Theorems & Definitions (40)

Lemma 0
Theorem 1
Lemma 1
Lemma 1
Theorem 2
Theorem 3
Definition 5.1: Volumetric Spanner (Hazan and Karnin, 2016)
Proposition 5.2 (Bhaskara et al., 2023)
Theorem 4
Theorem 5
...and 30 more
