A Near-optimal, Scalable and Parallelizable Framework for Stochastic Bandits Robust to Adversarial Corruptions and Beyond

Zicheng Hu, Cheng Chen

TL;DR

This work tackles stochastic multi-armed bandits under adversarial corruptions, introducing BARBAT (Bad Arms get Recourse, Best Arm gets Trust) to achieve near-optimal regret with static epoch lengths and epoch-varying failure probabilities, decoupling corruption from the time horizon. BARBAT extends to scalable settings including cooperative multi-agent bandits (MA-BARBAT), batched bandits (BB-BARBAT), strongly observable graph bandits (SOG-BARBAT), and $d$-set semi-bandits (DS-BARBAT), all with tighter regret bounds and improved parallelizability compared to Follow-the-Regularized-Leader approaches. Theoretical results show BARBAT attains regret of the form $R(T)=O\left(C + \sum_{\Delta_k>0}\frac{\log(T)\log(KT)}{\Delta_k} + \frac{K\log(1/\Delta)\log(K/\Delta)}{\Delta}\right)$, plus an $O\left(\frac{K\log^2(1/\Delta)}{\Delta}\right)$ term independent of $T$, and exhibits favorable computational efficiency. Empirically, BARBAT variants demonstrate strong robustness and efficiency across multi-agent, graph-structured, batched, and semi-bandit scenarios, highlighting its practical impact for scalable robust bandit learning.
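The sketch below evaluates each term inside the $O(\cdot)$ of the TL;DR's regret bound for a given corruption level $C$, horizon $T$ and vector of gaps $\Delta_k$. It is meant to make the shape of the bound concrete. Several choices are assumptions and do not come from the paper: all constants are dropped, logarithms are natural, and $\Delta$ is read as the smallest positive gap. The function name `barbat_bound_terms` is hypothetical.

```python
import math


def barbat_bound_terms(C, T, gaps):
    """Evaluate the terms inside the O(.) of the TL;DR's BARBAT regret bound.

    Assumptions (not from the paper): all constants are dropped, logs are
    natural, and Delta is the smallest positive gap. `gaps` lists Delta_k for
    every arm, with 0.0 for the optimal arm(s).
    """
    K = len(gaps)
    positive = [g for g in gaps if g > 0]
    delta = min(positive)  # assumed meaning of Delta: the minimum positive gap
    return {
        # Corruption enters additively, with no factor of K.
        "corruption": C,
        # Sum over suboptimal arms of log(T) log(KT) / Delta_k.
        "gap_sum": sum(math.log(T) * math.log(K * T) / g for g in positive),
        # K log(1/Delta) log(K/Delta) / Delta: no dependence on T.
        "t_free": K * math.log(1 / delta) * math.log(K / delta) / delta,
        # The additional O(K log^2(1/Delta) / Delta) term, also T-independent.
        "extra": K * math.log(1 / delta) ** 2 / delta,
    }


terms = barbat_bound_terms(C=100, T=10_000, gaps=[0.0, 0.1, 0.2, 0.2])
for name, value in terms.items():
    print(f"{name}: {value:.1f}")
```

Increasing `T` in this sketch changes only `gap_sum`. For comparison, the abstract states that BARBAR's corruption term scales as $O(KC)$, which would be $4 \times 100 = 400$ in this example instead of $100$.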

Abstract

We investigate various stochastic bandit problems in the presence of adversarial corruptions. A seminal work for this problem is the BARBAR~\cite{gupta2019better} algorithm, which achieves both robustness and efficiency. However, it suffers from a regret of $O(KC)$, which does not match the lower bound of $\Omega(C)$, where $K$ denotes the number of arms and $C$ denotes the corruption level. In this paper, we first improve the BARBAR algorithm by proposing a novel framework called BARBAT, which eliminates the factor of $K$ to achieve an optimal regret bound up to a logarithmic factor. We also extend BARBAT to various settings, including multi-agent bandits, graph bandits, combinatorial semi-bandits and batched bandits. Compared with the Follow-the-Regularized-Leader framework, our methods are more amenable to parallelization, making them suitable for multi-agent and batched bandit settings, and they incur lower computational costs, particularly in semi-bandit problems. Numerical experiments verify the efficiency of the proposed methods.
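The abstract contrasts BARBAT with BARBAR but does not describe the epoch mechanics. Below is a minimal sketch of one epoch of a BARBAR-style scheme. It follows a reading of the original BARBAR algorithm, not of this paper, and it is not BARBAT, which according to the TL;DR instead uses static epoch lengths and epoch-varying failure probabilities.

In each epoch, arm $k$ is planned for about $\lambda / \Delta_k^2$ pulls, where $\Delta_k$ is its gap estimate from the previous epoch. Every round then samples arm $k$ with probability proportional to that count. Gaps are re-estimated at the end of the epoch and floored at $2^{-m}$, so bad arms keep receiving a few pulls. Some details are assumptions rather than facts from the paper: the slack constant $1/16$, the multiplier `lam`, the `pull` callback and the function name `barbar_style_epoch`.

```python
import math
import random


def barbar_style_epoch(prev_gaps, m, lam, pull):
    """Run one epoch m >= 1 of a BARBAR-style scheme; return updated gap estimates.

    prev_gaps: gap estimates from epoch m - 1 (all 1.0 before the first epoch).
    lam: hypothetical confidence multiplier (BARBAR derives it from K, delta and T).
    pull: hypothetical callback; pull(k) returns a possibly corrupted reward in [0, 1].
    """
    K = len(prev_gaps)
    # Planned pulls per arm: arms that already look bad get fewer samples.
    n = [math.ceil(lam / g ** 2) for g in prev_gaps]
    N = sum(n)  # epoch length
    totals = [0.0] * K
    for _ in range(N):
        # Each round samples arm k with probability n[k] / N.
        k = random.choices(range(K), weights=n)[0]
        totals[k] += pull(k)
    # Reward estimates normalised by the planned count, clipped to [0, 1].
    r = [min(totals[k] / n[k], 1.0) for k in range(K)]
    # Reference value: best estimate after a slack of prev_gap / 16 (assumed constant).
    r_star = max(r[k] - prev_gaps[k] / 16 for k in range(K))
    # New gaps are floored at 2^-m, so no arm is ever fully eliminated.
    return [max(2.0 ** (-m), r_star - r[k]) for k in range(K)]


random.seed(0)
true_means = [0.9, 0.6, 0.5]  # hypothetical Bernoulli arms, no corruption
gaps = [1.0, 1.0, 1.0]
for m in range(1, 4):
    gaps = barbar_style_epoch(
        gaps, m, lam=50, pull=lambda k: float(random.random() < true_means[k])
    )
    print(m, [round(g, 3) for g in gaps])
```

Because the sampling distribution for an epoch is fixed before the epoch starts, an adversary can only influence the estimates through the rewards it corrupts. This is the property that the abstract's corruption-dependent regret terms quantify.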

Paper Structure

This paper contains 52 sections, 31 theorems, 188 equations, 4 figures, 3 tables, 6 algorithms.

Key Result

Theorem 1

The expected regret of BARBAT satisfies

Figures (4)

  • Figure 1: Comparison between MA-BARBAT, DRAA, IND-BARBAR and IND-FTRL in cooperative multi-agent multi-armed bandits.
  • Figure 2: Comparison between SOG-BARBAT, Shannon-FTRL, and Tsallis-FTRL in strongly observable graph bandits.
  • Figure 3: Comparison between DS-BARBAT, HYBRID, LBINF, LBINF_LS and LBINF_GD in $d$-set semi-bandits.
  • Figure 4: The feedback structure for the strongly observable graph bandits.

Theorems & Definitions (65)

  • Theorem 1
  • Remark 1
  • Remark 2
  • Theorem 2
  • Remark 3
  • Remark 4
  • Theorem 3
  • Theorem 4
  • Remark 5
  • Theorem 5
  • ...and 55 more