Robust Batched Bandits

Yunwen Guo, Yunlun Shu, Gongyi Zhuo, Tianyu Wang

TL;DR

This paper proposes robust batched bandit algorithms designed for heavy-tailed rewards, within both finite-arm and Lipschitz-continuous settings, and reveals a surprising phenomenon: in the instance-independent regime, as well as in the Lipschitz setting, heavier-tailed rewards necessitate a smaller number of batches to achieve near-optimal regret.

Abstract

The batched multi-armed bandit (MAB) problem, in which rewards are collected in batches, is crucial for applications such as clinical trials. Existing research predominantly assumes light-tailed reward distributions, yet many real-world scenarios, including clinical outcomes, exhibit heavy-tailed characteristics. This paper bridges this gap by proposing robust batched bandit algorithms designed for heavy-tailed rewards, within both finite-arm and Lipschitz-continuous settings. We reveal a surprising phenomenon: in the instance-independent regime, as well as in the Lipschitz setting, heavier-tailed rewards necessitate a smaller number of batches to achieve near-optimal regret. In stark contrast, for the instance-dependent setting, the required number of batches to attain near-optimal regret remains invariant with respect to tail heaviness.

Paper Structure

This paper contains 33 sections, 18 theorems, 138 equations, 2 tables, and 2 algorithms.

Key Result

Lemma 1

Consider a distribution $X$ with finite mean $\mu$ and finite $(1+\varepsilon)$-th central moment, $\mathbb{E}\left[ | X-\mu |^{1+\varepsilon} \right] \le v$, for some parameters $\varepsilon \in (0,1]$ and $v \in (0, \infty)$. Let $X_1,\cdots,X_n$ be i.i.d. random variables following $X$. For any $\delta \in (0,1)$ ...

Theorems & Definitions (36)

  • Lemma 1: Lemma 2 in Bubeck et al. (2013) [bubeck2013bandits]
  • Definition 1: Median of means
  • Definition 2
  • Remark 1
  • Definition 3
  • Theorem 1
  • Corollary 1
  • Theorem 2
  • Theorem 3
  • Corollary 2
  • ...and 26 more
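
Definition 1 above names the median-of-means estimator, a standard robust mean estimator for heavy-tailed samples. The paper's exact definition is not reproduced on this page, so the following is only a minimal sketch of the textbook construction: split the samples into blocks, average each block, and return the median of the block averages. The function name `median_of_means` and the parameter `num_blocks` are illustrative assumptions, and the choice of block count as a function of the confidence level $\delta$ is left to the caller rather than taken from the paper.

```python
import random
import statistics


def median_of_means(samples, num_blocks):
    """Textbook median-of-means estimate of the mean (illustrative sketch).

    `num_blocks` is a hypothetical parameter name; how it should depend on
    the confidence level is not specified here.
    """
    n = len(samples)
    if not 1 <= num_blocks <= n:
        raise ValueError("num_blocks must be between 1 and len(samples)")

    # Equal-sized blocks; the trailing n % num_blocks samples are discarded.
    block_size = n // num_blocks
    block_means = [
        sum(samples[i * block_size:(i + 1) * block_size]) / block_size
        for i in range(num_blocks)
    ]
    # The median is unaffected by a minority of blocks corrupted by outliers.
    return statistics.median(block_means)


if __name__ == "__main__":
    # Pareto samples with shape 1.5 have mean 3 but infinite variance,
    # so they satisfy a finite (1 + eps)-th moment condition only for eps < 0.5.
    rng = random.Random(0)
    heavy_tailed = [rng.paretovariate(1.5) for _ in range(10_000)]
    print("empirical mean :", sum(heavy_tailed) / len(heavy_tailed))
    print("median of means:", median_of_means(heavy_tailed, num_blocks=20))
```

A single extreme sample can move the empirical mean arbitrarily far, but it can corrupt only one block average, which the median then ignores. That insensitivity is what makes estimators of this kind suitable for rewards with only a finite $(1+\varepsilon)$-th moment.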