A Tight Lower Bound for Non-stochastic Multi-armed Bandits with Expert Advice

Zachary Chase, Shinji Ito, Idan Mehalel

TL;DR

This work resolves the minimax regret for the non-stochastic multi-armed bandit with expert advice (BwE) by proving a tight $\Omega\left(\sqrt{T K \log(N/K)}\right)$ lower bound that matches Kale's earlier upper bound, establishing minimax optimality for unrestricted learners. The authors introduce an SBI-based reduction framework: experts are partitioned into batches, and an adaptive adversary creates a hidden special batch whose identification is necessary for low regret. The proof proceeds through four steps, including reductions to SBI, analysis of a one-batch game via KL-divergence, and synthesis into a general BwE lower bound, ultimately showing that any algorithm must incur at least the stated regret rate. This result closes a longstanding gap between upper and lower bounds in BwE and extends the tight bound to improper online learning, with potential implications for theory and applications involving expert advice in adversarial settings.

Abstract

We determine the minimax optimal expected regret in the classic non-stochastic multi-armed bandit with expert advice problem by proving a lower bound that matches the upper bound of Kale (2014). The two bounds determine the minimax optimal expected regret to be $\Theta\left( \sqrt{T K \log (N/K) } \right)$, where $K$ is the number of arms, $N$ is the number of experts, and $T$ is the time horizon.
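To get a feel for how the rate in the abstract scales, the following is a minimal sketch that evaluates $\sqrt{T K \log(N/K)}$ numerically. The function name `bwe_minimax_rate` is hypothetical, the constants hidden by $\Theta$ are dropped, and the restriction to $N > K$ is an assumption of this sketch (it keeps the logarithm positive), not a condition taken from the paper.

```python
import math


def bwe_minimax_rate(T: int, K: int, N: int) -> float:
    """Evaluate sqrt(T * K * log(N / K)), ignoring the constants hidden by Theta.

    T: time horizon, K: number of arms, N: number of experts.
    Assumption of this sketch: N > K, so that log(N / K) is positive.
    """
    if T < 1 or K < 1:
        raise ValueError("T and K must be positive integers")
    if N <= K:
        raise ValueError("this sketch assumes N > K so that log(N / K) > 0")
    return math.sqrt(T * K * math.log(N / K))


if __name__ == "__main__":
    # Illustrative parameter values only; they are not taken from the paper.
    for T in (10_000, 40_000):
        print(T, bwe_minimax_rate(T, K=10, N=1_000))
```

Because the rate grows as $\sqrt{T}$, quadrupling the horizon doubles the value, while the dependence on the number of experts enters only through $\log(N/K)$.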

Paper Structure

This paper contains 18 sections, 5 theorems, 15 equations, 1 table.

Key Result

Lemma 3.1

Suppose that $A$ is an algorithm for BwE such that, for any strategy $S$ from our pool of strategies, the pseudo-regret of $A$ is bounded by $R_T \leq r(T)$ for all $T$. Let $T^\star$ satisfy $T^\star \geq 1000\, r(T^\star)/\epsilon$. Then there exists a good algorithm $A'$ for SBI such that for any strategy $S$ from our pool,

Theorems & Definitions (10)

  • Lemma 3.1
  • proof
  • Lemma 4.1: ito2024minimax
  • proof
  • Lemma 4.2
  • proof
  • Lemma 5.1
  • proof
  • Theorem 6.1
  • proof