A Tight Lower Bound for Non-stochastic Multi-armed Bandits with Expert Advice

Zachary Chase; Shinji Ito; Idan Mehalel

TL;DR

This work resolves the minimax regret for the non-stochastic multi-armed bandit with expert advice (BwE) by proving a tight $\Omega\left(\sqrt{T K \log(N/K)}\right)$ lower bound that matches the upper bound of Kale (2014), establishing minimax optimality for unrestricted learners. The authors introduce an SBI-based reduction framework that partitions the experts into batches and constructs an adaptive adversary to create a hidden special batch whose identification is necessary for low regret. The proof proceeds through four steps, including reductions to SBI, analysis of a one-batch game via KL-divergence, and synthesis into a general BwE lower bound, ultimately showing that any algorithm must incur at least the stated regret rate. This result closes a longstanding gap between upper and lower bounds in BwE and extends the tight bound to improper online learning, with potential implications for theory and applications involving expert advice in adversarial settings.

Abstract

We determine the minimax optimal expected regret in the classic non-stochastic multi-armed bandit with expert advice problem, by proving a lower bound that matches the upper bound of Kale (2014). The two bounds determine the minimax optimal expected regret to be $\Theta\left( \sqrt{T K \log (N/K) } \right)$, where $K$ is the number of arms, $N$ is the number of experts, and $T$ is the time horizon.
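To get a feel for how the rate $\sqrt{T K \log(N/K)}$ scales, the following minimal sketch evaluates it numerically. The function name `bwe_regret_rate` is hypothetical and not from the paper; the sketch ignores the constant hidden by $\Theta$ and assumes $N > K$ so that the logarithm is positive.

```python
import math


def bwe_regret_rate(T: int, K: int, N: int) -> float:
    """Evaluate sqrt(T * K * log(N / K)).

    Illustrative only: the constant factor hidden by the Theta
    notation is ignored, so the value shows scaling, not actual regret.
    """
    if T <= 0 or K <= 0:
        raise ValueError("T and K must be positive")
    if N <= K:
        # Assumption of this sketch: N > K, so that log(N / K) > 0.
        raise ValueError("this sketch assumes N > K")
    return math.sqrt(T * K * math.log(N / K))


if __name__ == "__main__":
    # Compare how the rate grows with the horizon T and the number of experts N.
    for T, K, N in [(1_000, 4, 16), (4_000, 4, 16), (1_000, 4, 256)]:
        print(f"T={T}, K={K}, N={N}: rate = {bwe_regret_rate(T, K, N):.2f}")
```

Quadrupling $T$ doubles the rate, while increasing $N$ only enters through the logarithm of the ratio $N/K$.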
