A Tight Lower Bound for Non-stochastic Multi-armed Bandits with Expert Advice
Zachary Chase, Shinji Ito, Idan Mehalel
TL;DR
This work resolves the minimax regret for the non-stochastic multi-armed bandit with expert advice (BwE) by proving an $\Omega\left(\sqrt{T K \log(N/K)}\right)$ lower bound that matches Kale's earlier upper bound, establishing minimax optimality for unrestricted learners. The authors introduce an SBI-based reduction framework: experts are partitioned into batches, and an adaptive adversary creates a hidden special batch that the learner must identify to achieve low regret. The proof proceeds through four steps, including reductions to SBI, an analysis of a one-batch game via KL divergence, and a synthesis into a general BwE lower bound, showing that any algorithm must incur at least the stated regret rate. This result closes a longstanding gap between the upper and lower bounds in BwE and extends the tight bound to improper online learning, with potential implications for theory and applications involving expert advice in adversarial settings.
Abstract
We determine the minimax optimal expected regret in the classic non-stochastic multi-armed bandit with expert advice problem by proving a lower bound that matches the upper bound of Kale (2014). Together, the two bounds determine the minimax optimal expected regret to be $\Theta\left( \sqrt{T K \log (N/K) } \right)$, where $K$ is the number of arms, $N$ is the number of experts, and $T$ is the time horizon.
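To get a feel for how the rate $\sqrt{T K \log(N/K)}$ scales, the following minimal sketch evaluates it for a few parameter settings. The function name `bwe_regret_rate` and the sample values are illustrative assumptions and do not come from the paper. The constants hidden by the $\Theta$ notation are dropped, so the numbers indicate growth only, not actual regret values, and the sketch assumes $N > K$ so that the logarithm is positive.

```python
import math


def bwe_regret_rate(T: int, K: int, N: int) -> float:
    """Return sqrt(T * K * log(N / K)), the rate with constants dropped.

    Hypothetical helper for illustration only. Assumes N > K so that
    log(N / K) is strictly positive.
    """
    if not (T > 0 and K > 0 and N > K):
        raise ValueError("expected T > 0, K > 0 and N > K")
    return math.sqrt(T * K * math.log(N / K))


if __name__ == "__main__":
    # Illustrative (T, K, N) triples; not taken from the paper.
    for T, K, N in [(10_000, 2, 16), (10_000, 8, 1024), (1_000_000, 8, 1024)]:
        print(f"T={T}, K={K}, N={N}: rate ~ {bwe_regret_rate(T, K, N):.1f}")
```

Because the rate depends on $T$ only through $\sqrt{T}$, multiplying the horizon by 100 multiplies the rate by 10, as the last two settings show.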
