The Adaptivity Barrier in Batched Nonparametric Bandits: Sharp Characterization of the Price of Unknown Margin
Rong Jiang, Cong Ma
TL;DR
This work establishes a sharp characterization of the cost of adapting to an unknown margin parameter in batched nonparametric contextual bandits. It introduces the regret inflation (RI) criterion and proves that, for a fixed number of batches M, the optimal inflation grows like T^{ψ_M^*}, where ψ_M^* is the value of a convex variational program whose minimizer also prescribes an optimal batch-and-bin design. The RoBIN algorithm achieves the optimal inflation up to polylogarithmic factors, and a phase transition shows that the adaptivity barrier vanishes once M exceeds order log log T. These results unify theory and algorithm design, revealing how dimension, smoothness, and batching interact to govern the price of unknown complexity. The findings have practical implications for planning batched experimentation in settings where margin uncertainty cannot be resolved online.
Abstract
We study batched nonparametric contextual bandits under a margin condition when the margin parameter $α$ is unknown. To capture the statistical cost of this ignorance, we introduce the regret inflation criterion, defined as the ratio between the regret of an adaptive algorithm and that of an oracle knowing $α$. We show that the optimal regret inflation grows polynomially with the horizon $T$, with exponent given by the value of a convex optimization problem that depends on the dimension, smoothness, and number of batches $M$. Moreover, the minimizer of this optimization problem directly prescribes the batch allocation and exploration strategy of a rate-optimal algorithm. Building on this principle, we develop RoBIN (RObust batched algorithm with adaptive BINning), which achieves the optimal regret inflation up to polylogarithmic factors. These results reveal a new adaptivity barrier: under batching, adaptation to an unknown margin parameter inevitably incurs a polynomial penalty, sharply characterized by a variational problem. Remarkably, this barrier vanishes once the number of batches exceeds order $\log \log T$; with only a doubly logarithmic number of updates, one can recover the oracle regret rate up to polylogarithmic factors.
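To give a feel for how small the $\log \log T$ threshold is, the following minimal sketch tabulates $\ln \ln T$ for a few horizons. The function names `loglog` and `loglog_table`, the choice of natural logarithm, and the sample horizons are illustrative assumptions; the abstract states the threshold only up to order, so the constants (and hence the exact number of batches needed) are not determined by these values.

```python
import math


def loglog(T: float) -> float:
    """Return ln(ln(T)) for a horizon T > e (illustrative helper, not from the paper)."""
    if T <= math.e:
        raise ValueError("T must exceed e so that ln(ln(T)) is positive")
    return math.log(math.log(T))


def loglog_table(horizons):
    """Pair each horizon with its ln ln T value."""
    return [(T, loglog(T)) for T in horizons]


if __name__ == "__main__":
    # Hypothetical horizons spanning nine orders of magnitude.
    for T, value in loglog_table([10**3, 10**6, 10**9, 10**12]):
        print(f"T = {T:>17,d}   ln ln T = {value:.2f}")
```

Even across nine orders of magnitude in $T$, the quantity stays in the low single digits, which is consistent with the claim that only a handful of policy updates (up to unspecified constants) is enough to remove the adaptivity barrier.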
