The Adaptivity Barrier in Batched Nonparametric Bandits: Sharp Characterization of the Price of Unknown Margin

Rong Jiang, Cong Ma

TL;DR

This work establishes a sharp characterization of the cost of adapting to an unknown margin in batched nonparametric contextual bandits. It introduces the regret inflation (RI) criterion and proves that, for a fixed number of batches M, the optimal inflation grows like T^{ψ_M^*}, where ψ_M^* is the value of a convex variational program whose minimizer also prescribes an optimal batch-and-bin design. The RoBIN algorithm achieves the optimal inflation up to polylogarithmic factors, and a phase transition shows that the adaptivity barrier vanishes once M exceeds order log log T. These results unify theory and algorithm design, revealing how dimension, smoothness, and batching interact to govern the price of the unknown margin. The findings matter for planning batched experimentation in settings where margin uncertainty cannot be resolved online.

Abstract

We study batched nonparametric contextual bandits under a margin condition when the margin parameter $α$ is unknown. To capture the statistical cost of this ignorance, we introduce the regret inflation criterion, defined as the ratio between the regret of an adaptive algorithm and that of an oracle knowing $α$. We show that the optimal regret inflation grows polynomially with the horizon $T$, with exponent given by the value of a convex optimization problem that depends on the dimension, smoothness, and number of batches $M$. Moreover, the minimizer of this optimization problem directly prescribes the batch allocation and exploration strategy of a rate-optimal algorithm. Building on this principle, we develop RoBIN (RObust batched algorithm with adaptive BINning), which achieves the optimal regret inflation up to polylogarithmic factors. These results reveal a new adaptivity barrier: under batching, adaptation to an unknown margin parameter inevitably incurs a polynomial penalty, sharply characterized by a variational problem. Remarkably, this barrier vanishes once the number of batches exceeds order $\log \log T$; with only a doubly logarithmic number of updates, one can recover the oracle regret rate up to polylogarithmic factors.
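
To give a feel for how small a batch budget of order $\log \log T$ is, the following minimal sketch evaluates $\log \log T$ for a few horizons. The function name `loglog_order` and the chosen horizons are illustrative assumptions, not taken from the paper, and the constant in front of $\log \log T$ is not specified in the abstract, so the numbers indicate the order of growth only.

```python
import math


def loglog_order(horizon: int) -> float:
    """Return log(log(T)) using natural logarithms.

    Hypothetical helper: it shows only the order of the batch count
    named in the abstract, not the exact threshold (constants unknown).
    """
    if horizon <= math.e:
        raise ValueError("horizon must exceed e so that log(log(T)) is positive")
    return math.log(math.log(horizon))


# Illustrative horizons, chosen here for demonstration only.
horizons = [10**3, 10**6, 10**9, 10**12]
orders = [loglog_order(t) for t in horizons]

for t, value in zip(horizons, orders):
    print(f"T = {t:.0e}: log log T = {value:.2f}")
```

Even when the horizon grows by nine orders of magnitude, $\log \log T$ changes by less than two, which is why a doubly logarithmic number of policy updates is a mild requirement in practice.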

Paper Structure

This paper contains 102 sections, 32 theorems, 236 equations, 6 figures, 2 algorithms.

Key Result

Proposition 1

Under Assumptions ass:bdd-density-ass:margin:

Figures (6)

  • Figure 1: $\psi_{M}^\star$ vs. batch budget $M$ when $\beta=1$.
  • Figure 2: $\psi_{M}^\star$ vs. batch budget $M$ when $d=1$.
  • Figure 3: Visualization of the active cells when $d=2,M=3$. The domain is partitioned into $M=3$ vertical stripes, each subdivided into fine grids of micro-cells. Colored squares indicate active regions, with each color corresponding to a different stripe resolution parameter $z_m$.
  • Figure 4: Phase diagram illustrating the three regimes of adaptivity under the batch constraint. The $x$-axis represents the number of batches $M$, and the $y$-axis shows normalized regret. The green curve corresponds to the performance of the optimal algorithm that knows the margin parameter $\alpha$, while the red curve corresponds to the optimal adaptive algorithm without knowledge of $\alpha$. The dotted line shows the exponent for an optimal online algorithm.
  • Figure 5: Lower bound of $u_M$ vs. batch budget $M$ when $\gamma_M<0.45$.
  • ...and 1 more figure

Theorems & Definitions (46)

  • Proposition 1
  • Proposition 2
  • Definition 1
  • Theorem 1
  • Proposition 3
  • Proposition 4
  • Corollary 1
  • Theorem 2
  • Lemma 1
  • Proposition 5
  • ...and 36 more