Statistical Complexity and Optimal Algorithms for Non-linear Ridge Bandits

Nived Rajaraman; Yanjun Han; Jiantao Jiao; Kannan Ramchandran

TL;DR

The paper analyzes sequential decision making in which the mean reward is a nonlinear ridge function $f(\langle \theta^*,a\rangle)$ of the chosen action, revealing a burn-in period with a fixed cost that can dominate the early sample complexity in high dimensions. It develops tight minimax lower bounds using a novel $\chi^2$-informativity framework and shows that standard exploration methods (e.g., Eluder-UCB) and regression-oracle-based approaches are suboptimal for this class. A two-stage algorithm, which first identifies a good initial direction and then exploits local linearity, achieves a near-optimal burn-in cost and, afterwards, linear-bandit-like learning with regret $O(d\sqrt{T})$. The work also provides agnostic and finite-action extensions, proves fundamental limits on nonadaptive strategies, and outlines directions for closing the gaps between upper and lower bounds, with implications for high-dimensional adaptive experimentation and reinforcement learning.
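
To see heuristically why a burn-in period can arise, take the cubic link $f(x) = x^3$ used in Figure 1 and, as an illustrative assumption, an action $a$ drawn uniformly from the unit sphere with no prior knowledge of $\theta^*$. Then $\mathbb{E}\langle \theta^*,a\rangle^2 = 1/d$, and the mean-square reward signal is $\mathbb{E} f(\langle \theta^*,a\rangle)^2 = 15/(d(d+2)(d+4)) = \Theta(d^{-3})$, so under unit-variance noise a signal-to-noise argument suggests on the order of $d^3$ samples before a random action is even distinguishable from noise. The sketch below (function names are hypothetical) checks these moments by Monte Carlo.

```python
import math
import random

def random_unit_vector(d, rng):
    # A normalized standard Gaussian vector is uniform on the unit sphere S^{d-1}.
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def projection_moment(d, k, n_samples, rng):
    """Monte Carlo estimate of E[<theta*, a>^(2k)] for a uniform on the sphere."""
    theta = random_unit_vector(d, rng)
    total = 0.0
    for _ in range(n_samples):
        a = random_unit_vector(d, rng)
        x = sum(t * y for t, y in zip(theta, a))
        total += x ** (2 * k)
    return total / n_samples

def exact_moment(d, k):
    # E[u_1^(2k)] = (2k-1)!! / (d (d+2) ... (d+2k-2)) for u uniform on S^{d-1}.
    num = math.prod(range(1, 2 * k, 2))
    den = math.prod(d + 2 * j for j in range(k))
    return num / den

rng = random.Random(0)
d = 20
second = projection_moment(d, 1, 20000, rng)  # E[<theta*, a>^2], exact value 1/d
sixth = projection_moment(d, 3, 20000, rng)   # E[f(<theta*, a>)^2] for f(x) = x^3
print("E[x^2]:", second, "exact:", exact_moment(d, 1))
print("E[x^6]:", sixth, "exact:", exact_moment(d, 3))
```

The moments do not depend on the particular $\theta^*$, which is why a fresh random $\theta^*$ per call is harmless here.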

Abstract

We consider the sequential decision-making problem where the mean outcome is a non-linear function of the chosen action. Compared with the linear model, two curious phenomena arise in non-linear models: first, in addition to the "learning phase" with a standard parametric rate for estimation or regret, there is a "burn-in period" with a fixed cost determined by the non-linear function; second, achieving the smallest burn-in cost requires new exploration algorithms. For a special family of non-linear functions known in the literature as ridge functions, we derive upper and lower bounds on the optimal burn-in cost and, in addition, on the entire learning trajectory during the burn-in period via differential equations. In particular, a two-stage algorithm that first finds a good initial action and then treats the problem as locally linear is statistically optimal. In contrast, several classical algorithms, such as UCB and algorithms relying on regression oracles, are provably suboptimal.
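
The two-stage idea can be illustrated with a toy sketch. It is not the paper's algorithm (Algorithm alg:burn-in): stage 1 is a brute-force random search, workable only in small dimension, that stands in for finding a good initial action, and stage 2 uses central finite differences around that action as a stand-in for treating the problem as locally linear. The cubic link, Gaussian noise level, unit-ball action set, and all names and parameters (`two_stage_toy`, `radius`, `delta`, and so on) are assumptions made for illustration.

```python
import math
import random

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def random_unit_vector(d, rng):
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def mean_pull(theta, a, rng, reps, sigma=0.01):
    """Average of `reps` noisy rewards y = <theta, a>**3 + N(0, sigma^2) (cubic link assumed)."""
    return sum(dot(theta, a) ** 3 + rng.gauss(0.0, sigma) for _ in range(reps)) / reps

def two_stage_toy(theta, rng, n_dirs=500, reps1=10, reps2=50, radius=0.9, delta=0.1):
    """Hypothetical two-stage sketch; returns the initial action a0 and a unit estimate of theta."""
    d = len(theta)
    # Stage 1: random search for an action whose (noisy) reward is largest.
    candidates = [random_unit_vector(d, rng) for _ in range(n_dirs)]
    best = max(candidates, key=lambda u: mean_pull(theta, u, rng, reps1))
    a0 = [radius * x for x in best]
    # Stage 2: near a0 the reward is approximately linear in a, so the central difference
    # along coordinate i is approximately f'(<theta, a0>) * theta_i.
    # Since ||a0|| = radius and radius + delta <= 1, every probe stays in the unit ball.
    grad = []
    for i in range(d):
        plus = [x + (delta if j == i else 0.0) for j, x in enumerate(a0)]
        minus = [x - (delta if j == i else 0.0) for j, x in enumerate(a0)]
        diff = mean_pull(theta, plus, rng, reps2) - mean_pull(theta, minus, rng, reps2)
        grad.append(diff / (2 * delta))
    norm = math.sqrt(dot(grad, grad))
    return a0, [g / norm for g in grad]

rng = random.Random(0)
theta_star = random_unit_vector(5, rng)
a0, theta_hat = two_stage_toy(theta_star, rng)
print("cosine(theta*, theta_hat) =", round(dot(theta_star, theta_hat), 4))
```

Because $f'(x) = 3x^2 \ge 0$, the finite-difference vector points along $+\theta^*$ once $\langle \theta^*, a_0\rangle$ is bounded away from zero, which is exactly what stage 1 is meant to guarantee; near $a_0 = 0$ the derivative vanishes and stage 2 alone would fail.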

Paper Structure (56 sections, 30 theorems, 170 equations, 2 figures, 5 algorithms)


Introduction
Bounds on the burn-in cost
Learning trajectory during the burn-in period
Suboptimality of existing exploration algorithms
Eluder-UCB
Regression oracle based algorithms
Complexity of the learning phase
Related work
Sequential estimation, testing, and experimental design
Stochastic bandits
Complexity measures for interactive decision making
Information-theoretic view of sequential decision making
Minimax Lower Bounds
Information-theoretic insights
$\chi^2$-informativity
...and 41 more sections

Key Result

Theorem 1

In a ridge bandit problem with the link function $f$ satisfying assump:main, for any $\kappa\in (0,1/4)$, the following upper bound holds for the burn-in cost: with a hidden factor depending on $\kappa$. This upper bound is achieved by Algorithm alg:burn-in in Section subsec:upper_bound_burnin.

Figures (2)

Figure 1: When $f(x) = x^3$ is the cubic function, the minimax regret scales as $\min\{T, d^3 + d\sqrt{T}\}$ (ignoring constant and polylogarithmic factors).
Figure 2: Upper and lower bounds on the minimax learning trajectory. Here UB stands for upper bound, LB stands for lower bound, and RO stands for regression oracles.
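
The rate in Figure 1 can be evaluated directly to see which term dominates at a given horizon. The sketch below (function names are hypothetical) drops constants and polylogarithmic factors, as the caption does, so the regime boundaries hold only up to constants.

```python
import math

def cubic_regret_rate(d, T):
    """Order of the minimax regret for f(x) = x^3 as stated in Figure 1
    (constants and polylogarithmic factors dropped)."""
    return min(T, d ** 3 + d * math.sqrt(T))

def dominant_term(d, T):
    # Which term attains the rate: the trivial bound T, the burn-in cost d^3,
    # or the linear-bandit-like learning term d * sqrt(T).
    burn_in, learning = d ** 3, d * math.sqrt(T)
    if T <= burn_in + learning:
        return "T"
    return "burn-in" if burn_in >= learning else "learning"

for T in (10**2, 5 * 10**3, 10**6):
    print(T, cubic_regret_rate(10, T), dominant_term(10, T))
```

Since $d^3 \ge d\sqrt{T}$ exactly when $T \le d^4$, the burn-in cost dominates for horizons roughly between $d^3$ and $d^4$; below about $d^3$ the trivial bound $T$ is the smaller one, and beyond about $d^4$ the $d\sqrt{T}$ learning term takes over.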

Theorems & Definitions (44)

Definition 1: Sample Complexity for Estimation
Definition 2: Minimax Regret
Definition 3: Burn-in Cost
Theorem 1: Weaker version of Theorem thm:ub_burnincost_formal
Theorem 2: Weaker version of Theorem thm:lower_bound
Example 1
Definition 4: Learning trajectory
Theorem 3
Theorem 4
Theorem 5
...and 34 more
