A Regularized Online Newton Method for Stochastic Convex Bandits with Linear Vanishing Noise

Jingxin Zhan, Yuchen Xin, Kaicheng Jin, Zhihua Zhang

TL;DR

The paper addresses stochastic convex bandits under a linear vanishing-noise model and develops a Regularized Online Newton Method (RONM) that achieves polylogarithmic regret under a quadratic growth condition. By introducing a regularization term, RONM enforces a linear growth of the precision matrix, enabling tighter regret bounds and faster convergence to the minimizer when the loss grows quadratically, with extensions to noise-scaling and multiplicative-noise models. Theoretical guarantees show regret bounds of order $\tilde{O}(H^4 d^6 L^{10}/\rho)$ for $\rho$-QG functions and faster rates when the growth is stronger, along with near $t^{-1/2}$ convergence of rescaled iterates; the results also cover functions $f$ with $(\beta,\ell)$-convexity for $1<\ell\le 2$ and the special $q=1$ regime under extra assumptions. The work advances the understanding of second-order online methods in zeroth-order bandits with vanishing noise and introduces two new bandit models, broadening applicability to settings with noise scaled by a function $\sigma(x)$ or with multiplicative noise. Overall, the results offer polylogarithmic regret and accelerated convergence in a broad convex-bandit framework, with implications for high-dimensional online optimization under structured noise.
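
To make the phrase "linear growth of the precision matrix" concrete, here is a minimal Python sketch. It is not the paper's RONM update: the recursion `P_{t+1} = P_t + H_t + lam * I`, the helper name `regularized_precision_path`, the parameter `lam`, and the positive semidefinite rank-one "curvature estimates" are all assumptions chosen for illustration. Under those assumptions, the smallest eigenvalue of the accumulated matrix grows at least linearly in the round index.

```python
import numpy as np

def regularized_precision_path(curvature_estimates, lam, p0):
    """Accumulate P_{t+1} = P_t + H_t + lam * I (an illustrative recursion).

    Returns the final matrix and the smallest eigenvalue after each round.
    Assumes every H_t is symmetric positive semidefinite.
    """
    d = p0.shape[0]
    precision = p0.copy()
    min_eigs = [np.linalg.eigvalsh(precision)[0]]  # eigvalsh sorts ascending
    for h in curvature_estimates:
        precision = precision + h + lam * np.eye(d)
        min_eigs.append(np.linalg.eigvalsh(precision)[0])
    return precision, np.array(min_eigs)

rng = np.random.default_rng(0)
d, n, lam = 3, 50, 0.1  # hypothetical dimension, horizon, regularization weight

# Rank-one PSD matrices g g^T stand in for curvature estimates.
estimates = [np.outer(g, g) for g in rng.normal(size=(n, d))]
precision, min_eigs = regularized_precision_path(estimates, lam, np.eye(d))

# With P_0 = I, the smallest eigenvalue after t rounds is at least 1 + lam * t.
lower_bound = 1.0 + lam * np.arange(n + 1)
print(bool(np.all(min_eigs >= lower_bound - 1e-9)))
```

The lower bound follows from Weyl's inequality: adding a positive semidefinite matrix never decreases the smallest eigenvalue, and adding `lam * I` raises every eigenvalue by exactly `lam`.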

Abstract

We study a stochastic convex bandit problem where the subgaussian noise parameter is assumed to decrease linearly as the learner selects actions closer and closer to the minimizer of the convex loss function. Accordingly, we propose a Regularized Online Newton Method (RONM) for solving the problem, based on the Online Newton Method (ONM) of arXiv:2406.06506. Our RONM reaches a polylogarithmic regret in the time horizon $n$ when the loss function grows quadratically in the constraint set, which recovers the results of arXiv:2402.12042 in linear bandits. Our analyses rely on the growth rate of the precision matrix $\Sigma_t^{-1}$ in ONM and we find that linear growth solves the question exactly. These analyses also help us obtain better convergence rates when the loss function grows faster. We also study and analyze two new bandit models: stochastic convex bandits with noise scaled to a subgaussian parameter function and convex bandits with stochastic multiplicative noise.
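
The feedback model described above can be sketched as a small simulation. This is an illustration under stated assumptions, not the paper's formal setup: the class name `VanishingNoiseBandit`, the quadratic loss, the parameter `noise_slope`, and the choice to make the noise scale proportional to the Euclidean distance to the minimizer are all hypothetical, and Gaussian noise stands in for a general subgaussian distribution.

```python
import numpy as np

class VanishingNoiseBandit:
    """Toy zeroth-order oracle whose noise shrinks near the minimizer.

    Assumptions (not from the paper): quadratic loss and Gaussian noise with
    standard deviation noise_slope * ||x - x_star||_2.
    """

    def __init__(self, x_star, rho=1.0, noise_slope=0.5, seed=0):
        self.x_star = np.asarray(x_star, dtype=float)
        self.rho = rho
        self.noise_slope = noise_slope
        self.rng = np.random.default_rng(seed)

    def loss(self, x):
        # f(x) = (rho / 2) * ||x - x_star||^2, a loss that grows quadratically.
        diff = np.asarray(x, dtype=float) - self.x_star
        return 0.5 * self.rho * float(diff @ diff)

    def noise_scale(self, x):
        # Noise level that vanishes linearly in the distance to x_star.
        diff = np.asarray(x, dtype=float) - self.x_star
        return self.noise_slope * float(np.linalg.norm(diff))

    def pull(self, x):
        # The learner observes only this noisy loss value, never a gradient.
        return self.loss(x) + self.noise_scale(x) * self.rng.normal()

env = VanishingNoiseBandit(x_star=[0.2, -0.1])
far, near = np.array([1.2, -0.1]), np.array([0.21, -0.1])
print(env.noise_scale(far), env.noise_scale(near))
```

In this toy model a query at the minimizer returns the exact loss, which is the intuition for why regret can be much smaller than in the constant-noise setting.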

Paper Structure (63 sections, 63 theorems, 238 equations, 1 figure, 2 algorithms)

Key Result

Theorem 3.1

If $f(x)$ has the $\rho$-QG property on $\mathcal{K}$, then with probability at least $1-\delta$, the regret of Algorithm 1 is bounded by a term of order $\widetilde{\mathcal{O}}(H^4 d^6 L^{10}/\rho)$, where $L=C[1+\log \max (n, d, H, 1/\rho, 1/\delta)]$, $\delta=\operatorname{Poly}(1/n,1/d,1/H)\in(0,1)$ and $H=C'\max(G/r,1/r)$. Moreover, we have that for all $t\leq n$, $\|\frac{X_t}{ \pi^+(X_t)}-x_\star\|_2=\widetilde{\mathcal{O}}\left(t^{-\frac{1}

Figures (1)

  • Figure 1: This picture shows the case when $d=2$ in Lemma \ref{proportion}. The yellow segment is $\Pi$ and the blue sector is the spherical cone we are concerned with. The orange polygon is the combination of the two non-intersecting cones with the same bases $\Pi$.

Theorems & Definitions (103)

  • Definition 2.1: $\rho$-Quadratic Growth (QG)
  • Definition 2.2: $(\beta,\ell)$-Convexity
  • Remark 2.1
  • Theorem 3.1
  • Theorem 3.2
  • Corollary 3.3
  • Remark 3.1
  • Corollary 3.4
  • Theorem 3.5
  • Remark 3.2
  • ...and 93 more