A Regularized Online Newton Method for Stochastic Convex Bandits with Linear Vanishing Noise
Jingxin Zhan, Yuchen Xin, Kaicheng Jin, Zhihua Zhang
TL;DR
The paper addresses stochastic convex bandits under a linear vanishing-noise model and develops a Regularized Online Newton Method (RONM) that achieves polylogarithmic regret under a quadratic growth condition. By introducing a regularization term, RONM forces the precision matrix to grow linearly. This yields tighter regret bounds and faster convergence to the minimizer when the loss grows quadratically, and the approach extends to noise-scaling and multiplicative-noise models. The theoretical guarantees give regret bounds of order $\tilde{O}(H^4 d^6 L^{10}/\rho)$ for $\rho$-QG functions and faster rates when the growth is stronger, along with near-$t^{-1/2}$ convergence of rescaled iterates. The results also cover functions $f$ that are $(\beta, \ell)$-convex for $1 < \ell \le 2$ and, under extra assumptions, the special regime $q = 1$. The work advances the understanding of second-order online methods for zeroth-order bandits with vanishing noise and introduces two new bandit models: one where the noise is scaled by a function $\sigma(x)$ and one with multiplicative noise. Overall, the results give polylogarithmic regret and accelerated convergence in a broad convex-bandit framework, with implications for high-dimensional online optimization under structured noise.
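To see why an additive regularizer can enforce linear growth of the precision matrix, consider a hypothetical update of the form below. The curvature estimates $G_t \succeq 0$, the step size $\eta > 0$ and the regularization weight $\gamma > 0$ are illustrative assumptions, and the actual RONM update may differ. The argument only uses Weyl's inequality $\lambda_{\min}(A + B) \ge \lambda_{\min}(A) + \lambda_{\min}(B)$ for symmetric matrices, together with $\Sigma_1 \succ 0$.

```latex
% Hypothetical regularized precision update (G_t \succeq 0; \eta, \gamma > 0 assumed)
\Sigma_{t+1}^{-1} = \Sigma_t^{-1} + \eta\, G_t + \gamma I
\quad\Longrightarrow\quad
\lambda_{\min}\bigl(\Sigma_{t+1}^{-1}\bigr) \ge \lambda_{\min}\bigl(\Sigma_t^{-1}\bigr) + \gamma

% Unrolling over t rounds:
\lambda_{\min}\bigl(\Sigma_{t+1}^{-1}\bigr) \ge \lambda_{\min}\bigl(\Sigma_1^{-1}\bigr) + \gamma t,
\qquad
\|\Sigma_{t+1}\|_2 \le \frac{1}{\lambda_{\min}\bigl(\Sigma_1^{-1}\bigr) + \gamma t} = O(1/t).
```

Without the $\gamma I$ term, the $G_t$ could be nearly singular in some directions, and $\lambda_{\min}(\Sigma_t^{-1})$ would then have no guaranteed growth rate.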
Abstract
We study a stochastic convex bandit problem in which the subgaussian noise parameter is assumed to decrease linearly as the learner selects actions closer and closer to the minimizer of the convex loss function. To solve this problem, we propose a Regularized Online Newton Method (RONM) based on the Online Newton Method (ONM) of arXiv:2406.06506. RONM attains regret polylogarithmic in the time horizon $n$ when the loss function grows quadratically on the constraint set, which recovers the results of arXiv:2402.12042 for linear bandits. Our analysis relies on the growth rate of the precision matrix $\Sigma_t^{-1}$ in ONM, and we find that linear growth of this matrix is exactly what the problem requires. The same analysis also yields better convergence rates when the loss function grows faster. Finally, we introduce and analyze two new bandit models: stochastic convex bandits whose noise is scaled by a subgaussian parameter function, and convex bandits with stochastic multiplicative noise.
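The three noise models in the abstract can be made concrete with a small simulator. This is a minimal sketch under assumed forms, not the paper's definitions: the linear vanishing noise is taken to be Gaussian with standard deviation $c\,\|x - x^*\|$ (a Gaussian with standard deviation $s$ is $s$-subgaussian), the scaled model uses an arbitrary user-supplied $\sigma(x)$, and the multiplicative model is assumed to be $y = f(x)(1 + c\,\xi)$. All function names and the quadratic test loss are hypothetical.

```python
import math
import random
import statistics
from typing import Callable, Sequence

Vector = Sequence[float]
Loss = Callable[[Vector], float]


def dist(x: Vector, y: Vector) -> float:
    """Euclidean distance ||x - y||."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def make_quadratic(x_star: Vector) -> Loss:
    """Hypothetical test loss f(x) = ||x - x_star||^2, growing quadratically around x_star."""
    return lambda x: dist(x, x_star) ** 2


def linear_vanishing(f: Loss, x_star: Vector, c: float, rng: random.Random) -> Loss:
    """Assumed form: Gaussian noise with std c * ||x - x_star||, vanishing linearly at x_star."""
    return lambda x: f(x) + c * dist(x, x_star) * rng.gauss(0.0, 1.0)


def scaled_noise(f: Loss, sigma: Loss, rng: random.Random) -> Loss:
    """Hypothetical: Gaussian noise scaled by a user-supplied parameter function sigma(x)."""
    return lambda x: f(x) + sigma(x) * rng.gauss(0.0, 1.0)


def multiplicative(f: Loss, c: float, rng: random.Random) -> Loss:
    """Assumed form: y = f(x) * (1 + c * xi) with xi ~ N(0, 1)."""
    return lambda x: f(x) * (1.0 + c * rng.gauss(0.0, 1.0))


def empirical_noise_std(observe: Loss, f: Loss, x: Vector, n: int = 20000) -> float:
    """Estimate the standard deviation of observe(x) - f(x) from n independent draws."""
    return statistics.pstdev(observe(x) - f(x) for _ in range(n))


if __name__ == "__main__":
    rng = random.Random(0)
    x_star = [0.5, -0.2]
    f = make_quadratic(x_star)
    obs = linear_vanishing(f, x_star, c=1.0, rng=rng)
    print(obs(x_star))  # the noise term vanishes at the minimizer
    for r in (1.0, 0.1, 0.01):
        x = [x_star[0] + r, x_star[1]]
        # The empirical noise level should shrink roughly in proportion to r.
        print(r, empirical_noise_std(obs, f, x))
```

Under these assumptions, an algorithm that samples near $x^*$ sees much cleaner feedback than one that samples far away. Both the linear vanishing model and, when $f(x^*) = 0$, the multiplicative model give noise-free observations at the minimizer.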
