Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity

Qian Yu, Yining Wang, Baihe Huang, Qi Lei, Jason D. Lee

TL;DR

This work tackles stochastic zeroth-order optimization for strongly convex, highly smooth functions by deriving the first tight minimax simple regret rate and presenting a two-stage algorithm that combines a bootstrapping phase with a Newton-like final stage. The core methodology hinges on a sharp, non-isotropic gradient-estimator analysis under Lipschitz Hessian and a robust Hessian-estimation framework, enabling a Hessian-transformation that yields near-identity curvature and an efficient Newton-type update. The main contributions include tight upper and lower bounds that scale with dimension and the Hessian Lipschitz constant, a novel ellipsoid-based gradient estimator with precise bias and variance controls, and a lower bound via KL-divergence showing minimax optimality. These results advance understanding of how higher-order smoothness affects sample complexity in bandit-like optimization and provide a principled pathway for practical zeroth-order methods in high-dimensional, stochastic settings.

Abstract

Optimization of convex functions under stochastic zeroth-order feedback has been a major and challenging question in online learning. In this work, we consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm has access only to noisy evaluations of the objective function at the points it queries. We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds. We propose an algorithm that features a combination of a bootstrapping stage and a mirror-descent stage. Our main technical innovation consists of a sharp characterization for the spherical-sampling gradient estimator under higher-order smoothness conditions, which allows the algorithm to optimally balance the bias-variance tradeoff, and a new iterative method for the bootstrapping stage, which maintains the performance for unbounded Hessian.
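
To make the spherical-sampling idea concrete, here is a minimal sketch of the classic two-point estimator that averages finite differences along directions drawn uniformly from the unit sphere. It is an illustration under stated assumptions, not the estimator analyzed in the paper: the function name `sphere_gradient_estimate`, the parameters `radius` and `n_samples`, and the noisy quadratic oracle in the demo are all hypothetical. A larger `radius` reduces the effect of evaluation noise (variance) but increases the error from higher-order terms (bias), which is the tradeoff the abstract refers to.

```python
import numpy as np


def sphere_gradient_estimate(f, x, radius, n_samples, rng):
    """Two-point spherical-sampling gradient estimate of f at x.

    Assumption: this is the textbook estimator
        g_hat = (d / n) * sum_i [(f(x + r u_i) - f(x - r u_i)) / (2 r)] * u_i
    with u_i uniform on the unit sphere; the paper's estimator may differ.
    """
    d = x.shape[0]
    # Normalized Gaussian vectors are uniformly distributed on the unit sphere.
    u = rng.standard_normal((n_samples, d))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    # Symmetric finite differences along each sampled direction.
    f_plus = np.array([f(x + radius * ui) for ui in u])
    f_minus = np.array([f(x - radius * ui) for ui in u])
    slopes = (f_plus - f_minus) / (2.0 * radius)
    # E[u u^T] = I / d, so scaling by d removes the first-order bias.
    return d * (slopes[:, None] * u).mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = np.diag([1.0, 2.0, 3.0])  # hypothetical strongly convex quadratic

    def noisy_f(z):
        # Zeroth-order oracle: true value plus Gaussian evaluation noise.
        return 0.5 * z @ A @ z + 0.01 * rng.standard_normal()

    x = np.array([1.0, -1.0, 0.5])
    print("estimate:", sphere_gradient_estimate(noisy_f, x, 0.5, 20000, rng))
    print("true grad:", A @ x)
```

For a noiseless quadratic the even-order terms cancel in the symmetric difference, so the estimate is unbiased for any `radius`; bias only appears once third-order terms (controlled by the Hessian Lipschitz constant) are present.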

Paper Structure

This paper contains 25 sections, 11 theorems, 158 equations, 1 table, 4 algorithms.

Key Result

Theorem 3.1

For any dimension $d$ and constants $\rho, M, R$, the minimax simple regret satisfies $\limsup_{T\rightarrow \infty}\mathfrak R(T;\rho,M,R)\cdot T^{\frac{2}{3}}\leq C \cdot \left(\frac{\rho^{\frac{2}{3}}}{M}d \right)$, where $C$ is a universal constant.
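
As a rough reading of this bound in sample-complexity terms, the inequality can be inverted for a target accuracy. The symbols $\varepsilon$ (target simple regret) and $C'$ (any constant slightly larger than $C$) are introduced here for illustration only, and the statement holds only for sufficiently large $T$ because the theorem is asymptotic:

$$
\mathfrak R(T;\rho,M,R) \leq C' \cdot \frac{\rho^{\frac{2}{3}} d}{M\, T^{\frac{2}{3}}} \leq \varepsilon
\quad\text{whenever}\quad
T \geq \left(\frac{C' \rho^{\frac{2}{3}} d}{M \varepsilon}\right)^{\frac{3}{2}} = \frac{(C')^{\frac{3}{2}}\, \rho\, d^{\frac{3}{2}}}{(M\varepsilon)^{\frac{3}{2}}}.
$$

Under this reading, halving the target regret $\varepsilon$ multiplies the required number of queries by $2^{3/2} \approx 2.83$.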

Theorems & Definitions (23)

  • Theorem 3.1
  • Theorem 3.2
  • Theorem 4.1
  • Remark 4.2
  • Theorem 4.3
  • Proposition 4.4
  • Proof of Theorem \ref{thm1} given inequality \ref{eq:fststbd}
  • Definition B.1
  • Lemma B.2: Restatement of Proposition 7 in theotherpaper
  • Proposition C.1
  • ...and 13 more