A Unified Zeroth-Order Optimization Framework via Oblivious Randomized Sketching

Haishan Ye, Xiangyu Chang, Xi Chen

TL;DR

This work develops a unified zeroth-order optimization framework using oblivious randomized sketching to unify and improve gradient estimation methods such as finite difference (FD) and random finite difference (RFD). By expressing gradient estimation through a sketching matrix $S$, the authors achieve variance reduction in RFD and establish high-probability convergence with weak dimension dependence, notably a query complexity scaling with $\mathrm{tr}(A)/\mu$ rather than $d \cdot L/\mu$. The framework extends to Hessian-aware settings via sketched preconditioners and to general $L$-smooth, $\mu$-strongly convex objectives with Lipschitz Hessians, giving improved complexity when the Hessian traces are small. A practical trace-estimation scheme allows automatic step-size selection and further improves adaptivity. Experiments on synthetic quadratic problems and real-world logistic-regression datasets support the dimension-robust performance and the usefulness of the trace-based Hessian insight for large-scale black-box optimization.

Abstract

We propose a new framework for analyzing zeroth-order optimization (ZOO) from the perspective of oblivious randomized sketching. In this framework, commonly used gradient estimators in ZOO, such as finite difference (FD) and random finite difference (RFD), are unified through a general sketch-based formulation. By introducing the concept of oblivious randomized sketching, we show that properly chosen sketch matrices can significantly reduce the high variance of RFD estimates and enable high-probability convergence guarantees of ZOO, which are rarely available in existing RFD analyses.

We instantiate the framework on convex quadratic objectives and derive a query complexity of $\tilde{\mathcal{O}}\left(\frac{\mathrm{tr}(A)}{L} \cdot \frac{L}{\mu} \log\frac{1}{\varepsilon}\right)$ to achieve an $\varepsilon$-suboptimal solution, where $A$ is the Hessian, $L$ is the largest eigenvalue of $A$, and $\mu$ denotes the strong convexity parameter. This complexity can be substantially smaller than the standard query complexity of $\mathcal{O}\left(d \cdot \frac{L}{\mu} \log\frac{1}{\varepsilon}\right)$, which depends linearly on the problem dimension $d$, especially when $A$ has rapidly decaying eigenvalues. These advantages naturally extend to more general settings, including strongly convex and Hessian-aware optimization.

Overall, this work offers a novel sketch-based perspective on ZOO that explains why and when RFD-type methods can achieve weakly dimension-independent convergence in general smooth problems, providing both theoretical foundations and practical implications for ZOO.
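To make the gap between the two complexity bounds concrete, the following minimal sketch compares the leading factors $\mathrm{tr}(A)/\mu$ and $d \cdot L/\mu$ for a given Hessian spectrum, ignoring the shared $\log\frac{1}{\varepsilon}$ term and any hidden logarithmic factors. The function name `query_complexity_factors` and the example spectrum $\lambda_i = 1/i^2$ are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def query_complexity_factors(eigenvalues):
    """Return (tr(A)/mu, d*L/mu) for a positive definite Hessian spectrum.

    L is the largest eigenvalue and mu the smallest, matching the
    definitions in the abstract. Log factors are ignored.
    """
    eig = np.asarray(eigenvalues, dtype=float)
    if eig.ndim != 1 or eig.size == 0 or np.any(eig <= 0):
        raise ValueError("expected a non-empty 1-D array of positive eigenvalues")
    L = eig.max()
    mu = eig.min()
    d = eig.size
    return eig.sum() / mu, d * L / mu


# Hypothetical example: a rapidly decaying spectrum lambda_i = 1 / i^2.
d = 1000
decaying = 1.0 / np.arange(1, d + 1) ** 2

sketch_factor, standard_factor = query_complexity_factors(decaying)

# The ratio equals tr(A) / (d * L); it is small when eigenvalues decay fast
# and equals 1 when all eigenvalues are identical.
ratio = sketch_factor / standard_factor
print(f"tr(A)/mu = {sketch_factor:.3e}, d*L/mu = {standard_factor:.3e}, ratio = {ratio:.4f}")
```

For a flat spectrum (all eigenvalues equal) the two factors coincide, so the trace-based bound can only help when the eigenvalues actually decay.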

Paper Structure

This paper contains 31 sections, 21 theorems, 113 equations, 4 figures, 1 table, 2 algorithms.

Key Result

Lemma 1

Let the objective function $\phi(\mathbf{x})$ satisfy Assumption ass:phi and Assumption ass:zeta, and let $S\in\mathbb{R}^{d \times \ell}$ be a predefined matrix. Then the approximate gradient $\mathbf{g}(\mathbf{x})$ defined in Eq. eq:g satisfies that where $\mathbf{v}$ is an $\ell$-dimensional vector whose $i$-th entry $v^{(i)} = \frac{\zeta(\mathbf{x} + \alpha \mathbf{s}^{(i)}) - \zeta(\
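The lemma statement is cut off in this listing, so the exact estimator is not recoverable here. As a rough illustration of the general idea (a gradient estimate built from finite differences along the columns $\mathbf{s}^{(i)}$ of a sketching matrix $S$), the sketch below assumes a forward difference with step $\alpha$ and an estimate of the form `scale * S @ v`. The forward-difference form, the `scale` normalization, and the names `sketched_gradient` and `make_quadratic` are assumptions for illustration, not the paper's definitions.

```python
import numpy as np


def sketched_gradient(f, x, S, alpha=1e-6, scale=1.0):
    """Estimate grad f(x) as scale * S @ v.

    v[i] is a forward finite difference of f along column S[:, i]
    (an assumed form; the source formula is truncated). Uses
    S.shape[1] + 1 function queries.
    """
    fx = f(x)
    v = np.array([(f(x + alpha * S[:, i]) - fx) / alpha for i in range(S.shape[1])])
    return scale * (S @ v)


def make_quadratic(A, b):
    """Hypothetical test objective f(x) = 0.5 x^T A x - b^T x with its exact gradient."""
    def f(x):
        return 0.5 * x @ A @ x - b @ x

    def grad(x):
        return A @ x - b

    return f, grad


rng = np.random.default_rng(0)
d = 5
A = np.diag(1.0 / np.arange(1, d + 1) ** 2)  # decaying eigenvalues
b = rng.standard_normal(d)
x = rng.standard_normal(d)
f, grad = make_quadratic(A, b)

# Coordinate-wise FD: S is the identity, one direction per coordinate.
g_fd = sketched_gradient(f, x, np.eye(d))

# RFD-style estimate: Gaussian columns, averaged over ell directions.
# The 1/ell scaling makes the estimate unbiased up to O(alpha) because
# E[s s^T] = I for standard Gaussian s.
ell = 20000
S = rng.standard_normal((d, ell))
g_rfd = sketched_gradient(f, x, S, scale=1.0 / ell)

rel_err = np.linalg.norm(g_rfd - grad(x)) / np.linalg.norm(grad(x))
print("FD max abs error:", np.max(np.abs(g_fd - grad(x))))
print("RFD relative error:", rel_err)
```

Both estimators share one code path and differ only in the choice of `S` and its scaling, which is the sense in which a sketch-based formulation unifies them. The large `ell` here is only to make the averaging visible and does not reflect the sketch sizes analyzed in the paper.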

Figures (4)

  • Figure 1: Illustration of eigenvalue decay patterns in the synthetic Hessian matrices.
  • Figure 2: Comparison of query complexity for different ZOO methods on convex quadratic functions with various Hessian eigenvalue structures.
  • Figure 3: Eigenvalue distributions of the Hessian matrices near the optimal solutions for four logistic regression datasets. All distributions exhibit fast decay patterns, indicating that the proposed zeroth-order algorithm is well-suited for such settings.
  • Figure 4: Query complexity comparison of zeroth-order algorithms on real-world logistic regression datasets.

Theorems & Definitions (27)

  • Lemma 1
  • Definition 1: Sketching in Matrix Product
  • Lemma 2
  • Lemma 3
  • Theorem 1
  • Corollary 1
  • Remark 1
  • Remark 2
  • Lemma 4
  • Theorem 2
  • ...and 17 more