A Unified Zeroth-Order Optimization Framework via Oblivious Randomized Sketching
Haishan Ye, Xiangyu Chang, Xi Chen
TL;DR
This work develops a unified zeroth-order optimization framework in which gradient estimators such as finite difference (FD) and random finite difference (RFD) are expressed through an oblivious randomized sketching matrix $S$. With a properly chosen $S$, the authors reduce the variance of RFD and establish high-probability convergence with weak dimension dependence, notably a query complexity on quadratics that scales with $\mathrm{tr}(A)/\mu$ rather than $d \cdot L/\mu$. The framework extends to Hessian-aware settings via sketched preconditioners and to general $L$-smooth, $\mu$-strongly convex objectives with Lipschitz Hessians, giving improved complexity when the Hessian trace is small. A practical trace-estimation scheme allows automatic step-size selection and further improves adaptivity. Experiments on synthetic quadratic problems and real-world logistic-regression datasets validate the dimension-robust performance and the trace-based Hessian insight for large-scale black-box optimization.
Abstract
We propose a new framework for analyzing zeroth-order optimization (ZOO) from the perspective of \emph{oblivious randomized sketching}. In this framework, commonly used gradient estimators in ZOO, such as finite difference (FD) and random finite difference (RFD), are unified through a general sketch-based formulation. By introducing the concept of oblivious randomized sketching, we show that properly chosen sketch matrices can significantly reduce the high variance of RFD estimates and enable \emph{high-probability} convergence guarantees of ZOO, which are rarely available in existing RFD analyses.

We instantiate the framework on convex quadratic objectives and derive a query complexity of $\tilde{\mathcal{O}}\left(\frac{\mathrm{tr}(A)}{L} \cdot \frac{L}{\mu} \log\frac{1}{\epsilon}\right)$ to achieve an $\epsilon$-suboptimal solution, where $A$ is the Hessian, $L$ is the largest eigenvalue of $A$, and $\mu$ denotes the strong convexity parameter. This complexity can be substantially smaller than the standard query complexity of $\mathcal{O}\left(d \cdot \frac{L}{\mu} \log\frac{1}{\epsilon}\right)$, which depends linearly on the problem dimension $d$, especially when $A$ has rapidly decaying eigenvalues. These advantages naturally extend to more general settings, including strongly convex and Hessian-aware optimization.

Overall, this work offers a novel sketch-based perspective on ZOO that explains why and when RFD-type methods can achieve \emph{weakly dimension-independent} convergence in general smooth problems, providing both theoretical foundations and practical implications for ZOO.
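As a rough illustration of the sketch-based view, the following minimal sketch treats a gradient estimate as forward differences taken along the columns of a matrix $S$ and then mapped back through $S$. This is an assumed, simplified form and not necessarily the estimator analyzed in the paper: the function name `sketched_gradient`, the step `alpha`, the Gaussian scaling, and the test spectrum $1/i^2$ are all hypothetical choices made for the example. Taking $S = I$ gives coordinate-wise FD, while a Gaussian $S$ with a few columns gives an RFD-style estimate. The last line computes $\mathrm{tr}(A)/L$, the quantity that replaces $d$ in the quadratic query complexity.

```python
import numpy as np

def sketched_gradient(f, x, S, alpha=1e-6):
    """Estimate grad f(x) as S @ c, where c[i] is a forward difference
    of f along column i of the sketch matrix S (shape d x k).
    Illustrative form only; uses k + 1 function queries."""
    fx = f(x)
    coeffs = np.array([(f(x + alpha * S[:, i]) - fx) / alpha
                       for i in range(S.shape[1])])
    return S @ coeffs  # approximately S @ S.T @ grad f(x)

# Convex quadratic with a rapidly decaying spectrum (assumed for illustration).
d = 50
eigs = 1.0 / np.arange(1, d + 1) ** 2
A = np.diag(eigs)
f = lambda x: 0.5 * x @ A @ x
x = np.ones(d)
true_grad = A @ x

# S = I: coordinate-wise finite differences (d + 1 queries).
g_fd = sketched_gradient(f, x, np.eye(d))

# Gaussian sketch with k columns, scaled so that E[S S^T] = I (k + 1 queries).
rng = np.random.default_rng(0)
k = 5
S = rng.standard_normal((d, k)) / np.sqrt(k)
g_rfd = sketched_gradient(f, x, S)

# tr(A)/L: far smaller than d when the eigenvalues decay quickly.
effective_dim = eigs.sum() / eigs.max()
```

With this spectrum, `effective_dim` stays below 2 while `d` is 50, which is the regime where the abstract's bound improves most on the $d \cdot L/\mu$ rate. The Gaussian estimate is noisy for small `k`; the sketch only shows how both estimators fit one formula, not the variance reduction itself.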
