Gradient Compressed Sensing: A Query-Efficient Gradient Estimator for High-Dimensional Zeroth-Order Optimization

Ruizhong Qiu; Hanghang Tong

TL;DR

This work tackles nonconvex zeroth-order optimization in very high dimensions for functions with approximately sparse gradients. It introduces Gradient Compressed Sensing (GraCe), a gradient estimator whose query complexity has a double-logarithmic dependence on the dimension: it uses $O\big(s\log\log\frac{d}{s}\big)$ queries per step while achieving an $O\big(\frac{1}{T}\big)$ convergence rate. GraCe generalizes the Indyk–Price–Woodruff algorithm from linear measurements to nonlinear functions and introduces a dependent random partition technique that reduces the constant by a factor of nearly 4300. Empirically, GraCe outperforms 12 existing methods on 10,000-dimensional functions.
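To make "queries per step" concrete, here is a minimal sketch of the classical coordinate-wise forward-difference estimator, which spends $d+1$ function queries per gradient estimate. This is a standard baseline and not GraCe; the names `fd_gradient` and `f_sparse` and the step size `h` are illustrative assumptions.

```python
def fd_gradient(f, x, h=1e-6):
    """Forward-difference gradient estimate; uses len(x) + 1 queries to f."""
    fx = f(x)                      # one query at the base point
    queries = 1
    grad = []
    for i in range(len(x)):
        x_step = list(x)
        x_step[i] += h             # perturb a single coordinate
        grad.append((f(x_step) - fx) / h)
        queries += 1               # one extra query per coordinate
    return grad, queries


def f_sparse(x):
    # Hypothetical test function whose gradient has only two nonzero entries.
    return 3.0 * x[0] + x[4] ** 2


grad, queries = fd_gradient(f_sparse, [1.0] * 8)
print(queries)  # d + 1 queries for d = 8
```

Even though only two coordinates matter here, the baseline still queries every coordinate. Estimators that exploit gradient sparsity aim to avoid exactly this linear dependence on $d$.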

Abstract

We study nonconvex zeroth-order optimization (ZOO) in a high-dimensional space $\mathbb R^d$ for functions with approximately $s$-sparse gradients. To reduce the dependence on the dimensionality $d$ in the query complexity, high-dimensional ZOO methods seek to leverage gradient sparsity to design gradient estimators. The previous best method needs $O\big(s\log\frac ds\big)$ queries per step to achieve an $O\big(\frac1T\big)$ rate of convergence w.r.t. the number $T$ of steps. In this paper, we propose *Gradient Compressed Sensing* (GraCe), a query-efficient and accurate estimator for sparse gradients that uses only $O\big(s\log\log\frac ds\big)$ queries per step and still achieves an $O\big(\frac1T\big)$ rate of convergence. To the best of our knowledge, we are the first to achieve a *double-logarithmic* dependence on $d$ in the query complexity under weaker assumptions. GraCe generalizes the Indyk–Price–Woodruff (IPW) algorithm in compressed sensing from linear measurements to nonlinear functions. Furthermore, since the IPW algorithm is purely theoretical due to its impractically large constant, we improve it via our *dependent random partition* technique together with a corresponding novel analysis, reducing the constant by a factor of nearly 4300. GraCe is not only theoretically query-efficient but also achieves strong empirical performance. We benchmark GraCe against 12 existing ZOO methods on 10,000-dimensional functions and demonstrate that it significantly outperforms them.
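As a rough illustration of the gap between the two bounds, the following sketch evaluates $s\log\frac{d}{s}$ and $s\log\log\frac{d}{s}$ at the benchmark dimension $d = 10000$. It assumes natural logarithms and drops all hidden constants, so the numbers indicate growth rates only, not actual query counts of any method; the function names and the choice $s = 10$ are illustrative.

```python
import math


def log_bound(d, s):
    """s * log(d / s), with the big-O constant dropped (assumption)."""
    return s * math.log(d / s)


def loglog_bound(d, s):
    """s * log(log(d / s)), with the big-O constant dropped (assumption)."""
    return s * math.log(math.log(d / s))


d, s = 10_000, 10
single = log_bound(d, s)
double = loglog_bound(d, s)
print(f"s log(d/s)     ~ {single:.1f}")
print(f"s log log(d/s) ~ {double:.1f}")
```

With these values the first expression is roughly 69 and the second roughly 19. Because the asymptotic advantage only pays off when the hidden constant is moderate, the constant reduction described in the abstract matters in practice.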

Paper Structure (31 sections, 13 theorems, 128 equations, 3 figures, 6 tables, 3 algorithms)

Introduction
Preliminaries
Notation
Assumptions
GraCe: Gradient Compressed Sensing
Base case: Approximately $1$-sparse gradient
General case: Approximately $s$-sparse gradient
Zeroth-Order Optimization with GraCe
Experiments
Benchmark Functions
Baselines & Implementation Details
Results & Discussion
Additional Experiments
Related Work
High-dimensional zeroth-order optimization
...and 16 more sections

Key Result

Lemma 3.1

There is an absolute constant $C_1>0$ such that, given $\boldsymbol{x}\in\mathbb{R}^d$, $\epsilon>0$, $S\subseteq[d]$, $0<\delta_1,\delta_2<1$, and an integer $2\le D\le d$, if there exists $j\in S$ with $|\nabla_j f(\boldsymbol{x})|>$ […], then using $O(1)$ queries, with probability $\ge 1-(\delta_1+\delta_2)$, we can find a subset $S'\subseteq S$ with $j\in S'$ and […]. Here, $\lambda_{1,n}:=L_1(d^2+d+\dots$ […]
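The idea of shrinking a candidate set $S$ to a smaller $S'$ that still contains the dominant coordinate can be illustrated with a much simpler technique than the lemma's: plain bisection with finite-difference directional derivatives. This sketch is not the paper's procedure. It assumes an exactly 1-sparse gradient, uses about $2\log_2|S|$ queries rather than $O(1)$ per reduction, and all names and tolerances are hypothetical.

```python
def directional_derivative(f, x, subset, h=1e-6):
    """Forward-difference derivative of f at x along the indicator vector
    of `subset` (2 queries). It approximates the sum of the gradient
    entries indexed by `subset`."""
    x_step = list(x)
    for i in subset:
        x_step[i] += h
    return (f(x_step) - f(x)) / h


def locate_single_coordinate(f, x, candidates, tol=1e-3):
    """Bisect `candidates` down to one index, assuming exactly one candidate
    has a gradient entry larger than `tol` in magnitude and the rest are zero."""
    candidates = list(candidates)
    queries = 0
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        slope = directional_derivative(f, x, half)
        queries += 2
        # Keep the half whose summed gradient entries are non-negligible.
        candidates = half if abs(slope) > tol else candidates[len(half):]
    return candidates[0], queries


def f_one_sparse(x):
    # Hypothetical function whose gradient is nonzero only at index 11.
    return 5.0 * x[11]


index, queries = locate_single_coordinate(f_one_sparse, [0.0] * 16, range(16))
print(index, queries)
```

Bisection breaks down when the gradient is only approximately sparse, since small entries can accumulate inside a large subset. Controlling that kind of error is what the quantities $\epsilon$, $\delta_1$, and $\delta_2$ in the lemma are for.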

Figures (3)

Figure 1: Convergence plots for Distance (mean$\,\pm\,$s.e.).
Figure 2: Convergence plots for Magnitude (mean$\,\pm\,$s.e.).
Figure 3: Convergence plots for Attack (mean$\,\pm\,$s.e.).

Theorems & Definitions (26)

Lemma 3.1
Lemma 3.2
Lemma 3.3
Theorem 3.4
Theorem 4.1
Theorem 4.2
Lemma 1.1 (Lemma 1.2.3 in Nesterov, 2018)
Proof
Lemma 1.2 (Lemma 1.2.2 in Nesterov, 2018)
Proof
...and 16 more
