Power Homotopy for Zeroth-Order Non-Convex Optimizations
Chen Xu
TL;DR
GS-PowerHP is a zeroth-order non-convex optimization method that combines a power-transformed Gaussian-smoothed surrogate $F_{N,\sigma}(\mu)=\mathbb{E}_{x\sim\mathcal{N}(\mu,\sigma^2 I_d)}[e^{N f(x)}]$ with a smoothing radius $\sigma$ that decays incrementally. The algorithm runs in a single loop, updating $\mu$ with a stochastic gradient estimate of the surrogate while progressively reducing $\sigma$. Under mild assumptions, it is guaranteed to converge in expectation to a neighborhood of the global maximizer $x^*$ with an iteration complexity of $O(d^2 \varepsilon^{-2})$. The theory shows that, for sufficiently large power $N$, the stationary points of the surrogate concentrate near $x^*$, which is what allows $\mu_t$ to converge to the neighborhood $\mathcal{S}_{x^*,\delta}$ of $x^*$. Empirically, GS-PowerHP consistently ranks among the top three methods when compared with competing smoothing-based and evolutionary zeroth-order algorithms on optimization benchmarks, and it places first on high-dimensional black-box targeted attacks against ImageNet images, illustrating its robustness and scalability to very large $d$.
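The single-loop update described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `power_smoothed_ascent`, every hyperparameter value, and the geometric schedule $\sigma_{t+1}=\max(\sigma_{\min}, \gamma\sigma_t)$ are hypothetical choices. The sketch relies on the standard Gaussian-smoothing identity $\nabla_\mu F_{N,\sigma}(\mu) = \frac{1}{\sigma}\mathbb{E}_{u\sim\mathcal{N}(0,I_d)}[e^{N f(\mu+\sigma u)}\,u]$ and, as an extra assumption for numerical stability, self-normalizes the weights. Each step therefore follows a Monte Carlo estimate of $\nabla_\mu \log F_{N,\sigma}(\mu) = \nabla_\mu F_{N,\sigma}(\mu)/F_{N,\sigma}(\mu)$, which points in the same direction as $\nabla_\mu F_{N,\sigma}(\mu)$ but is a biased ratio estimate rather than the plain unbiased one.

```python
import numpy as np


def power_smoothed_ascent(f, mu0, N=10.0, sigma0=1.0, sigma_min=0.1,
                          decay=0.99, lr=0.05, num_samples=200,
                          num_iters=500, seed=0):
    """Hypothetical single-loop sketch: ascend log F_{N,sigma}(mu) while shrinking sigma.

    f is a black box queried only through function values (zeroth-order).
    All hyperparameters and the geometric sigma schedule are illustrative
    assumptions, not values or rules taken from the paper.
    """
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu0, dtype=float).copy()
    sigma = sigma0
    for _ in range(num_iters):
        # Standard Gaussian directions u, so x = mu + sigma * u ~ N(mu, sigma^2 I).
        u = rng.standard_normal((num_samples, mu.size))
        values = np.array([f(x) for x in mu + sigma * u])
        # Weights proportional to e^{N f(x)}; subtracting the max avoids overflow
        # and the common factor cancels in the self-normalized ratio below.
        w = np.exp(N * (values - values.max()))
        # Self-normalized estimate of grad log F = E[e^{Nf} u] / (sigma * E[e^{Nf}]).
        grad = (w @ u) / (sigma * w.sum())
        mu += lr * grad
        # Assumed schedule: geometric decay of sigma with a floor at sigma_min.
        sigma = max(sigma_min, decay * sigma)
    return mu


if __name__ == "__main__":
    # Toy concave objective whose maximizer is x_star.
    x_star = np.array([1.0, -2.0])
    mu = power_smoothed_ascent(lambda x: -np.sum((x - x_star) ** 2), np.zeros(2))
    print(np.round(mu, 2))
```

Self-normalizing trades a small bias for stability: without it, $e^{N f(x)}$ overflows quickly once $N f(x)$ is large, which is exactly the regime the method targets.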
Abstract
We introduce GS-PowerHP, a novel zeroth-order method for non-convex optimization problems of the form $\max_{x \in \mathbb{R}^d} f(x)$. Our approach has two key components: a power-transformed Gaussian-smoothed surrogate $F_{N,\sigma}(\mu) = \mathbb{E}_{x\sim\mathcal{N}(\mu,\sigma^2 I_d)}[e^{N f(x)}]$, whose stationary points cluster near the global maximizer $x^*$ of $f$ for sufficiently large $N$, and an incrementally decaying $\sigma$ that improves data efficiency. Under mild assumptions, we prove convergence in expectation to a small neighborhood of $x^*$ with an iteration complexity of $O(d^2 \varepsilon^{-2})$. Empirically, our approach consistently ranks among the top three of a suite of competing algorithms. Its robustness is underscored by the final experiment, a very high-dimensional problem ($d=150,528$) in which it achieved first place among all competing methods on least-likely targeted black-box attacks against ImageNet images.
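To make the role of $N$ more concrete, the derivation below applies a standard Gaussian-smoothing identity to the surrogate. It assumes that differentiation under the expectation is permitted, and the closing remark is intuition only, not the paper's concentration result.

```latex
% Reparameterize x = \mu + \sigma u with u \sim \mathcal{N}(0, I_d);
% the Gaussian density satisfies \nabla_\mu p(x;\mu) = p(x;\mu)\,(x-\mu)/\sigma^2.
\nabla_\mu F_{N,\sigma}(\mu)
  = \mathbb{E}_{x\sim\mathcal{N}(\mu,\sigma^2 I_d)}\!\left[e^{N f(x)}\,\frac{x-\mu}{\sigma^2}\right]
  = \frac{1}{\sigma}\,\mathbb{E}_{u\sim\mathcal{N}(0,I_d)}\!\left[e^{N f(\mu+\sigma u)}\,u\right]

% Setting the gradient to zero gives a fixed-point condition on \mu.
\nabla_\mu F_{N,\sigma}(\mu) = 0
  \iff
  \mu = \frac{\mathbb{E}_{x\sim\mathcal{N}(\mu,\sigma^2 I_d)}\big[x\,e^{N f(x)}\big]}
             {\mathbb{E}_{x\sim\mathcal{N}(\mu,\sigma^2 I_d)}\big[e^{N f(x)}\big]}
```

The first line shows that the gradient can be estimated from function values alone, which is what makes the method zeroth-order. The second shows that a stationary $\mu$ coincides with the mean of $\mathcal{N}(\mu,\sigma^2 I_d)$ reweighted by $e^{N f(x)}$; a larger $N$ shifts that weight toward points where $f$ is largest, which gives an informal sense of why stationary points cluster near $x^*$.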
