Power Homotopy for Zeroth-Order Non-Convex Optimizations

Chen Xu

TL;DR

GS-PowerHP is a zeroth-order method for non-convex optimization that combines a power-transformed Gaussian-smoothed surrogate $F_{N,\sigma}(\mu)=\mathbb{E}_{x\sim\mathcal{N}(\mu,\sigma^2 I_d)}[e^{N f(x)}]$ with an incrementally decaying smoothing radius $\sigma$. The algorithm runs in a single loop: each iteration updates $\mu$ with a stochastic gradient estimate and then reduces $\sigma$. It comes with a guarantee of convergence to a neighborhood of the global maximizer $x^*$ at an iteration complexity of $O(d^2 \varepsilon^{-2})$. The theory shows that, for sufficiently large power $N$, the stationary points of the surrogate concentrate near $x^*$, which enables convergence of $\mu_t$ to $\mathcal{S}_{x^*,\delta}$. Empirically, GS-PowerHP consistently ranks among the top three of the compared smoothing-based and evolutionary zeroth-order methods on optimization benchmarks, and it places first on high-dimensional black-box image attacks on ImageNet, illustrating robustness and scalability to very large $d$.
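To make the "stochastic gradient estimate" concrete, the standard Gaussian-smoothing identity gives the gradient of the surrogate in a form that needs only function evaluations. The derivation below is a sketch using that general identity; the sample count $K$, the noise variables $\epsilon_k$, and the estimator name $\hat{g}_K$ are notation introduced here for illustration, and the estimator actually used in the paper may differ (for example in normalization).

```latex
\begin{align*}
\nabla_{\mu} F_{N,\sigma}(\mu)
  &= \frac{1}{\sigma^{2}}\,
     \mathbb{E}_{x\sim\mathcal{N}(\mu,\sigma^{2} I_d)}
     \!\left[(x-\mu)\,e^{N f(x)}\right] \\
  &= \frac{1}{\sigma}\,
     \mathbb{E}_{\epsilon\sim\mathcal{N}(0, I_d)}
     \!\left[\epsilon\,e^{N f(\mu+\sigma\epsilon)}\right], \\
\hat{g}_K(\mu)
  &= \frac{1}{K\sigma}\sum_{k=1}^{K}\epsilon_k\,e^{N f(\mu+\sigma\epsilon_k)},
  \qquad \epsilon_k \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, I_d).
\end{align*}
```

The first line follows from differentiating the Gaussian density with respect to $\mu$ under the integral sign; the second substitutes $x=\mu+\sigma\epsilon$. Because $e^{N f(x)}$ weights each sample, a larger $N$ makes samples with high $f$ dominate the average.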

Abstract

We introduce GS-PowerHP, a novel zeroth-order method for non-convex optimization problems of the form $\max_{x \in \mathbb{R}^d} f(x)$. Our approach leverages two key components: a power-transformed Gaussian-smoothed surrogate $F_{N,\sigma}(\mu) = \mathbb{E}_{x\sim\mathcal{N}(\mu,\sigma^2 I_d)}[e^{N f(x)}]$ whose stationary points cluster near the global maximizer $x^*$ of $f$ for sufficiently large $N$, and an incrementally decaying $\sigma$ for enhanced data efficiency. Under mild assumptions, we prove convergence in expectation to a small neighborhood of $x^*$ with an iteration complexity of $O(d^2 \varepsilon^{-2})$. Empirical results show that our approach consistently ranks among the top three across a suite of competing algorithms. Its robustness is underscored by the final experiment on a high-dimensional problem ($d=150{,}528$), where it achieved first place on least-likely targeted black-box attacks against images from ImageNet, surpassing all competing methods.
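The two components described above (an exponentially weighted Gaussian-smoothing gradient and a decaying $\sigma$) can be illustrated with a small single-loop sketch. This is not the paper's algorithm: the toy objective `f`, the function names, the geometric decay schedule with a floor `sigma_min`, the step size, and the self-normalized weights are all assumptions chosen here to keep the example short and numerically stable. It relies only on the general identity $\nabla F_{N,\sigma}(\mu)=\sigma^{-2}\,\mathbb{E}[(x-\mu)e^{N f(x)}]$.

```python
import math
import random


def f(x):
    """Toy objective (assumption for illustration): maximum at x = (1, ..., 1)."""
    return -sum((xi - 1.0) ** 2 for xi in x)


def grad_direction(f, mu, sigma, power_n, num_samples, rng):
    """Monte Carlo direction parallel to grad F_{N,sigma}(mu).

    Averages (x - mu) with weights e^{N f(x)}, x ~ N(mu, sigma^2 I).
    The weights are normalized to sum to 1 (a stabilizing choice made
    here, not taken from the paper), so only the direction is kept.
    """
    samples = [[m + sigma * rng.gauss(0.0, 1.0) for m in mu]
               for _ in range(num_samples)]
    values = [power_n * f(x) for x in samples]
    vmax = max(values)  # subtract the max before exponentiating to avoid overflow
    weights = [math.exp(v - vmax) for v in values]
    total = sum(weights)
    return [
        sum(w * (x[i] - mu[i]) for w, x in zip(weights, samples)) / total
        for i in range(len(mu))
    ]


def gs_power_sketch(f, mu0, sigma0=1.0, decay=0.99, sigma_min=0.05,
                    power_n=5.0, step=0.5, num_samples=50, iters=300, seed=0):
    """Single loop: update mu with the estimate, then shrink sigma."""
    rng = random.Random(seed)
    mu, sigma = list(mu0), sigma0
    for _ in range(iters):
        g = grad_direction(f, mu, sigma, power_n, num_samples, rng)
        mu = [m + step * gi for m, gi in zip(mu, g)]
        sigma = max(sigma_min, decay * sigma)  # hypothetical decay schedule
    return mu


if __name__ == "__main__":
    print(gs_power_sketch(f, [-2.0, 3.0]))
```

With the toy objective, the returned `mu` should end close to `(1, 1)`. Normalizing the weights trades the unbiased gradient magnitude for stability when `power_n * f(x)` is large in absolute value, which is one plausible design choice rather than the method analyzed in the paper.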

Paper Structure

This paper contains 16 sections, 5 theorems, 31 equations, 2 figures, 9 tables, 1 algorithm.

Key Result

Lemma 1

Under Assumption coercivity, given any $N>0$ and $\sigma>0$, (1) both $F_{N,\sigma}(\bm{\mu})$ and $\nabla F_{N,\sigma}(\bm{\mu})$ are well-defined and Lipschitz in $\mathbb{R}^d$; (2) the Lipschitz constant for $\nabla F_{N,\sigma}$ is $L=2d\sigma^{-2}e^{Nf(\bm{x}^*)}$; (3) $F_{N,\sigma}$ has at le

Figures (2)

  • Figure 1: The graphs of an example $f$ and its Gaussian smoothed function $F_{N,\sigma}$, with $N=1$ and $\sigma$ assuming different values. In this example, $x^*=\arg\max_{x\in\mathbb{R}}f(x)=-0.5$.
  • Figure 2: Attacks by GS-PowerHP on two randomly selected images from ImageNet.

Theorems & Definitions (15)

  • Remark 1
  • Remark 2
  • Lemma 1
  • Theorem 1
  • proof
  • Corollary 1
  • Remark 3
  • Theorem 2
  • proof
  • Remark 4
  • ...and 5 more