Global Optimization with A Power-Transformed Objective and Gaussian Smoothing

Chen Xu

TL;DR

This work addresses global maximization of a continuous, possibly non-concave function $f$ by pairing Gaussian smoothing with a power-transformed objective (GSPTO). The authors formulate two instances, PGS and EPGS, that replace $f$ with $f^N$ or $e^{Nf}$, respectively, before smoothing, so that a single-loop stochastic gradient ascent can approximate a near-global optimum without requiring $f$ to be differentiable. They prove that for any $\delta>0$ there exists a threshold $N_{\delta}$ beyond which the smoothed objective's maximizer lies within a $\delta$-neighborhood of the true global maximizer $\bm{x}^*$, with a convergence rate of $O(d^2\sigma^4\varepsilon^{-2})$, and they report favorable empirical performance on benchmark functions and black-box adversarial attacks. The results indicate that emphasizing the global maximum before smoothing yields faster convergence and robust practical performance, offering a versatile framework for global optimization in machine learning and related applications.

Abstract

We propose a novel method that solves global optimization problems in two steps: (1) apply an (exponential) power-$N$ transformation to the not-necessarily differentiable objective function $f$ to obtain $f_N$, and (2) optimize the Gaussian-smoothed $f_N$ with stochastic approximations. Under mild conditions on $f$, for any $\delta>0$, we prove that with a sufficiently large power $N_\delta$, this method converges to a solution in the $\delta$-neighborhood of $f$'s global optimum point. The convergence rate is $O(d^2\sigma^4\varepsilon^{-2})$, which is faster than both the standard and single-loop homotopy methods if $\sigma$ is pre-selected to be in $(0,1)$. In most of the experiments performed, our method produces better solutions than other algorithms that also apply smoothing techniques.
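The two steps in the abstract can be illustrated with a minimal Python sketch of an EPGS-style loop. It relies on the standard identity $\nabla_{\bm{\mu}}\,\mathbb{E}_{\bm{\xi}}[f_N(\bm{\mu}+\sigma\bm{\xi})] = \sigma^{-1}\mathbb{E}_{\bm{\xi}}[\bm{\xi} f_N(\bm{\mu}+\sigma\bm{\xi})]$ for $\bm{\xi}\sim\mathcal{N}(0, I_d)$, which needs only function values of $f$. Everything named here is an assumption made for illustration rather than the paper's algorithm: the function `smoothed_power_ascent`, the test objective `two_bumps`, all default hyperparameters, and in particular the self-normalized weights, which rescale the gradient estimate (so the loop ascends $\log F_N$ instead of $F_N$, with the same maximizer) to avoid overflow in $e^{Nf}$.

```python
import numpy as np


def smoothed_power_ascent(f, mu0, N=5.0, sigma=0.5, step=0.5,
                          n_samples=200, n_iters=100, seed=0):
    """Single-loop stochastic ascent on the Gaussian-smoothed e^{N f}.

    Illustrative sketch only: the names, defaults and the self-normalized
    weighting are assumptions, not the update rule stated in the paper.
    `f` maps an array of shape (K, d) to K function values.
    """
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu0, dtype=float).copy()
    for _ in range(n_iters):
        xi = rng.standard_normal((n_samples, mu.size))  # xi_k ~ N(0, I_d)
        x = mu + sigma * xi                             # sample points
        logw = N * f(x)                                 # log of e^{N f(x_k)}
        w = np.exp(logw - logw.max())                   # subtract max: no overflow
        w /= w.sum()                                    # self-normalize (assumption)
        # sum_k w_k xi_k estimates sigma * grad log F_N(mu); only values of f
        # are used, so f does not need to be differentiable.
        mu = mu + step * sigma * (w @ xi)
    return mu


def two_bumps(x):
    """Hypothetical objective: narrow tall bump at (-0.5, -0.5), wide lower bump at (0.5, 0.5)."""
    d1 = np.sum((x + 0.5) ** 2, axis=1)
    d2 = np.sum((x - 0.5) ** 2, axis=1)
    return np.exp(-d1 / (2 * 0.3 ** 2)) + 0.8 * np.exp(-d2 / (2 * 0.6 ** 2))


mu_hat = smoothed_power_ascent(two_bumps, mu0=[0.5, 0.5], N=20.0)
print("final mu:", mu_hat)
```

Increasing `N` puts more weight on samples with the highest values of `f`, which is the mechanism the abstract describes; how large `N` must be for a given objective is not something this sketch determines.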

Paper Structure

This paper contains 24 sections, 6 theorems, 40 equations, 4 figures, 8 tables, 3 algorithms.

Key Result

Theorem 1

Let $f:\mathcal{S}\subset\mathbb{R}^d\rightarrow \mathbb{R}$ be a continuous function that is possibly non-concave (and, for PGS only, non-negative), where $\mathcal{S}$ is compact. Assume that $f$ has a global maximum $\bm{x}^*$ such that $\sup_{\bm{x}:\, \lVert \bm{x} - \bm{x}^*\rVert \geq \cdots}$ [...] where $f_N$ is defined in (fN) for either PGS or EPGS. Then, for any $M>0$ and $\delta>0$ such that [...]

Figures (4)

  • Figure 1: Effects of elevating the objective $f$ before Gaussian smoothing (a toy example): The maximum point of $F_N(\mu):=\mathbb{E}_{\xi\sim\mathcal{N}(0,1)}[f_N(\mu+\sigma\xi)]$ gets closer to the global maximum point of $f$ as $N$ increases, where $\sigma=0.5$ and $f(\mu)=-\log((\mu+0.5)^2+10^{-5})-\log((\mu+0.5)^2+10^{-2})+10$ for $|\mu|\leq 1$ and $f(\mu)=0$ for $|\mu|>1$. For easier comparison, the graph of each function is scaled to have a maximum value of 1.
  • Figure 2: $f(\bm{x}) = -\log(\|\bm{x}-\bm{m}_1\|^2+10^{-5}) - \log(\|\bm{x}-\bm{m}_2\|^2+10^{-2})$
  • Figure 3: Effects of increasing $N$. For each $N$, we run the algorithm 100 times and obtain $\{\bm{\mu}_k\}_{k=1}^{100}$. The average fitness $\sum_{k=1}^{100} f(\bm{\mu}_k)/100$ and the average error $\sum_{k=1}^{100} \text{MSE}(\bm{m}_1, \bm{\mu}_k)/100$ are plotted, where MSE$(\bm{m}_1, \bm{\mu}_k):=\sum_{i=1}^d(\mu_{ki}+0.5)^2/d$, $\sigma=1.0$, and $f$ is the two-log objective given in the caption of Figure 2. Note that $\bm{x}^*=\bm{m}_1$ has all its entries equal to $-0.5$.
  • Figure 4: Graph of objective functions.
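The effect described in the Figure 1 caption can be checked numerically with a short Python sketch. It uses the toy $f$ exactly as written in that caption, the PGS transform $f_N = f^N$, and $\sigma = 0.5$. The function name `smoothed_argmax`, the grid resolutions, and the use of a plain Riemann sum in place of the expectation are assumptions made for illustration; this is not the code behind the figure.

```python
import numpy as np

SIGMA = 0.5  # smoothing scale quoted in the Figure 1 caption


def f(x):
    """Toy objective from the Figure 1 caption (zero outside [-1, 1])."""
    x = np.asarray(x, dtype=float)
    inside = -np.log((x + 0.5) ** 2 + 1e-5) - np.log((x + 0.5) ** 2 + 1e-2) + 10.0
    return np.where(np.abs(x) <= 1.0, inside, 0.0)


def smoothed_argmax(N, sigma=SIGMA, n_x=8001, n_mu=201):
    """Grid maximizer of F_N(mu) = E[f(mu + sigma*xi)^N], xi ~ N(0, 1).

    The expectation is replaced by a Riemann sum over [-1, 1], which covers
    the whole support of f^N. Grid sizes are illustrative assumptions.
    """
    xs = np.linspace(-1.0, 1.0, n_x)    # integration grid
    mus = np.linspace(-1.0, 1.0, n_mu)  # candidate values of mu
    dx = xs[1] - xs[0]
    f_pow = f(xs) ** N                  # PGS transform f^N
    z = (xs[None, :] - mus[:, None]) / sigma
    kernel = np.exp(-0.5 * z ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    F = (kernel * f_pow[None, :]).sum(axis=1) * dx
    return mus[np.argmax(F)]


for N in (1, 5, 20):
    mu_star = smoothed_argmax(N)
    print(f"N={N:2d}  argmax F_N = {mu_star:+.2f}  distance to -0.5 = {abs(mu_star + 0.5):.2f}")
```

For $N=1$ the smoothed maximizer sits to the right of $-0.5$, because the Gaussian window averages over the wider part of the support; raising $N$ concentrates $f^N$ around the peak at $-0.5$ and pulls the maximizer back toward it.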

Theorems & Definitions (14)

  • Theorem 1
  • proof
  • Remark 1
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • Theorem 2
  • ...and 4 more