Probabilistic Gaussian Homotopy: A Probability-Space Continuation Framework for Nonconvex Optimization

Eshed Gal, Samy Wu Fung, Eldad Haber

Abstract

We introduce Probabilistic Gaussian Homotopy (PGH), a probability-space continuation framework for nonconvex optimization. Unlike classical Gaussian homotopy, which smooths the objective and uniformly averages gradients, PGH deforms the associated Boltzmann distribution and induces Boltzmann-weighted aggregation of perturbed gradients, which exponentially biases descent directions toward low-energy regions. We show that PGH corresponds to a log-sum-exp (soft-min) homotopy that smooths a nonconvex objective at scale $\lambda>0$ and recovers the original objective as $\lambda\to 0$, yielding a posterior-mean generalization of the Moreau envelope, and we derive a dynamical system governing minimizer evolution along an annealed homotopy path. This establishes a principled connection between Gaussian continuation, Bayesian denoising, and diffusion-style smoothing. We further propose Probabilistic Gaussian Homotopy Optimization (PGHO), a practical stochastic algorithm based on Monte Carlo gradient estimation, and demonstrate strong performance on high-dimensional nonconvex benchmarks and sparse recovery problems where classical gradient methods and objective-space smoothing frequently fail.
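
The contrast between uniform gradient averaging and Boltzmann-weighted aggregation can be illustrated with a minimal sketch. It assumes the soft-min energy has the form $F_\lambda(x) = -\lambda \log \mathbb{E}_{\epsilon}[\exp(-f(x+\sigma\epsilon)/\lambda)]$ with $\epsilon \sim N(0, I)$, a form the abstract does not spell out. Under that assumption the gradient of $F_\lambda$ is a softmax-weighted average of perturbed gradients. The function names and the separate smoothing scale `sigma` are hypothetical and are not taken from the paper's PGHO algorithm.

```python
import math
import random


def boltzmann_weights(energies, lam):
    """Softmax of -energies / lam, shifted by the minimum for numerical stability."""
    e_min = min(energies)
    unnorm = [math.exp(-(e - e_min) / lam) for e in energies]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def weighted_gradient_estimate(f, grad_f, x, lam, sigma, n_samples, rng):
    """Self-normalized Monte Carlo estimate of grad F_lam(x) under the
    assumed form F_lam(x) = -lam * log E[exp(-f(x + sigma * eps) / lam)]."""
    # Draw Gaussian perturbations of the current point.
    points = [[xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
              for _ in range(n_samples)]
    # Low-energy samples receive exponentially larger weight.
    weights = boltzmann_weights([f(p) for p in points], lam)
    grads = [grad_f(p) for p in points]
    return [sum(w * g[k] for w, g in zip(weights, grads))
            for k in range(len(x))]
```

As `lam` grows, the weights approach `1 / n_samples` and the estimate reduces to the uniform gradient average of classical Gaussian homotopy. For small `lam`, the sample with the lowest energy dominates the direction.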

Paper Structure (36 sections, 2 theorems, 43 equations, 5 figures, 1 table, 1 algorithm)

Key Result

theorem 1

Let $F_t(x)$ denote the PGH energy defined in equation eq:Ft_definition. If $\alpha(t)=1$ and $\beta(t)=\sqrt{\lambda(t)}$, then where $C(t)$ is a constant independent of $x$.

Figures (5)

  • Figure 1: Gaussian homotopy for the function $f(x_1,x_2) = \log(\mathrm{Rosen}(x_1,x_2) + \alpha (1 + \sin(3 x_1)\sin(3x_2)))$, where $\mathrm{Rosen}$ denotes the Rosenbrock (banana) function. We estimate the homotopy-smoothed objective by Monte Carlo sampling. The surface begins nearly flat and gradually deforms into a highly multimodal landscape that is much harder to optimize.
  • Figure 2: Ackley and Griewank functions in 2D, with the trajectories obtained by the PGH algorithms from different starting points.
  • Figure 3: Success rate vs. dimension on Ackley. PGH maintains a high success rate across all dimensions, while competing methods degrade significantly in higher dimensions. Budget fixed at $10^5$ function evaluations, averaged over 30 runs.
  • Figure 4: Sparse recovery results along the regularization path. Left: tradeoff between reconstruction fidelity and sparsity. Right: attained smooth objective value as a function of the regularization parameter. In both plots, the PGH variants improve over the corresponding GD and Adam baselines.
  • Figure 5: Convergence of PGH-Adam on four benchmark functions ($d=10$). Solid line shows the median best objective value over $20$ runs; shaded region shows the interquartile range; dashed line marks the success threshold ($5 \times 10^{-2}$).
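
The Monte Carlo estimate of the Gaussian-smoothed objective described in the Figure 1 caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the value of `ALPHA`, the standard Rosenbrock coefficients (1 and 100), and the function names are all hypothetical choices, since the caption does not specify them.

```python
import math
import random

ALPHA = 1.0  # assumed value; the caption does not state alpha


def rosenbrock(x1, x2):
    # 2D Rosenbrock function with the usual coefficients (assumed here).
    return (1.0 - x1) ** 2 + 100.0 * (x2 - x1 ** 2) ** 2


def f(x):
    """Test function from the Figure 1 caption: log(Rosen + alpha * (1 + sin sin))."""
    x1, x2 = x
    ripple = ALPHA * (1.0 + math.sin(3.0 * x1) * math.sin(3.0 * x2))
    return math.log(rosenbrock(x1, x2) + ripple)


def smoothed_f(x, sigma, n_samples, rng):
    """Monte Carlo estimate of E[f(x + sigma * eps)] with eps ~ N(0, I)."""
    total = 0.0
    for _ in range(n_samples):
        y = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
        total += f(y)
    return total / n_samples
```

Evaluating `smoothed_f` on a grid for a decreasing sequence of `sigma` values would trace the deformation the caption describes, from a nearly flat surface at large `sigma` to the original multimodal landscape at `sigma = 0`.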

Theorems & Definitions (4)

  • theorem 1: PGH as Soft Moreau Envelope
  • proof
  • theorem 2: Posterior Mean Identity
  • proof