Probabilistic Gaussian Homotopy: A Probability-Space Continuation Framework for Nonconvex Optimization

Eshed Gal; Samy Wu Fung; Eldad Haber


Abstract

We introduce Probabilistic Gaussian Homotopy (PGH), a probability-space continuation framework for nonconvex optimization. Unlike classical Gaussian homotopy, which smooths the objective and uniformly averages gradients, PGH deforms the associated Boltzmann distribution and induces Boltzmann-weighted aggregation of perturbed gradients, which exponentially biases descent directions toward low-energy regions. We show that PGH corresponds to a log-sum-exp (soft-min) homotopy that smooths a nonconvex objective at scale $\lambda>0$ and recovers the original objective as $\lambda\to 0$, yielding a posterior-mean generalization of the Moreau envelope, and we derive a dynamical system governing minimizer evolution along an annealed homotopy path. This establishes a principled connection between Gaussian continuation, Bayesian denoising, and diffusion-style smoothing. We further propose Probabilistic Gaussian Homotopy Optimization (PGHO), a practical stochastic algorithm based on Monte Carlo gradient estimation, and demonstrate strong performance on high-dimensional nonconvex benchmarks and sparse recovery problems where classical gradient methods and objective-space smoothing frequently fail.
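The contrast between uniform averaging (classical Gaussian homotopy) and Boltzmann-weighted aggregation of perturbed gradients can be sketched as below. This is a minimal illustration under our own assumptions: the function names, the perturbation scale `sigma`, the temperature `lam`, the sample count, and the weights $w_i \propto \exp(-f(x+\sigma\varepsilon_i)/\lambda)$ are hypothetical choices, and the paper's PGHO estimator may differ in its details.

```python
import numpy as np

def boltzmann_weighted_grad(f, grad_f, x, sigma, lam, n_samples=256, seed=0):
    # Hypothetical sketch: gradients at Gaussian perturbations x + sigma * eps_i,
    # averaged with Boltzmann weights w_i proportional to exp(-f(x + sigma * eps_i) / lam).
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_samples, x.size))
    pts = x + sigma * eps
    energies = np.array([f(p) for p in pts])
    logits = -energies / lam
    logits -= logits.max()          # shift before exp (log-sum-exp trick) to avoid overflow
    w = np.exp(logits)
    w /= w.sum()                    # normalized Boltzmann (softmax) weights
    grads = np.array([grad_f(p) for p in pts])
    return w @ grads                # low-energy samples dominate the direction

def uniform_grad(grad_f, x, sigma, n_samples=256, seed=0):
    # Classical Gaussian-homotopy estimate: plain average of perturbed gradients.
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_samples, x.size))
    pts = x + sigma * eps
    return np.mean([grad_f(p) for p in pts], axis=0)
```

With the same `seed`, both functions see identical perturbations, so the only difference is the weighting: as `lam` grows the Boltzmann weights flatten toward the uniform average, and as `lam` shrinks the estimate concentrates on the gradient at the lowest-energy sample.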


Paper Structure (36 sections, 2 theorems, 43 equations, 5 figures, 1 table, 1 algorithm)

Introduction
Contributions.
Background
Gaussian Homotopy in Objective Space
Moreau Envelope and Soft-Min Smoothing
Score-Based Diffusion and Gaussian Smoothing
Probabilistic Gaussian Homotopy
Probability-Space Homotopy Construction
Gradient Structure: Soft-Min Aggregation
Continuation Dynamics
Stochastic Approximation and Discrete Scheme
Computational Complexity.
Connection to Moreau Envelopes and Bayesian Interpretation
Soft Moreau Envelope
Bayesian Interpretation and Posterior Mean Dynamics
...and 21 more sections

Key Result

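Theorem 1 (PGH as Soft Moreau Envelope)

Let $F_t(x)$ denote the PGH energy. If $\alpha(t)=1$ and $\beta(t)=\sqrt{\lambda(t)}$, then $F_t(x)$ reduces to a soft (log-sum-exp) Moreau envelope of the objective at scale $\lambda(t)$, up to an additive term $C(t)$, where $C(t)$ is a constant independent of $x$.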
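To see numerically why a log-sum-exp (soft-min) envelope recovers a minimum-based envelope in the zero-temperature limit, the 1D sketch below compares the classical Moreau envelope $\min_y f(y) + (x-y)^2/(2\lambda)$ with a soft-min version at temperature `tau`, both evaluated on a grid. The separate temperature `tau`, the Riemann-sum quadrature, and the test function are our assumptions; this is not the paper's exact definition of $F_t$ or of its soft Moreau envelope.

```python
import numpy as np

def moreau_envelope(f, x, lam, grid):
    # Hard Moreau envelope: min over y of f(y) + (x - y)^2 / (2 lam), on a 1D grid.
    return np.min(f(grid) + (x - grid) ** 2 / (2 * lam))

def soft_moreau_envelope(f, x, lam, tau, grid):
    # Soft version: replace the min by a soft-min (negative log-sum-exp) at temperature tau,
    # approximating the integral over y by a Riemann sum on a uniform grid.
    g = f(grid) + (x - grid) ** 2 / (2 * lam)
    dy = grid[1] - grid[0]
    gmin = g.min()
    # Factor out gmin before exponentiating so the largest term is exp(0) = 1.
    return gmin - tau * np.log(dy * np.sum(np.exp(-(g - gmin) / tau)))

f = lambda y: y ** 2 + 2 * np.sin(5 * y)   # hypothetical nonconvex test function
grid = np.linspace(-5, 5, 20001)
hard = moreau_envelope(f, 0.7, 0.5, grid)
for tau in (1e-1, 1e-2, 1e-3):
    print(tau, soft_moreau_envelope(f, 0.7, 0.5, tau, grid) - hard)
```

The printed gap shrinks roughly in proportion to `tau`, mirroring how the soft-min smoothing collapses onto the hard minimum as the temperature goes to zero.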

Figures (5)

Figure 1: Gaussian homotopy for the function $f(x_1,x_2) = \log\big(\mathrm{Rosen}(x_1,x_2) + \alpha\,(1 + \sin(3x_1)\sin(3x_2))\big)$, where $\mathrm{Rosen}$ denotes the Rosenbrock ("banana") function. The homotopy-smoothed objective is estimated by Monte Carlo sampling. The surface begins nearly flat and gradually deforms into a highly multimodal landscape that is much harder to optimize.
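A Monte Carlo estimate of the Gaussian-smoothed version of the Figure 1 objective, $\mathbb{E}_{\varepsilon\sim\mathcal N(0,I)}[f(x+\sigma\varepsilon)]$, might look like the sketch below. The value `alpha = 1.0`, the sample count, the fixed seed, and the function names are our assumptions; the caption does not state them.

```python
import numpy as np

def rosen(x1, x2):
    # Standard 2D Rosenbrock ("banana") function, minimum 0 at (1, 1).
    return (1.0 - x1) ** 2 + 100.0 * (x2 - x1 ** 2) ** 2

def f(x1, x2, alpha=1.0):
    # Figure 1 objective; alpha = 1.0 is an assumed value, not taken from the paper.
    return np.log(rosen(x1, x2) + alpha * (1.0 + np.sin(3 * x1) * np.sin(3 * x2)))

def smoothed(x, sigma, n_samples=10_000, seed=0):
    # Monte Carlo estimate of E[f(x + sigma * eps)] with eps ~ N(0, I_2).
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_samples, 2))
    pts = np.asarray(x, dtype=float) + sigma * eps
    return float(np.mean(f(pts[:, 0], pts[:, 1])))
```

Evaluating `smoothed` on a grid for a decreasing sequence of `sigma` values gives the kind of sequence of surfaces the figure describes: large `sigma` averages away the oscillatory term, while `sigma` near zero returns the original objective.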
Figure 2: The Ackley and Griewank functions in 2D, with trajectories of the PGH algorithms started from different initial points.
Figure 3: Success rate vs. dimension on Ackley. PGH maintains a high success rate across all dimensions, while competing methods degrade significantly as the dimension grows. Budget fixed at $10^5$ function evaluations; results averaged over 30 runs.
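For reference, the standard $d$-dimensional Ackley benchmark and a success-rate computation can be sketched as follows. The parameters `a=20`, `b=0.2`, `c=2*pi` are the common textbook defaults, and reusing the $5 \times 10^{-2}$ threshold from Figure 5 for Figure 3 is our assumption.

```python
import numpy as np

def ackley(x, a=20.0, b=0.2, c=2 * np.pi):
    # Standard Ackley function; global minimum 0 at x = 0 in any dimension.
    x = np.asarray(x, dtype=float)
    d = x.size
    term1 = -a * np.exp(-b * np.sqrt(np.sum(x ** 2) / d))
    term2 = -np.exp(np.sum(np.cos(c * x)) / d)
    return term1 + term2 + a + np.e

def success_rate(best_values, threshold=5e-2):
    # Fraction of runs whose best objective value falls below the (assumed) threshold.
    return float(np.mean(np.asarray(best_values) < threshold))
```

A success-vs-dimension curve would then be `success_rate` applied, for each dimension, to the best values of the 30 runs at the end of the evaluation budget.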
Figure 4: Sparse recovery results along the regularization path. Left: tradeoff between reconstruction fidelity and sparsity. Right: attained smooth objective value as a function of the regularization parameter. In both plots, the PGH variants improve over the corresponding GD and Adam baselines.
Figure 5: Convergence of PGH-Adam on four benchmark functions ($d=10$). Solid line shows the median best objective value over $20$ runs; shaded region shows the interquartile range; dashed line marks the success threshold ($5 \times 10^{-2}$).
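The median and interquartile-range curves described in this caption could be computed from per-run objective traces along these lines. The array layout (runs × iterations) and the use of best-so-far values are our assumptions about how the curves were produced.

```python
import numpy as np

def convergence_summary(traces):
    # traces: array of shape (n_runs, n_iters) with the objective value at each iteration.
    traces = np.asarray(traces, dtype=float)
    best_so_far = np.minimum.accumulate(traces, axis=1)       # best value up to each iteration
    median = np.median(best_so_far, axis=0)                   # solid line
    q25, q75 = np.percentile(best_so_far, [25, 75], axis=0)   # shaded interquartile band
    return median, q25, q75
```

Taking the running minimum before aggregating makes every curve non-increasing, which matches plotting the "best objective value" rather than the raw iterate value.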

Theorems & Definitions (4)

Theorem 1: PGH as Soft Moreau Envelope
Proof
Theorem 2: Posterior Mean Identity
Proof
