Diffusion at Absolute Zero: Langevin Sampling using Successive Moreau Envelopes [journal paper]

Andreas Habring, Alexander Falk, Martin Zach, Thomas Pock

TL;DR

This paper introduces Diffusion at Absolute Zero (DAZ), a sampling framework for Gibbs distributions π(x) ∝ exp(-U(x)) where U may be nonconvex or nondifferentiable. It constructs a family π^t by replacing the difficult (nonconvex or nonsmooth) part G of the potential U with its Moreau envelope M_G^t, which permits Langevin updates with large step sizes and guides samples toward the target as t ↓ 0. The authors establish ergodicity and consistency results for the inner Langevin scheme at fixed t, show that π^t depends Lipschitz-continuously on t, and connect the method to diffusion models via a zero-temperature limit. Numerical experiments on toy problems and high-dimensional imaging tasks show faster convergence and improved mode coverage compared with standard Langevin-based methods, with notable gains in nonconvex or non-strongly convex settings. Overall, DAZ offers a principled, training-free approach to aggressive annealing for nondifferentiable or nonconvex potentials, applicable to inverse problems and learned priors.

Abstract

We propose a method for sampling from Gibbs distributions of the form $π(x)\propto\exp(-U(x))$ by considering a family $(π^{t})_t$ of approximations of the target density such that $π^{t}$ exhibits favorable properties for sampling when $t$ is large, and $π^{t} \to π$ as $t \to 0$. This family is obtained by replacing (parts of) the potential $U$ by its Moreau envelope. By sampling sequentially from $π^{t}$ for decreasing values of $t$ with a Langevin algorithm with appropriate step size, the samples are guided quickly from a simple starting density to the more complex target. We prove the ergodicity of the method as well as its convergence to the target density without assuming convexity or differentiability of the potential $U$. In addition to the theoretical analysis, we show experimental results that support the superiority of the method over current algorithms in terms of convergence speed and mode coverage of multi-modal densities. The experiments range from one-dimensional toy problems to high-dimensional inverse imaging problems with learned potentials.
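The scheme described in the abstract can be illustrated on a one-dimensional toy case. The following is a minimal sketch, not the authors' implementation: it assumes the potential $G(x) = |x|$ (so the target is a standard Laplace density), uses the standard closed-form proximal map of the absolute value (soft-thresholding), and picks a hypothetical schedule `t_values` and step size rule `tau = step_ratio * t` purely for illustration. The function names are invented for this example.

```python
import numpy as np


def prox_abs(x, t):
    """Proximal map of t*|.|, i.e. soft-thresholding with threshold t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def grad_moreau_abs(x, t):
    """Gradient of the Moreau envelope of |.|: (x - prox_{t|.|}(x)) / t."""
    return (x - prox_abs(x, t)) / t


def annealed_langevin(n_chains=5000, t_values=(1.0, 0.3, 0.1, 0.03, 0.01),
                      inner_steps=200, step_ratio=0.5, seed=0):
    """Run unadjusted Langevin on pi^t for a decreasing sequence of t.

    The schedule and the rule tau = step_ratio * t are assumptions made
    for this sketch; the gradient of the envelope is (1/t)-Lipschitz here,
    so tau < 2t keeps the explicit update stable.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_chains)  # simple starting density
    for t in t_values:
        tau = step_ratio * t
        for _ in range(inner_steps):
            noise = rng.standard_normal(n_chains)
            x = x - tau * grad_moreau_abs(x, t) + np.sqrt(2.0 * tau) * noise
    return x


samples = annealed_langevin()
# For a standard Laplace target, E|x| = 1 and the median is 0.
print("mean |x|:", np.mean(np.abs(samples)), "median:", np.median(samples))
```

Large values of `t` allow large steps and fast mixing on a smooth surrogate, while the final small `t` brings the stationary density close to the target; the trade-off between the two is what the decreasing schedule exploits.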

Paper Structure

This paper contains 36 sections, 20 theorems, 91 equations, 11 figures, 2 algorithms.

Key Result

Lemma 4.1

Define the nonconvexity of a function $H:\mathbb{R}^d\rightarrow \mathbb{R}$ as the (possibly infinite) number $\mathop{\mathrm{NC}}\nolimits(H)$. Assume $\mathop{\mathrm{prox}}\nolimits_{tH}(x)$ is nonempty for all $x \in \mathbb{R}^d$. Then, $\mathop{\mathrm{NC}}\nolimits(M^t_H)\leq \mathop{\mathrm{NC}}\nolimits(H)$.
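For orientation, the Moreau envelope and proximal map that appear in the lemma have standard definitions in variational analysis, written out below. This assumes the paper's Definitions 3.2 and 3.3 follow the usual convention with the factor $1/(2t)$, which cannot be confirmed from this summary alone.

```latex
M^t_H(x) = \inf_{y \in \mathbb{R}^d} \left\{ H(y) + \frac{1}{2t}\|x - y\|^2 \right\},
\qquad
\mathop{\mathrm{prox}}\nolimits_{tH}(x) = \operatorname*{arg\,min}_{y \in \mathbb{R}^d} \left\{ H(y) + \frac{1}{2t}\|x - y\|^2 \right\}.
```

Under this convention the proximal map is in general set-valued for nonconvex $H$ and may be empty, which is why the lemma assumes nonemptiness explicitly.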

Figures (11)

  • Figure 1: Moreau envelopes of a Gaussian mixture for a sequence of Moreau parameters $t \in [10^{-2}, 10^{2}]$. Note how the Moreau envelope convexifies the potential with increasing $t$.
  • Figure 1: distance between the sample distribution and the target Gaussian mixture. Left: Initializing the chains with a standard normal distribution. Right: Initializing with a Dirac distribution concentrated at zero. GT denotes the error of a random ground truth sample of the same size as the number of simulated chains. converges fastest with additional acceleration by combining it with .
  • Figure 2: distance between the target $\pi(x)\propto \exp(-G(x))$ with $G(x) = |x|$ and the Moreau envelope based potentials $\pi^t(x)\propto \exp(-M^t_G(x))$ as well as the corresponding stationary distributions of the chain $\pi^t_\tau$ for step sizes $\tau\in\{0.001,0.1,0.5\}$. We clearly observe the proven Lipschitz continuity of $t\mapsto\pi^t$ with $\pi^t\rightarrow \pi$ as $t\rightarrow 0$.
  • Figure 2: prior sampling. distance between three finite difference marginals and the known ground truth. and converge fastest, again, with additional acceleration by adding in the inner loop of . The good performance of might be due to convexity of the potential.
  • Figure 3: difference marginals. The closer a method resembles the absolute value function, the better. We find that the samples obtained with , , and - approximate the target significantly more accurately.
  • ...and 6 more figures
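The convergence $\pi^t \rightarrow \pi$ reported for $G(x) = |x|$ in Figure 2 can be checked numerically on a grid. The sketch below is an illustration under stated assumptions: the distance used in the figure is not recoverable from the caption, so the $L^1$ distance between normalized densities is used here as a stand-in, and the grid parameters `half_width` and `n` are arbitrary choices. It relies on the standard fact that the Moreau envelope of the absolute value is the Huber function.

```python
import numpy as np


def moreau_abs(x, t):
    """Moreau envelope of |.| with parameter t (the Huber function)."""
    a = np.abs(x)
    return np.where(a <= t, a**2 / (2.0 * t), a - t / 2.0)


def l1_gap(t, half_width=40.0, n=400_001):
    """L1 distance between pi ∝ exp(-|x|) and pi^t ∝ exp(-M^t(x)) on a grid.

    The L1 distance is an assumption of this sketch, not necessarily the
    metric used in the paper's figure.
    """
    x = np.linspace(-half_width, half_width, n)
    dx = x[1] - x[0]
    target = np.exp(-np.abs(x))
    target /= target.sum() * dx      # normalize by a Riemann sum
    smoothed = np.exp(-moreau_abs(x, t))
    smoothed /= smoothed.sum() * dx
    return np.sum(np.abs(target - smoothed)) * dx


gaps = {t: l1_gap(t) for t in (1.0, 0.1, 0.01)}
for t, gap in gaps.items():
    print(f"t = {t:<5} L1 gap = {gap:.5f}")
```

The printed gaps shrink as `t` decreases, consistent with the caption's statement that $\pi^t \rightarrow \pi$ as $t \rightarrow 0$; the sketch does not account for the additional discretization bias of the chain's stationary distribution $\pi^t_\tau$.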

Theorems & Definitions (49)

  • Definition 3.1: Regular subdifferential
  • Definition 3.2: Moreau envelope
  • Definition 3.3: Proximal map
  • Lemma 4.1
  • Proof 1
  • Remark 4.2
  • Remark 4.4
  • Theorem 4.5
  • Lemma 4.6
  • Lemma 4.7
  • ...and 39 more