Diffusion at Absolute Zero: Langevin Sampling using Successive Moreau Envelopes [journal paper]
Andreas Habring, Alexander Falk, Martin Zach, Thomas Pock
TL;DR
This paper introduces Diffusion at Absolute Zero (DAZ), a sampling framework for Gibbs distributions π(x) ∝ exp(-U(x)) where U may be nonconvex or nondifferentiable. It constructs a family of approximations π^t by replacing the difficult part G of the potential U with its Moreau envelope M_G^t. For large t, π^t is easy to sample with Langevin updates at large step sizes, and sampling successively for decreasing t guides the samples toward the target as t ↓ 0. The authors establish ergodicity and consistency results for the inner Langevin scheme at fixed t, show that π^t depends Lipschitz-continuously on t, and connect the method to diffusion models via a zero-temperature limit. Numerical experiments on toy problems and high-dimensional imaging tasks show faster convergence and better mode coverage than standard Langevin-based methods, with the largest gains in nonconvex or non-strongly convex settings. Overall, DAZ offers a principled, training-free approach to aggressive annealing for nondifferentiable or nonconvex potentials, applicable to inverse problems and learned priors.
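For reference, the construction above can be written out under the standard convention for the Moreau envelope. The split $U = F + G$ and the symbol $F$ for the remaining part of the potential are assumptions made here for illustration; the paper's exact parametrization of $t$ may differ.

$$M_G^{t}(x) = \inf_{y} \left\{ G(y) + \frac{1}{2t}\|x - y\|^2 \right\}, \qquad π^{t}(x) \propto \exp\bigl(-F(x) - M_G^{t}(x)\bigr).$$

When $G$ is convex, the envelope is differentiable with a $1/t$-Lipschitz gradient given by the proximal map,

$$\nabla M_G^{t}(x) = \frac{1}{t}\bigl(x - \operatorname{prox}_{tG}(x)\bigr), \qquad \operatorname{prox}_{tG}(x) = \arg\min_{y} \left\{ G(y) + \frac{1}{2t}\|x - y\|^2 \right\},$$

which is why larger $t$ gives a smoother potential and admits larger Langevin step sizes. For nonconvex $G$ the minimizer need not be unique, so this gradient formula should be read as holding only where the envelope is differentiable.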
Abstract
We propose a method for sampling from Gibbs distributions of the form $π(x)\propto\exp(-U(x))$ by considering a family $(π^{t})_t$ of approximations of the target density such that $π^{t}$ exhibits favorable properties for sampling when $t$ is large, and $π^{t} \to π$ as $t \to 0$. This family is obtained by replacing (parts of) the potential $U$ by its Moreau envelope. By sampling sequentially from $π^{t}$ for decreasing values of $t$ with a Langevin algorithm with appropriate step size, the samples are quickly guided from a simple starting density to the more complex target. We prove the ergodicity of the method as well as its convergence to the target density without assuming convexity or differentiability of the potential $U$. In addition to the theoretical analysis, we show experimental results that support the superiority of the method over current algorithms in terms of convergence speed and mode coverage of multi-modal densities. The experiments range from one-dimensional toy problems to high-dimensional inverse imaging problems with learned potentials.
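The sequential scheme described in the abstract can be illustrated with a minimal one-dimensional sketch. It is not the authors' implementation: the target $π(x) \propto \exp(-|x|)$, the geometric schedule for $t$, the step-size rule `tau = t / 2`, and all function names are assumptions chosen for illustration. For $G(x) = |x|$ the proximal map is soft-thresholding, so the gradient of the Moreau envelope, $(x - \operatorname{prox}_{tG}(x))/t$, is available in closed form.

```python
import numpy as np

def grad_moreau_abs(x, t):
    """Gradient of the Moreau envelope of G(x) = |x| with parameter t."""
    # prox of t*|.| is soft-thresholding with threshold t
    prox = np.sign(x) * np.maximum(np.abs(x) - t, 0.0)
    # (x - prox) / t equals clip(x / t, -1, 1), i.e. the Huber gradient
    return (x - prox) / t

def annealed_langevin(n_chains=2000, t_max=2.0, t_min=0.02,
                      n_levels=8, n_steps=200, seed=0):
    """Run unadjusted Langevin on pi^t for a decreasing sequence of t."""
    rng = np.random.default_rng(seed)
    x = 3.0 * rng.standard_normal(n_chains)  # simple starting density
    for t in np.geomspace(t_max, t_min, n_levels):
        # Assumed rule: the envelope gradient is (1/t)-Lipschitz,
        # so a step size proportional to t keeps the update stable.
        tau = 0.5 * t
        for _ in range(n_steps):
            noise = rng.standard_normal(n_chains)
            x = x - tau * grad_moreau_abs(x, t) + np.sqrt(2.0 * tau) * noise
    return x

samples = annealed_langevin()
print("mean |x|:", np.mean(np.abs(samples)))  # E|x| = 1 for the exact target
```

Early levels use large $t$ and therefore large steps, which move the chains quickly; later levels use small $t$ so that $π^{t}$ is close to the target. The printed statistic is only a rough sanity check, since the final samples follow a discretized approximation of $π^{t_{\min}}$ rather than $π$ itself.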
