Diffusion at Absolute Zero: Langevin Sampling Using Successive Moreau Envelopes [conference paper]
Andreas Habring, Alexander Falk, Thomas Pock
TL;DR
The paper tackles the challenge of sampling from Gibbs distributions π(x) ∝ exp(-U(x)), where U = F + G may be nonconvex or nondifferentiable. It introduces Diffusion at Absolute Zero (DAZ), a method that builds a sequence of tempered densities π^t ∝ exp(-U^t) with U^t(x) = F(x) + M_G^t(x), where M_G^t is the Moreau envelope of G. It then runs Annealed Langevin sampling over decreasing values of t to move samples gradually toward the target. A key theoretical contribution is the consistency of the scheme: under mild assumptions, π^t is well-defined, Lipschitz continuous in t with respect to the total variation distance, and converges to π as t → 0. The smoothed potential is tractable because the gradient of the Moreau envelope is ∇M_G^t(x) = (1/t)(x - prox_{tG}(x)), so ∇U^t can be evaluated through ∇F and the proximal map of G. The paper also links DAZ to denoising score matching by interpreting it as the zero-temperature limit (T → 0) of a diffusion with temperature parameter T, which connects non-smooth optimization techniques with ideas from diffusion models. Empirically, DAZ converges faster and covers multiple modes better than competing methods, both on 1D examples and on high-dimensional TV-L2 denoising of chains and images, without requiring a trained score model.
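The gradient identity ∇M_G^t(x) = (1/t)(x - prox_{tG}(x)) can be checked numerically on a simple nondifferentiable G. The following is a minimal sketch that assumes the illustrative choice G(x) = λ|x|. This G is not taken from the paper. Its proximal map is soft-thresholding, and its Moreau envelope is the Huber function. The function names are hypothetical.

```python
import numpy as np


def prox_l1(x, t, lam):
    """prox_{tG}(x) for the illustrative G(x) = lam * |x|: soft-thresholding at t * lam."""
    return np.sign(x) * np.maximum(np.abs(x) - t * lam, 0.0)


def moreau_envelope_l1(x, t, lam):
    """M_G^t(x) = min_z G(z) + (x - z)^2 / (2t), evaluated at the minimizer z = prox_{tG}(x)."""
    p = prox_l1(x, t, lam)
    return lam * np.abs(p) + (x - p) ** 2 / (2.0 * t)


def grad_moreau_l1(x, t, lam):
    """Gradient of the envelope via the identity (x - prox_{tG}(x)) / t."""
    return (x - prox_l1(x, t, lam)) / t


x = np.linspace(-3.0, 3.0, 61)
t, lam = 0.5, 1.0
g = grad_moreau_l1(x, t, lam)

# For this G the envelope is the Huber function, whose derivative is clip(x / t, -lam, lam).
print(np.allclose(g, np.clip(x / t, -lam, lam)))  # True

# Central finite differences of the envelope agree with the prox-based gradient.
h = 1e-6
fd = (moreau_envelope_l1(x + h, t, lam) - moreau_envelope_l1(x - h, t, lam)) / (2 * h)
print(np.allclose(fd, g, atol=1e-4))  # True
```

The sketch never differentiates G itself. Only its proximal map is evaluated, and this is what makes the smoothed potential usable inside a gradient-based sampler.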
Abstract
In this article, we propose a novel method for sampling from Gibbs distributions of the form $\pi(x)\propto\exp(-U(x))$ with a potential $U(x)$. In particular, inspired by diffusion models, we propose to consider a sequence $(\pi^{t_k})_k$ of approximations of the target density such that $\pi^{t_k}\approx\pi$ for small $k$, while for large $k$, $\pi^{t_k}$ exhibits properties favorable for sampling. This sequence is obtained by replacing parts of the potential $U$ by their Moreau envelopes. Sampling is performed in an Annealed Langevin type procedure, that is, by sequentially sampling from $\pi^{t_k}$ for decreasing $k$. This guides the samples from a simple starting density to the more complex target. In addition to a theoretical analysis, we present experimental results that support the efficacy of the method in terms of increased convergence speed and applicability to multi-modal densities $\pi$.
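The annealing procedure described in the abstract can be illustrated with a minimal NumPy sketch. The sketch rests on several assumptions that do not come from the paper. It uses a toy 1D potential with smooth part F(x) = x²/2 and nondifferentiable part G(x) = λ|x|, where λ = 2. It runs plain unadjusted Langevin steps at each level and uses a geometric schedule for t_k. All function names, the step-size rule, and the parameter values are hypothetical illustrations and do not represent the authors' implementation or experimental setup.

```python
import numpy as np


def prox_l1(x, t, lam):
    """prox_{tG}(x) for the illustrative choice G(x) = lam * |x| (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t * lam, 0.0)


def grad_U_t(x, t, lam):
    """Gradient of U^t = F + M_G^t, with the illustrative smooth part F(x) = x^2 / 2."""
    grad_F = x
    grad_envelope = (x - prox_l1(x, t, lam)) / t  # Moreau-envelope gradient identity
    return grad_F + grad_envelope


def annealed_langevin(n_samples=20_000, lam=2.0, t_max=10.0, t_min=1e-3,
                      n_levels=20, steps_per_level=200, tau_max=1e-2, seed=0):
    """Run unadjusted Langevin steps on pi^t for a decreasing geometric schedule of t."""
    rng = np.random.default_rng(seed)
    # For large t the envelope is nearly flat (close to min G = 0), so pi^t is close
    # to exp(-F), a standard normal here; use it as the simple starting density.
    x = rng.standard_normal(n_samples)
    for t in np.geomspace(t_max, t_min, n_levels):
        # Here grad U^t is (1 + 1/t)-Lipschitz, so keep the step at most t / (1 + t).
        tau = min(tau_max, t / (1.0 + t))
        for _ in range(steps_per_level):
            noise = rng.standard_normal(n_samples)
            x = x - tau * grad_U_t(x, t, lam) + np.sqrt(2.0 * tau) * noise
    return x


samples = annealed_langevin()
print(samples.shape)  # (20000,)
```

The step size shrinks along with t because the envelope gradient becomes stiffer as t → 0. Each level is warm-started from the samples of the previous one, which reflects the "simple start, complex target" idea. The paper's own schedule, step sizes, and per-level iteration counts may differ.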
