Diffusion at Absolute Zero: Langevin Sampling Using Successive Moreau Envelopes [conference paper]

Andreas Habring, Alexander Falk, Thomas Pock

TL;DR

The paper tackles the challenge of sampling from Gibbs distributions π(x) ∝ exp(-U(x)) whose potential U = F + G may be nonconvex or nondifferentiable. It introduces Diffusion at Absolute Zero (DAZ), a method that builds a sequence of smoothed densities π^t with potentials U^t(x) = F(x) + M_G^t(x), where M_G^t is the Moreau envelope of G with parameter t. It then runs Annealed Langevin sampling over decreasing values of t, so that the samples gradually move toward the target. The main theoretical contribution is the consistency of this scheme: under mild assumptions, π^t is well defined, depends Lipschitz-continuously on t in total variation, and converges to π as t → 0. The Langevin updates only need the gradient of the Moreau envelope, ∇M_G^t(x) = (1/t)(x - prox_{tG}(x)), that is, one proximal evaluation of G per step. The paper also relates DAZ to denoising score matching by interpreting it as the zero-temperature limit of a diffusion with temperature parameter T, which connects non-smooth optimization techniques with ideas from diffusion models. Empirically, DAZ converges faster and covers multiple modes better than competing methods such as MYULA. This holds both in 1D experiments (a Laplace distribution and a bimodal Gaussian mixture) and in higher-dimensional TV-L2 denoising of chains and images, and DAZ does not require training a score model.
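As a quick sanity check of the gradient formula, the sketch below uses the hypothetical choice G(x) = |x| in 1D, which is the potential of a Laplace distribution. For this G the proximal map is soft thresholding and the Moreau envelope is the Huber function. The code compares (x - prox_{tG}(x))/t with a central finite difference of M_G^t. The function names are illustrative and not taken from the paper.

```python
import math


def prox_abs(x: float, t: float) -> float:
    """prox_{tG}(x) for the illustrative choice G(y) = |y|: soft thresholding by t."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def moreau_abs(x: float, t: float) -> float:
    """Moreau envelope M_G^t(x) = min_y |y| + (x - y)^2 / (2t), evaluated at the minimizer (the prox)."""
    p = prox_abs(x, t)
    return abs(p) + (x - p) ** 2 / (2 * t)


def grad_moreau_abs(x: float, t: float) -> float:
    """Gradient of the Moreau envelope via the identity (x - prox_{tG}(x)) / t."""
    return (x - prox_abs(x, t)) / t


def max_gradient_error(xs, ts, h=1e-6):
    """Largest gap between the prox-based gradient and a central finite difference of M_G^t."""
    err = 0.0
    for t in ts:
        for x in xs:
            fd = (moreau_abs(x + h, t) - moreau_abs(x - h, t)) / (2 * h)
            err = max(err, abs(fd - grad_moreau_abs(x, t)))
    return err


if __name__ == "__main__":
    xs = [-3.0, -0.2, 0.05, 0.7, 2.5]
    ts = [0.1, 0.5, 1.0]
    # Expected to be tiny (limited only by floating-point rounding).
    print(max_gradient_error(xs, ts))
```

Because the envelope is evaluated at the prox itself, the check also confirms the Huber closed form: x²/(2t) for |x| ≤ t and |x| - t/2 otherwise.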

Abstract

In this article we propose a novel method for sampling from Gibbs distributions of the form $\pi(x)\propto\exp(-U(x))$ with a potential $U(x)$. In particular, inspired by diffusion models, we propose to consider a sequence $(\pi^{t_k})_k$ of approximations of the target density, for which $\pi^{t_k}\approx\pi$ for $k$ small and, on the other hand, $\pi^{t_k}$ exhibits favorable properties for sampling for $k$ large. This sequence is obtained by replacing parts of the potential $U$ by their Moreau envelopes. Sampling is performed in an Annealed Langevin type procedure, that is, sequentially sampling from $\pi^{t_k}$ for decreasing $k$, effectively guiding the samples from a simple starting density to the more complex target. In addition to a theoretical analysis, we show experimental results supporting the efficacy of the method in terms of increased convergence speed and applicability to multi-modal densities $\pi$.
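The following is a minimal sketch of the annealed procedure described in the abstract. It is not the paper's Algorithm 1. It assumes plain unadjusted Langevin (ULA) steps at every level, a geometric schedule of Moreau parameters, a Gaussian starting density, and the step-size rule τ = min(0.05, t/2). These choices and all helper names are hypothetical. The test target is the Laplace density π(x) ∝ exp(-|x|), that is, F = 0 and G = |·|, so each gradient of U^t costs one soft-thresholding step.

```python
import math
import random


def annealed_langevin(grad_F, prox_G, x0, ts, n_steps, step, rng):
    """Run ULA on U^t = F + M_G^t for each t in the decreasing list ts (illustrative sketch).

    grad U^t(x) = grad_F(x) + (x - prox_G(x, t)) / t, using the Moreau-envelope gradient identity.
    """
    xs = list(x0)
    for t in ts:
        tau = step(t)
        noise_scale = math.sqrt(2.0 * tau)
        for _ in range(n_steps):
            xs = [
                x - tau * (grad_F(x) + (x - prox_G(x, t)) / t) + noise_scale * rng.gauss(0.0, 1.0)
                for x in xs
            ]
    return xs


def geometric_schedule(t_max, t_min, n_levels):
    """Decreasing Moreau parameters from t_max to t_min (hypothetical schedule)."""
    return [t_max * (t_min / t_max) ** (k / (n_levels - 1)) for k in range(n_levels)]


def soft_threshold(x, t):
    """prox of t*|.|, i.e. prox_{tG} for G = |.|."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def sample_laplace(n_samples=1000, seed=0):
    """Annealed sampling of exp(-|x|), starting from N(0, t_max) (assumed starting density)."""
    rng = random.Random(seed)
    t_max, t_min = 10.0, 0.1
    x0 = [rng.gauss(0.0, math.sqrt(t_max)) for _ in range(n_samples)]
    return annealed_langevin(
        grad_F=lambda x: 0.0,  # F = 0 for a pure Laplace target
        prox_G=soft_threshold,  # G = |x|
        x0=x0,
        ts=geometric_schedule(t_max, t_min, n_levels=7),
        n_steps=100,
        # grad M_G^t is (1/t)-Lipschitz for convex G, so keep tau below t
        step=lambda t: min(0.05, 0.5 * t),
        rng=rng,
    )


if __name__ == "__main__":
    xs = sample_laplace()
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    print(f"mean={mean:.3f} variance={var:.3f} (Laplace(0, 1) has mean 0, variance 2)")
```

The list comprehension keeps the update readable. A vectorized array version would apply the same ULA update to all samples at once.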

Paper Structure

This paper contains 13 sections, 6 theorems, 12 equations, 5 figures, and 1 algorithm.

Key Result

Lemma 1

The mapping $t\mapsto\operatorname{prox}_{tG}(x)$ is continuous on $(0,t_{\max})$ for any fixed $x$.
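The sketch below illustrates Lemma 1 in a simple case. It uses a hypothetical nonconvex but ρ-weakly convex regularizer G(y) = |y| - (ρ/2)y² in 1D. For 0 < t < 1/ρ the proximal objective G(y) + (y - x)²/(2t) is strongly convex, and its minimizer is prox_{tG}(x) = S_t(x)/(1 - ρt), where S_t is soft thresholding. In this toy example 1/ρ therefore plays the role of t_max; this is an assumption made for the illustration, not a statement of the paper's hypotheses. The code checks the closed form against brute-force grid minimization and shows that the prox varies continuously in t. For this x, the prox grows without bound as t approaches 1/ρ.

```python
import math

RHO = 0.5  # hypothetical weak-convexity modulus; the prox is well defined for 0 < t < 1/RHO = 2


def G(y: float) -> float:
    """Nonconvex, RHO-weakly convex toy regularizer |y| - (RHO/2) y^2."""
    return abs(y) - 0.5 * RHO * y * y


def prox_closed_form(x: float, t: float) -> float:
    """prox_{tG}(x) = S_t(x) / (1 - RHO t), valid only for 0 < t < 1/RHO."""
    if not 0.0 < t < 1.0 / RHO:
        raise ValueError("t must lie in (0, 1/RHO)")
    return math.copysign(max(abs(x) - t, 0.0), x) / (1.0 - RHO * t)


def prox_grid(x: float, t: float, lo: float = -10.0, hi: float = 10.0, n: int = 20001) -> float:
    """Brute-force argmin of G(y) + (y - x)^2 / (2t) over a uniform grid on [lo, hi]."""
    h = (hi - lo) / (n - 1)
    best_y, best_val = lo, math.inf
    for i in range(n):
        y = lo + i * h
        val = G(y) + (y - x) ** 2 / (2.0 * t)
        if val < best_val:
            best_y, best_val = y, val
    return best_y


if __name__ == "__main__":
    x = 3.0
    for t in (0.5, 1.0, 1.5):
        print(t, prox_closed_form(x, t), prox_grid(x, t))
    # Small changes in t give small changes in the prox ...
    print(prox_closed_form(x, 1.0 + 1e-6) - prox_closed_form(x, 1.0))
    # ... but for this x the prox grows without bound as t approaches 1/RHO.
    for t in (1.9, 1.99, 1.999):
        print(t, prox_closed_form(x, t))
```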

Figures (5)

  • Figure 1: Moreau envelopes of the potential of a Gaussian mixture for different Moreau parameters.
  • Figure 2: Sampling from a Laplace distribution and a bimodal Gaussian mixture. Convergence speed, measured by the total variation error, of DAZ compared with MYULA (Durmus et al., 2018) with Moreau parameter $\lambda$.
  • Figure 3: Histogram of the obtained samples for MYULA and DAZ after 1000 iterations compared to the target density.
  • Figure 4: TV denoising on a chain.
  • Figure 5: TV denoising on images.
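The kind of curve shown in Figure 1 can be approximated in a few lines. The sketch below assumes a hypothetical symmetric mixture 0.5 N(-2, 0.5²) + 0.5 N(2, 0.5²), since the paper's mixture parameters are not given here. It evaluates the potential U = -log p on a grid and computes the Moreau envelope M^t(x) = min_y U(y) + (x - y)²/(2t) by brute force over the same grid. Two properties can be checked. First, M^t ≤ U everywhere. Second, the barrier between the modes, M^t(0) - min M^t, shrinks as t grows, which is one way to see why the smoothed densities are easier to sample for large t.

```python
import math


def mixture_potential(x: float, mu: float = 2.0, sigma: float = 0.5) -> float:
    """U(x) = -log p(x) for p = 0.5 N(-mu, sigma^2) + 0.5 N(mu, sigma^2) (hypothetical parameters)."""
    c = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    p = 0.5 * c * (
        math.exp(-((x + mu) ** 2) / (2 * sigma**2)) + math.exp(-((x - mu) ** 2) / (2 * sigma**2))
    )
    return -math.log(p)


def moreau_envelope_on_grid(values, grid, t):
    """Brute-force M^t(x_i) = min_j values[j] + (x_i - x_j)^2 / (2t) on a 1D grid."""
    return [min(v + (x - y) ** 2 / (2.0 * t) for v, y in zip(values, grid)) for x in grid]


def barrier(env, mid):
    """Height of a potential at the midpoint between the modes, above its minimum."""
    return env[mid] - min(env)


if __name__ == "__main__":
    n = 401
    grid = [-6.0 + 12.0 * i / (n - 1) for i in range(n)]
    mid = n // 2  # index of the grid point x = 0
    U = [mixture_potential(x) for x in grid]
    print(f"U itself: barrier {barrier(U, mid):.3f}")
    for t in (0.1, 1.0, 10.0):
        env = moreau_envelope_on_grid(U, grid, t)
        print(f"t={t}: barrier {barrier(env, mid):.3f}")
```

The minimum of the envelope equals the minimum of U, because choosing y = x costs nothing. The envelope value at the midpoint can only decrease as t grows, so the barrier is nonincreasing in t.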

Theorems & Definitions (12)

  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Remark 1
  • Proposition 1
  • proof
  • Corollary 1
  • Proposition 2
  • proof
  • ...and 2 more