
Denoising Diffusion Gamma Models

Eliya Nachmani, Robin San Roman, Lior Wolf

TL;DR

The paper introduces Denoising Diffusion Gamma Models (DDGM), a diffusion framework that replaces Gaussian noise with Gamma noise while preserving closed-form state sampling. By deriving the Gamma forward and reverse processes and a variational lower bound, the authors obtain a tractable training objective based on an L1 loss between the predicted and the true centered Gamma noise. Empirical results in image and speech generation show that Gamma diffusion improves metrics such as FID, PESQ, and STOI compared to Gaussian baselines, especially at lower iteration counts, and approaches Gaussian DDPM performance at higher counts. Overall, DDGM demonstrates that non-Gaussian noise can enhance diffusion-based generative modeling without sacrificing sampling efficiency, with practical benefits for both vision and audio synthesis.

Abstract

Generative diffusion processes are an emerging and effective tool for image and speech generation. In existing methods, the noise distribution underlying the diffusion process is Gaussian. However, fitting distributions with more degrees of freedom could improve the performance of such generative models. In this work, we investigate other types of noise distributions for the diffusion process. Specifically, we introduce the Denoising Diffusion Gamma Model (DDGM) and show that noise from a Gamma distribution provides improved results for image and speech generation. Our approach preserves the ability to efficiently sample states of the diffusion process during training while using Gamma noise.

Paper Structure

This paper contains 16 sections, 4 theorems, 39 equations, 4 figures, 1 table, 4 algorithms.

Key Result

Lemma 1

Let $\theta_0 \in \mathbb{R}$ and assume that, $\forall t \in \{1, \dots, T\}$, $k_t=\dfrac{\beta_t}{\alpha_t{\theta_0}^2}$, $\theta_t = \sqrt{\bar{\alpha}_t}\theta_0$, and $g_t\sim \Gamma(k_t, \theta_t)$. Then, $\forall t \in \{1, \dots, T\}$, the following holds:

$$x_t = \sqrt{\bar{\alpha}_t}\,x_0 + (\bar{g}_t - \bar{k}_t\theta_t),$$

where $\bar{g}_t \sim \Gamma(\bar{k}_t, \theta_t)$ and $\bar{k}_t = \sum_{i=1}^t k_i$.
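
The lemma can be checked numerically. The sketch below is a minimal illustration, not the authors' code: it assumes the Gamma forward step $x_t = \sqrt{\alpha_t}\,x_{t-1} + (g_t - k_t\theta_t)$ with $\alpha_t = 1-\beta_t$ (subtracting the mean $\mathbb{E}[g_t] = k_t\theta_t$), a hypothetical linear $\beta_t$ schedule, and a hypothetical $\theta_0 = 0.001$. It runs the forward process step by step and compares the resulting $x_T$ samples with single draws from the closed form in the lemma; both should have mean $\sqrt{\bar\alpha_T}x_0$ and variance $\bar k_T\theta_T^2$.

```python
import math
import random
import statistics


def gamma_schedule(T, theta0=0.001, beta_start=1e-4, beta_end=0.02):
    """Per-step parameters from Lemma 1 (linear beta schedule and theta0 are hypothetical)."""
    betas = [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]
    alphas, alpha_bars, thetas, ks = [], [], [], []
    alpha_bar = 1.0
    for beta in betas:
        alpha = 1.0 - beta
        alpha_bar *= alpha
        alphas.append(alpha)
        alpha_bars.append(alpha_bar)
        thetas.append(math.sqrt(alpha_bar) * theta0)  # theta_t = sqrt(alpha_bar_t) * theta_0
        ks.append(beta / (alpha * theta0 ** 2))       # k_t = beta_t / (alpha_t * theta_0^2)
    return alphas, alpha_bars, thetas, ks


def forward_iterative(x0, alphas, thetas, ks, rng):
    """Assumed Gamma forward process, one step at a time."""
    x = x0
    for alpha, theta, k in zip(alphas, thetas, ks):
        g = rng.gammavariate(k, theta)               # g_t ~ Gamma(shape=k_t, scale=theta_t)
        x = math.sqrt(alpha) * x + (g - k * theta)   # subtract E[g_t] = k_t * theta_t
    return x


def forward_closed_form(x0, t, alpha_bars, thetas, ks, rng):
    """Lemma 1: x_t = sqrt(alpha_bar_t) x_0 + (g_bar_t - k_bar_t theta_t), with t 1-indexed."""
    k_bar = sum(ks[:t])
    theta_t = thetas[t - 1]
    g_bar = rng.gammavariate(k_bar, theta_t)         # g_bar_t ~ Gamma(k_bar_t, theta_t)
    return math.sqrt(alpha_bars[t - 1]) * x0 + (g_bar - k_bar * theta_t)


T, x0, n = 20, 0.5, 20000
alphas, alpha_bars, thetas, ks = gamma_schedule(T)
rng = random.Random(0)
iterative = [forward_iterative(x0, alphas, thetas, ks, rng) for _ in range(n)]
closed = [forward_closed_form(x0, T, alpha_bars, thetas, ks, rng) for _ in range(n)]

expected_mean = math.sqrt(alpha_bars[-1]) * x0
expected_var = sum(ks) * thetas[-1] ** 2
print(f"mean: iterative={statistics.fmean(iterative):.4f} "
      f"closed={statistics.fmean(closed):.4f} expected={expected_mean:.4f}")
print(f"var:  iterative={statistics.pvariance(iterative):.4f} "
      f"closed={statistics.pvariance(closed):.4f} expected={expected_var:.4f}")
```

A single draw suffices because of two Gamma properties: scaling, $c\,\Gamma(k,\theta) = \Gamma(k, c\theta)$ in distribution, and additivity of independent Gamma variables that share a scale. The choice $\theta_t = \sqrt{\bar\alpha_t}\theta_0$ makes every rescaled $g_i$ in the unrolled sum share the scale $\theta_t$, so their shapes add to $\bar k_t$.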

Figures (4)

  • Figure 1: Fitting a distribution to the histogram of the generation error, which is given by the scaled difference between $x_0$ and the image $x_t$ after $t$ DDPM steps, $\hat{\epsilon}=\frac{\sqrt{\bar{\alpha}_t}x_0 - x_t}{\sqrt{1 - \bar{\alpha}_t}}$. The model is a pretrained Gaussian DDPM trained on CelebA (64x64). (a) Fitting a Gaussian to the histogram of a typical image after $t=50$ steps. (b) Fitting a Gamma distribution. (c) The fitting error of the Gaussian and Gamma distributions, measured as the MSE between the histogram and the fitted probability density function. Each point is the average over the generation of $100$ images; the vertical error bars denote the standard deviation.
  • Figure 2: Typical examples of images generated with $100$ iterations and $\eta=0$ by models trained with different noise distributions: (i) first row, Gaussian noise; (ii) second row, Gamma noise. All models start from the same noise instance.
  • Figure: DDPM training procedure.
  • Figure: Gamma training algorithm.
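
The fitting error in Figure 1(c) can be illustrated with a small stdlib-only sketch. The caption does not say how the distributions were fitted, so this assumes a moment-matching fit: a Gaussian with the sample mean and standard deviation, and a shifted Gamma (reflected when the residuals are left-skewed) whose shape follows from the sample skewness via $\text{skew} = 2/\sqrt{k}$. The error is the MSE between the normalized histogram and each fitted density at the bin centers. All helper names and the synthetic data are hypothetical; `scaled_residual` only transcribes the caption's $\hat\epsilon$.

```python
import math
import random
import statistics


def scaled_residual(x0, xt, alpha_bar):
    """Generation error from the Figure 1 caption: (sqrt(alpha_bar) x_0 - x_t) / sqrt(1 - alpha_bar)."""
    return (math.sqrt(alpha_bar) * x0 - xt) / math.sqrt(1.0 - alpha_bar)


def histogram_density(samples, bins=50):
    """Normalized histogram: bin centers and empirical densities (total area 1)."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / bins
    counts = [0] * bins
    for s in samples:
        counts[min(int((s - lo) / width), bins - 1)] += 1  # the maximum lands in the last bin
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    return centers, [c / (len(samples) * width) for c in counts]


def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def fit_shifted_gamma(samples):
    """Moment-matching fit of a shifted, possibly reflected Gamma: returns (k, theta, loc, sign)."""
    mu, sigma = statistics.fmean(samples), statistics.pstdev(samples)
    skew = statistics.fmean([((s - mu) / sigma) ** 3 for s in samples])
    sign = 1.0 if skew >= 0 else -1.0       # reflect left-skewed residuals
    k = 4.0 / max(skew * skew, 1e-6)        # Gamma skewness is 2 / sqrt(k)
    theta = sigma / math.sqrt(k)            # Gamma standard deviation is sqrt(k) * theta
    loc = mu - sign * k * theta             # support boundary chosen so the means match
    return k, theta, loc, sign


def gamma_pdf(x, k, theta, loc, sign):
    z = sign * (x - loc)                    # distance from the support boundary
    if z <= 0.0:
        return 0.0
    return math.exp((k - 1.0) * math.log(z) - z / theta - math.lgamma(k) - k * math.log(theta))


def fit_errors(samples, bins=50):
    """MSE between the histogram and each fitted density."""
    centers, dens = histogram_density(samples, bins)
    mu, sigma = statistics.fmean(samples), statistics.pstdev(samples)
    k, theta, loc, sign = fit_shifted_gamma(samples)
    mse_gauss = statistics.fmean([(d - gaussian_pdf(c, mu, sigma)) ** 2 for c, d in zip(centers, dens)])
    mse_gamma = statistics.fmean([(d - gamma_pdf(c, k, theta, loc, sign)) ** 2 for c, d in zip(centers, dens)])
    return mse_gauss, mse_gamma


# Hypothetical right-skewed residuals: centered Gamma(4, 0.5) draws, not real DDPM outputs.
rng = random.Random(0)
residuals = [rng.gammavariate(4.0, 0.5) - 2.0 for _ in range(50000)]
mse_gauss, mse_gamma = fit_errors(residuals)
print(f"Gaussian MSE={mse_gauss:.2e}  Gamma MSE={mse_gamma:.2e}")
```

With real data, `residuals` would be the $\hat\epsilon$ values of one generated image; moment matching is used here only because it needs no optimizer, and a maximum-likelihood fit may give different numbers.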

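The Gamma training algorithm listed above can be sketched as a single loss computation. This is a minimal, framework-free sketch under stated assumptions, not the paper's algorithm verbatim: it draws $t$ uniformly, samples $\bar g_t \sim \Gamma(\bar k_t, \theta_t)$ once per coordinate, forms $x_t$ with the closed form of Lemma 1, and scores a hypothetical `model(x_t, t)` with an L1 loss. Rescaling the centered noise by $1/\sqrt{1-\bar\alpha_t}$ (as in DDPM's noise prediction) is an assumption, as are the linear schedule and $\theta_0$. A real implementation would compute this loss on tensors and take a gradient step with an autodiff framework.

```python
import math
import random


def gamma_schedule(T, theta0=0.001, beta_start=1e-4, beta_end=0.02):
    """Cumulative quantities for Lemma 1 (linear beta schedule and theta0 are hypothetical)."""
    alpha_bars, thetas, k_bars = [], [], []
    alpha_bar, k_bar = 1.0, 0.0
    for i in range(T):
        beta = beta_start + (beta_end - beta_start) * i / (T - 1)
        alpha = 1.0 - beta
        alpha_bar *= alpha
        k_bar += beta / (alpha * theta0 ** 2)  # k_bar_t = sum_{i<=t} k_i
        alpha_bars.append(alpha_bar)
        thetas.append(math.sqrt(alpha_bar) * theta0)
        k_bars.append(k_bar)
    return alpha_bars, thetas, k_bars


def gamma_training_loss(x0, model, schedule, rng):
    """One sketched Gamma training step: returns the L1 loss and the sampled t (no gradient update)."""
    alpha_bars, thetas, k_bars = schedule
    t = rng.randrange(1, len(alpha_bars) + 1)  # t ~ Uniform{1, ..., T}
    a_bar, theta, k_bar = alpha_bars[t - 1], thetas[t - 1], k_bars[t - 1]
    # Centered Gamma noise g_bar - k_bar * theta, drawn independently per coordinate.
    centered = [rng.gammavariate(k_bar, theta) - k_bar * theta for _ in x0]
    x_t = [math.sqrt(a_bar) * x + c for x, c in zip(x0, centered)]
    target = [c / math.sqrt(1.0 - a_bar) for c in centered]  # assumed DDPM-style rescaling
    pred = model(x_t, t)
    loss = sum(abs(p - e) for p, e in zip(pred, target)) / len(x0)
    return loss, t


schedule = gamma_schedule(T=50)
x0 = [0.1, -0.3, 0.7, 0.0]  # a toy "image" with four pixels
rng = random.Random(0)


def zero_model(x_t, t):
    """Hypothetical untrained network that always predicts zero noise."""
    return [0.0] * len(x_t)


def oracle_model(x_t, t):
    """Cheating predictor that knows x0; it recovers the target exactly, so its loss is ~0."""
    a_bar = schedule[0][t - 1]
    return [(xt - math.sqrt(a_bar) * x) / math.sqrt(1.0 - a_bar) for xt, x in zip(x_t, x0)]


loss_zero, t_zero = gamma_training_loss(x0, zero_model, schedule, rng)
loss_oracle, t_oracle = gamma_training_loss(x0, oracle_model, schedule, rng)
print(f"zero model: t={t_zero}, loss={loss_zero:.4f}")
print(f"oracle model: t={t_oracle}, loss={loss_oracle:.2e}")
```

The oracle only checks that the target is consistent with $x_t$; a trained network has to infer the noise from $x_t$ and $t$ alone.
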
Theorems & Definitions (6)

  • Lemma 1
  • Lemma 2
  • Lemma 2
  • proof
  • Lemma 2
  • proof