Denoising Diffusion Gamma Models
Eliya Nachmani, Robin San Roman, Lior Wolf
TL;DR
The paper introduces Denoising Diffusion Gamma Models (DDGM), a diffusion framework that replaces Gaussian noise with Gamma noise while preserving closed-form state sampling. By deriving the Gamma forward and reverse processes and a variational lower bound, the authors obtain a tractable training objective based on an L1 loss between the predicted and true mean-centered Gamma noise. Empirical results in image and speech generation show that Gamma diffusion improves metrics such as FID, PESQ, and STOI over Gaussian baselines, especially at lower iteration counts, and approaches Gaussian DDPM performance at higher counts. Overall, DDGM demonstrates that non-Gaussian noise can enhance diffusion-based generative modeling without sacrificing sampling efficiency, with practical benefits for both vision and audio synthesis.
Abstract
Generative diffusion processes are an emerging and effective tool for image and speech generation. In existing methods, the underlying noise distribution of the diffusion process is Gaussian. However, fitting distributions with more degrees of freedom could improve the performance of such generative models. In this work, we investigate other types of noise distribution for the diffusion process. Specifically, we introduce the Denoising Diffusion Gamma Model (DDGM) and show that noise from the Gamma distribution provides improved results for image and speech generation. Our approach uses Gamma noise while preserving the ability to efficiently sample states of the diffusion process during training.
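
The closed-form state sampling mentioned above can be illustrated with a minimal NumPy sketch. The parameterization below is an assumption, not something stated in this summary: the per-step shape is taken as k_t = beta_t / (alpha_bar_t * theta_0^2) and the scale as theta_t = sqrt(alpha_bar_t) * theta_0, so that the centered Gamma noise has zero mean and variance 1 - alpha_bar_t, matching the Gaussian DDPM case. The function name `gamma_forward_sample`, the default `theta0 = 0.001`, and the linear beta schedule in the usage lines are hypothetical choices made for illustration.

```python
import numpy as np


def gamma_forward_sample(x0, t, betas, theta0=0.001, rng=None):
    """Draw x_t directly from x_0 using centered Gamma noise (illustrative sketch).

    Assumed parameterization (not taken from the summary above):
      alpha_bar_t = prod_{i<=t} (1 - beta_i)
      k_i         = beta_i / (alpha_bar_i * theta0**2)
      theta_t     = sqrt(alpha_bar_t) * theta0
    The time index t is 0-based.
    """
    rng = np.random.default_rng() if rng is None else rng
    x0 = np.asarray(x0, dtype=np.float64)

    alpha_bar = np.cumprod(1.0 - betas)       # cumulative signal retention
    k = betas / (alpha_bar * theta0 ** 2)     # per-step Gamma shape
    k_bar = np.cumsum(k)                      # accumulated shape up to step t
    theta_t = np.sqrt(alpha_bar[t]) * theta0  # Gamma scale at step t

    # A single Gamma draw stands in for the sum of all per-step noises,
    # since Gamma variables with a common scale add their shapes.
    g = rng.gamma(shape=k_bar[t], scale=theta_t, size=x0.shape)

    # Subtract the Gamma mean k_bar_t * theta_t so the noise is zero-mean.
    return np.sqrt(alpha_bar[t]) * x0 + (g - k_bar[t] * theta_t)


# Usage: noise a dummy 32x32 "image" at step 500 of a 1000-step schedule.
betas = np.linspace(1e-4, 0.02, 1000)
x0 = np.zeros((32, 32))
xt = gamma_forward_sample(x0, t=499, betas=betas, rng=np.random.default_rng(0))
```

Under these assumptions the accumulated noise variance is k_bar_t * theta_t^2 = 1 - alpha_bar_t, so a Gaussian schedule can be reused unchanged and only the shape of the noise distribution differs.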
