Unified Directly Denoising for Both Variance Preserving and Variance Exploding Diffusion Models

Jingjing Wang; Dan Zhang; Feng Luo

TL;DR

This work introduces uDDDM, a unified directly denoising diffusion framework that handles both variance preserving (VP) and variance exploding (VE) forward SDEs and supports one-step or multi-step sampling. It derives a unified probability-flow ODE formulation with a predictor $\mathbf{f}_{\theta}$ that combines the current state with a learned correction, and it couples this with an adaptive Pseudo-Huber loss to balance convergence accuracy against training stability. The authors prove key theoretical properties, including the existence and uniqueness of solution paths and the non-intersection of sampling trajectories, and report competitive CIFAR-10 results, with state-of-the-art FID at 1000-step sampling in both the VP and VE settings. They also discuss memory costs and limitations, noting challenges specific to the VE setting and proposing improved hyperparameters and noise schedulers as directions for further gains in robustness and scalability.
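The VP/VE distinction can be made concrete through the marginal perturbation each forward SDE induces on clean data. The sketch below is a minimal illustration under standard score-SDE assumptions, not code from the paper: `alpha_bar_t` (the VP signal level), `sigma_t` (the VE noise level), and the helper names are hypothetical, and uDDDM's actual noise schedules are not given in this summary.

```python
import math
import random


def vp_perturb(x0, alpha_bar_t, noise):
    """Variance Preserving marginal: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * z.

    alpha_bar_t lies in [0, 1]; the signal and noise coefficients satisfy
    a**2 + s**2 == 1, so unit-variance data keeps unit variance.
    """
    if not 0.0 <= alpha_bar_t <= 1.0:
        raise ValueError("alpha_bar_t must lie in [0, 1]")
    a = math.sqrt(alpha_bar_t)
    s = math.sqrt(1.0 - alpha_bar_t)
    return [a * xi + s * zi for xi, zi in zip(x0, noise)]


def ve_perturb(x0, sigma_t, noise):
    """Variance Exploding marginal: x_t = x0 + sigma_t * z (variance grows with sigma_t)."""
    if sigma_t < 0.0:
        raise ValueError("sigma_t must be non-negative")
    return [xi + sigma_t * zi for xi, zi in zip(x0, noise)]


def variance(values):
    """Population variance of a list of floats."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 20000
    # Toy 1-D "data" with unit variance, plus independent standard Gaussian noise.
    x0 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    print("VP variance (alpha_bar=0.3):", variance(vp_perturb(x0, 0.3, z)))  # close to 1
    print("VE variance (sigma=2.0):", variance(ve_perturb(x0, 2.0, z)))      # close to 1 + 4 = 5
```

In the VP case the variance of unit-variance data stays near 1 at every noise level, while in the VE case it grows as $1 + \sigma_t^2$, which is why VE sampling starts from a much wider noise distribution.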

Abstract

Previous work has demonstrated that, in the Variance Preserving (VP) scenario, the nascent Directly Denoising Diffusion Models (DDDM) can generate high-quality images in one step while achieving even better performance with multistep sampling. However, the Pseudo-LPIPS loss used in DDDM raises concerns about bias in the evaluation. Here, we propose a unified DDDM (uDDDM) framework that generates images in one or multiple steps for both the Variance Preserving (VP) and Variance Exploding (VE) cases. We provide theoretical proofs of the existence and uniqueness of the model's solution paths, as well as of the non-intersecting property of the sampling paths. Additionally, we propose an adaptive Pseudo-Huber loss function to balance convergence to the true solution against the stability of the convergence process. Through a comprehensive evaluation, we demonstrate that uDDDMs achieve FID scores on CIFAR-10 comparable to those of the best-performing available methods in both the VP and VE settings. Specifically, uDDDM achieves one-step generation on CIFAR-10 with FIDs of 2.63 and 2.53 for VE and VP, respectively. By extending sampling to 1000 steps, we further reduce the FID to 1.71 and 1.65 for VE and VP, respectively, setting state-of-the-art performance in both cases.
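The abstract motivates the Pseudo-Huber loss by the trade-off between converging to the true solution and keeping training stable. The minimal sketch below shows the standard, non-adaptive Pseudo-Huber distance $d(\mathbf{x}, \mathbf{y}) = \sqrt{\lVert \mathbf{x} - \mathbf{y} \rVert_2^2 + c^2} - c$ on plain Python lists. The rule by which uDDDM adapts $c$ is not stated in this summary, so here `c` is simply a fixed, user-supplied argument, and the function names are hypothetical.

```python
import math


def pseudo_huber(x, y, c):
    """Pseudo-Huber distance: sqrt(||x - y||^2 + c^2) - c, with c > 0.

    ~ ||x - y||^2 / (2c) when ||x - y|| << c (quadratic near the target);
    ~ ||x - y|| - c     when ||x - y|| >> c (linear, robust to large errors).
    """
    if c <= 0.0:
        raise ValueError("c must be positive")
    sq = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sq + c * c) - c


def pseudo_huber_grad(x, y, c):
    """Gradient w.r.t. x: (x - y) / sqrt(||x - y||^2 + c^2); its norm is always < 1."""
    if c <= 0.0:
        raise ValueError("c must be positive")
    sq = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    denom = math.sqrt(sq + c * c)
    return [(xi - yi) / denom for xi, yi in zip(x, y)]


if __name__ == "__main__":
    target = [0.0, 0.0]
    # Compare a small and a large residual under a small and a large c.
    for c in (0.01, 1.0):
        for pred in ([0.001, 0.0], [10.0, 0.0]):
            print(c, pred, pseudo_huber(pred, target, c), pseudo_huber_grad(pred, target, c))
```

As $c \to 0$ the loss approaches the plain Euclidean distance $\lVert \mathbf{x} - \mathbf{y} \rVert$, and as $c$ grows it behaves like a rescaled squared error over a wider range of residuals; in every case the gradient norm stays below 1, so large errors cannot produce exploding updates.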

Paper Structure (15 sections, 52 equations, 6 figures, 1 table, 2 algorithms)

Introduction
Preliminaries
Variance Exploding (VE) Stochastic Differential Equation (SDE)
The framework of DDDM
A Unified DDDM (uDDDM) Framework
The Unified DDDMs
The Adaptive Loss Function
Theoretical Justifications of uDDDM
Training and Sampling with uDDDM
Experiments
Implementation Details
Comparison to SOTA
Related Works
Discussion and Limitations
Appendix / supplemental material

Figures (6)

Figure 1: One-step samples from uDDDM-VE-deep
Figure 2: One-step samples from uDDDM-VP-deep
Figure 3: Uncurated samples from the uDDDM-VE model. All corresponding samples use the same initial noise.
Figure 4: Uncurated samples from the uDDDM-VE-deep model. All corresponding samples use the same initial noise.
Figure 5: Uncurated samples from the uDDDM-VP model. All corresponding samples use the same initial noise.