DREAM: Diffusion Rectification and Estimation-Adaptive Models

Jinxin Zhou, Tianyu Ding, Tianyi Chen, Jiachen Jiang, Ilya Zharkov, Zhihui Zhu, Luming Liang

TL;DR

DREAM (Diffusion Rectification and Estimation-Adaptive Models) is a training framework that requires minimal code changes yet significantly improves the alignment of training with sampling in diffusion models. The authors hope it will inspire a rethinking of diffusion model training paradigms.

Abstract

We present DREAM, a novel training framework representing Diffusion Rectification and Estimation-Adaptive Models, requiring minimal code changes (just three lines) yet significantly enhancing the alignment of training with sampling in diffusion models. DREAM features two components: diffusion rectification, which adjusts training to reflect the sampling process, and estimation adaptation, which balances perception against distortion. When applied to image super-resolution (SR), DREAM adeptly navigates the tradeoff between minimizing distortion and preserving high image quality. Experiments demonstrate DREAM's superiority over standard diffusion-based SR methods, showing $2\times$ to $3\times$ faster training convergence and a $10\times$ to $20\times$ reduction in the sampling steps needed to achieve comparable results. We hope DREAM will inspire a rethinking of diffusion model training paradigms.

Paper Structure (20 sections, 13 equations, 22 figures, 5 tables, 3 algorithms)

Figures (22)

  • Figure 1: Comparative training of conditional diffusion models for super-resolution. Top: standard conditional DDPM [saharia2022image]. Bottom: the same model trained with DREAM, which adds just three lines of code and leaves the sampling process unchanged. DREAM facilitates notably faster and more stable training convergence, significantly surpassing baseline models in key metrics of perception and distortion.
  • Figure 2: Overview of the DREAM framework. Starting with ground-truth HR images, a standard diffusion process with a frozen denoiser network generates denoised HR estimates. The Adaptive Estimation merges these estimated HR images with the original HR images, guided by the pattern of estimation errors. The Diffusion Rectification constructs the noisy images from these merged HR images, which are then fed into the denoiser network (now unfrozen). Similar to DDPM [ho2020denoising], the denoiser network is trained to eliminate both the introduced Gaussian noise and the errors arising from the training-sampling discrepancy, as detailed in the DREAM training objective (eq:dream-objective).
  • Figure 3: Evaluation of training-sampling discrepancy and its alleviation through our DREAM framework. The mean curve over 100 samples at each time step $t$ is plotted, with the shaded area representing the standard deviation of each metric. Here, $T=2000$.
  • Figure 4: $8\times$ SR on the CelebA-HQ dataset [karras2017progressive].
  • Figure 5: Qualitative comparison for $8\times$ SR using IDM [gao2023implicit] on the CelebA-HQ dataset [karras2017progressive]. Results highlight DREAM's superior fidelity and enhanced identity preservation, leading to more realistic detail generation in features like hair, eyes, and rings.
  • ...and 17 more figures
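
The pipeline described in the Figure 2 caption can be sketched in a few lines. The following is a minimal, framework-agnostic illustration in plain Python on flat lists of floats, not the authors' implementation. Several names are assumptions introduced here for illustration: the function `dream_training_targets`, the callable `denoiser(y, t)` that returns a noise prediction, and in particular the merge weight `lam = sqrt(1 - alpha_bar_t) ** p` with exponent `p`, which this excerpt does not specify. In a real training loop the first denoiser call would run without gradient tracking (the "frozen" pass), and the loss would compare a second, gradient-tracked call on the rectified input against the returned target.

```python
import math


def dream_training_targets(x0, eps, alpha_bar_t, denoiser, t, p=1.0):
    """Build the rectified noisy input and adapted target for one example.

    x0, eps     : flat lists of floats (clean HR image, sampled Gaussian noise)
    alpha_bar_t : cumulative noise-schedule product at step t, in (0, 1)
    denoiser    : callable (y, t) -> predicted noise; frozen for this call
    p           : hypothetical exponent for the merge weight (an assumption)
    """
    sa = math.sqrt(alpha_bar_t)
    sn = math.sqrt(1.0 - alpha_bar_t)

    # Standard forward diffusion: noisy image at step t.
    y_t = [sa * x + sn * e for x, e in zip(x0, eps)]

    # Frozen pass: the denoiser's own noise prediction and HR estimate.
    eps_hat = denoiser(y_t, t)
    x0_hat = [(y - sn * eh) / sa for y, eh in zip(y_t, eps_hat)]

    # Adaptive estimation: merge the estimate with the ground truth.
    lam = sn ** p  # assumed weighting, not taken from this excerpt
    x0_merged = [lam * xh + (1.0 - lam) * x for xh, x in zip(x0_hat, x0)]

    # Diffusion rectification: re-noise the merged image with the same noise.
    y_t_rect = [sa * xm + sn * e for xm, e in zip(x0_merged, eps)]

    # Target: the injected noise plus the weighted estimation error, so that
    # removing it from y_t_rect recovers the ground-truth x0.
    target = [e + lam * (e - eh) for e, eh in zip(eps, eps_hat)]
    return y_t_rect, target


def mse(pred, target):
    """Mean squared error between two equal-length lists."""
    return sum((a - b) ** 2 for a, b in zip(pred, target)) / len(target)


# Toy usage with a denoiser that always predicts zero noise.
x0 = [0.5, -0.25]
eps = [1.0, -2.0]
zero_denoiser = lambda y, t: [0.0] * len(y)
y_rect, target = dream_training_targets(x0, eps, 0.64, zero_denoiser, t=1000)
loss = mse(zero_denoiser(y_rect, 1000), target)
```

Under these assumptions, a denoiser that already predicts the noise exactly leaves both the input and the target unchanged, so the sketch reduces to standard DDPM-style training; the rectification only takes effect where the model's estimate deviates from the ground truth.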