Pyramidal Denoising Diffusion Probabilistic Models
Dohoon Ryu, Jong Chul Ye
TL;DR
This work introduces Pyramidal Denoising Diffusion Probabilistic Models (PDDPM), a diffusion framework in which a single score function, conditioned on positional encodings, generates high-resolution images from coarse scales and performs multi-scale super-resolution. The approach trains with scale-aware coordinates and combines pyramidal reverse sampling with Come-Closer-Diffuse-Faster (CCDF) acceleration. This yields substantial speedups with a lightweight network while maintaining image quality. Ablation studies confirm that positional encoding and patchwise training are important for very-high-resolution generation. Overall, PDDPM offers a practical, efficient route to fast diffusion-based generation and high-fidelity super-resolution with a single model.
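The positional conditioning can be illustrated with a small sketch. As one plausible reading (an assumption, not the paper's stated scheme), each pixel is tagged with its center coordinate normalized to [-1, 1] at every resolution. Under that scheme, the two fine pixels that refine one coarse pixel average to the coarse pixel's coordinate, and a patch cropped for patchwise training keeps the coordinates of its location in the full image. All function names are hypothetical, and the NeRF-style sinusoidal `fourier_encode` is a stand-in for whatever encoding the authors actually use.

```python
import math


def pixel_centers(n):
    """Normalized centers of n pixels along one axis, spanning [-1, 1] (assumed scheme)."""
    return [-1.0 + (2 * i + 1) / n for i in range(n)]


def coordinate_grid(h, w):
    """(y, x) coordinates for every pixel of an h x w image."""
    ys, xs = pixel_centers(h), pixel_centers(w)
    return [[(y, x) for x in xs] for y in ys]


def patch_coordinates(h, w, top, left, ph, pw):
    """Coordinates of a ph x pw crop, sliced from the full-image grid so the
    patch keeps its position in the full image instead of restarting at the corner."""
    grid = coordinate_grid(h, w)
    return [row[left:left + pw] for row in grid[top:top + ph]]


def fourier_encode(y, x, num_freqs=4):
    """NeRF-style sinusoidal features of one (y, x) pair (hypothetical choice)."""
    feats = []
    for k in range(num_freqs):
        scale = (2 ** k) * math.pi
        for v in (y, x):
            feats.extend([math.sin(scale * v), math.cos(scale * v)])
    return feats


# Coarse and fine grids line up: each pair of fine centers averages to a coarse center.
print(pixel_centers(2), pixel_centers(4))  # [-0.5, 0.5] [-0.75, -0.25, 0.25, 0.75]
```

Because a crop carries full-image coordinates, a network trained only on small patches can, under this assumption, still be queried over the whole grid at sampling time. This is one way to read why patchwise training helps at very high resolutions.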
Abstract
Recently, diffusion models have demonstrated impressive image generation performance and have been studied extensively in various computer vision tasks. Unfortunately, training and evaluating diffusion models consume a great deal of time and computational resources. To address this problem, we present a novel pyramidal diffusion model that generates high-resolution images starting from much coarser-resolution images, using a single score function trained with a positional embedding. This allows the neural network to be much lighter and enables time-efficient image generation without compromising performance. Furthermore, we show that the proposed approach can also be used efficiently for multi-scale super-resolution with a single score function.
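The coarse-to-fine generation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses a standard DDPM ancestral sampler with a linear β schedule, 1-D signals in place of images for brevity, and nearest-neighbour 2× upsampling between levels. It also assumes a CCDF-style restart that re-noises each upsampled estimate only to an intermediate step t0 = ⌊t0_ratio·T⌋ rather than to T. The network `eps_model(x, t, coords)` is hypothetical and stands in for the single coordinate-conditioned score function. `make_zero_data_oracle` is a toy predictor that is exact only when all data are zero. The values of `T` and `t0_ratio` are illustrative.

```python
import math
import random


def linear_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule and cumulative products alpha_bar[t] (0-based t)."""
    betas = [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]
    alphas_bar, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        alphas_bar.append(prod)
    return betas, alphas_bar


def pixel_centers(n):
    """Normalized pixel-center coordinates in [-1, 1] (assumed conditioning)."""
    return [-1.0 + (2 * i + 1) / n for i in range(n)]


def reverse_diffusion(x, t_start, eps_model, betas, alphas_bar, rng):
    """DDPM ancestral sampling over step indices t_start-1, ..., 0."""
    coords = pixel_centers(len(x))  # coordinates depend on the current resolution
    calls = 0
    for t in reversed(range(t_start)):
        eps = eps_model(x, t, coords)
        calls += 1
        coef = betas[t] / math.sqrt(1.0 - alphas_bar[t])
        mean = [(xi - coef * ei) / math.sqrt(1.0 - betas[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            x = [m + math.sqrt(betas[t]) * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise is added on the final step
    return x, calls


def pyramidal_sample(eps_model, base_len, levels, T=1000, t0_ratio=0.2, seed=0):
    """Coarse-to-fine sampling with one eps_model shared by every level."""
    rng = random.Random(seed)
    betas, alphas_bar = linear_schedule(T)
    # Coarsest level: full reverse process starting from pure noise.
    x = [rng.gauss(0.0, 1.0) for _ in range(base_len)]
    x, total_calls = reverse_diffusion(x, T, eps_model, betas, alphas_bar, rng)
    t0 = int(T * t0_ratio)
    for _level in range(levels - 1):
        x_up = [v for v in x for _ in range(2)]  # nearest-neighbour 2x upsampling
        ab = alphas_bar[t0 - 1]
        # CCDF-style shortcut: re-noise the upsampled estimate only up to step t0 ...
        x = [math.sqrt(ab) * v + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0) for v in x_up]
        # ... then run just t0 reverse steps instead of all T.
        x, calls = reverse_diffusion(x, t0, eps_model, betas, alphas_bar, rng)
        total_calls += calls
    return x, total_calls


def make_zero_data_oracle(alphas_bar):
    """Toy stand-in for a trained network: the exact noise predictor when every
    training sample is all zeros, since then x_t = sqrt(1 - alpha_bar_t) * eps."""
    def eps_model(x, t, coords):
        s = math.sqrt(1.0 - alphas_bar[t])
        return [xi / s for xi in x]
    return eps_model


_, alphas_bar = linear_schedule(1000)
sample, n_calls = pyramidal_sample(make_zero_data_oracle(alphas_bar), base_len=4, levels=3)
print(len(sample), n_calls)  # 16 1400
```

In this sketch most of the T steps run at the coarsest resolution, where each network evaluation is cheapest. Every finer level adds only t0 steps, and all levels share one model, distinguished only by the coordinates passed in.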
