Efficient Denoising using Score Embedding in Score-based Diffusion Models

Andrew S. Na, William Gao, Justin W. L. Wan

TL;DR

This work tackles the heavy training cost of denoising score-based diffusion models by numerically solving the log-density Fokker-Planck (FP) equation, written in terms of $m = \log p$, to pre-compute the score field before training. The computed score is embedded into the image via the transport equation, providing a label-embedded input that guides learning under a sliced Wasserstein objective and reduces both the number of epochs and the amount of training data required. A semi-explicit finite-difference scheme handles the nonlinearity of the log-density FP equation, with sparse Gaussian elimination accelerating the solve. The approach is validated on CIFAR10, CelebA, and ImageNet, showing 3–5x training-time speedups while preserving image quality. The method offers a practical path to more energy-efficient and scalable diffusion-based denoising and generation; future work would extend the framework to videos and higher-dimensional densities.
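
For orientation, the nonlinearity mentioned above can be seen from a short derivation. This is a sketch under an assumed forward SDE $dx = f(x,t)\,dt + g(t)\,dW_t$; the symbols $f$ and $g$ are our notation and may differ from the paper's. Substituting $p = e^{m}$ into the Fokker-Planck equation and dividing by $p$ gives:

$$
\partial_t p = -\nabla\cdot(f p) + \tfrac{1}{2} g^2 \Delta p
\quad\Longrightarrow\quad
\partial_t m = -\nabla\cdot f - f\cdot\nabla m + \tfrac{1}{2} g^2\left(\Delta m + \lVert\nabla m\rVert^2\right).
$$

The score is then $\nabla m = \nabla \log p$, and the quadratic term $\lVert\nabla m\rVert^2$ is what makes the log-density equation nonlinear.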

Abstract

It is well known that training a denoising score-based diffusion model requires tens of thousands of epochs and a substantial amount of image data. In this paper, we propose a method that makes training score-based diffusion models more efficient by decreasing the number of epochs needed. We accomplish this by numerically solving the log-density Fokker-Planck (FP) equation to compute the score before training. The pre-computed score is embedded into the image to encourage faster training under the sliced Wasserstein distance. Consequently, the method also reduces the number of images needed to train the neural network to learn an accurate score. Our numerical experiments demonstrate improved performance compared to standard score-based diffusion models: the proposed method achieves quality similar to the standard method in meaningfully less training time.
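
The sliced Wasserstein distance named in the abstract can be estimated from samples by projecting both sets onto random directions and comparing the sorted one-dimensional projections. The following is a minimal NumPy sketch of that generic estimator, not the authors' training objective; the function name `sliced_wasserstein` and its parameters are hypothetical, and the sketch assumes two equally sized sample sets.

```python
import numpy as np


def sliced_wasserstein(x, y, n_projections=128, p=2, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance.

    x, y: arrays of shape (n_samples, dim); this sketch assumes the
    two sets contain the same number of samples.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equally sized sample sets")

    rng = np.random.default_rng(seed)
    # Draw random directions and normalize them onto the unit sphere.
    directions = rng.normal(size=(n_projections, x.shape[1]))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)

    # Project the samples: result has shape (n_samples, n_projections).
    # Sorting each column pairs samples optimally in one dimension.
    x_proj = np.sort(x @ directions.T, axis=0)
    y_proj = np.sort(y @ directions.T, axis=0)

    # Average the p-th power cost over samples and projections.
    return float(np.mean(np.abs(x_proj - y_proj) ** p) ** (1.0 / p))
```

For example, `sliced_wasserstein(a, a)` is zero for any sample array `a`, while shifting one set by a nonzero constant vector yields a positive value.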

Paper Structure

This paper contains 13 sections, 29 equations, 7 figures, 3 tables, and 2 algorithms.

Figures (7)

  • Figure 1: Illustration of training pipeline with score embedding.
  • Figure 2: Denoising an image of a puppy sampled from unconditional CIFAR10 using our proposed method (top), DDPM (middle), and DDIM (bottom). We show $10$ timesteps from the sampling process to illustrate the denoising.
  • Figure 3: Denoising an image of a female celebrity sampled from unconditional CelebA using our proposed method (top) and DDPM (bottom). We show $10$ timesteps from the sampling process to illustrate the denoising.
  • Figure 4: Denoising images of three dogs sampled from conditional CIFAR10 using our proposed method (top) and DDPM (bottom). We show $10$ timesteps from the sampling process to illustrate the denoising.
  • Figure 5: Plot of average SSIM and average MSE curves over training time/epoch.
  • ...and 2 more figures