Efficient Score Pre-computation for Diffusion Models via Cross-Matrix Krylov Projection

Kaikwan Lau, Andrew S. Na, Justin W. L. Wan

TL;DR

This work tackles the high computational cost of training and sampling in score-based diffusion models by introducing two complementary techniques. First, it pre-computes accurate score information by solving the log-density Fokker-Planck equation and embeds these scores into images through a transport equation, enabling faster and more stable training via score embedding. Second, it introduces a timestep-wise cross-matrix Krylov projection strategy that builds seed Krylov subspaces from a seed image and transfers them to subsequent, similar images, reducing the solve time for the many large linear systems involved. Across the CIFAR-10 and CelebA datasets, the approach delivers substantial speedups (up to 115× over DDPM baselines in single-image denoising and up to a 43.7% time reduction versus SpSolve) while maintaining or improving image quality under fixed computational budgets. Together, these methods bridge numerical linear algebra with diffusion-based generative modeling to enable efficient, high-quality image generation on resource-constrained hardware.

Abstract

This paper presents a novel framework to accelerate score-based diffusion models. It first converts the standard stable diffusion model into the Fokker-Planck formulation, which requires solving a large linear system for each image; when training involves many images, this leads to a high computational cost. The core innovation is a cross-matrix Krylov projection method that exploits mathematical similarities between matrices, using a shared subspace built from "seed" matrices to rapidly solve for subsequent "target" matrices. Our experiments show that this technique achieves a 15.8% to 43.7% time reduction over standard sparse solvers. Additionally, we compare our method against DDPM baselines in denoising tasks, showing a speedup of up to 115×. Furthermore, under a fixed computational budget, our model is able to produce high-quality images while DDPM fails to generate recognizable content, illustrating that our approach is a practical method for efficient generation in resource-limited settings.
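The seed/target idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the matrices, sizes, and helper names (`krylov_basis`, `projected_solve`, `rel_residual`) are assumptions chosen for illustration. A Krylov basis is built once from a toy "seed" matrix, then reused to solve a slightly perturbed "target" system by Galerkin projection, so the target needs only a small dense solve. The toy problem uses a 1D diffusion-like matrix and the same right-hand side for both systems, which is a simplification of the image-dependent systems the paper describes.

```python
import numpy as np

def krylov_basis(A, b, m):
    """Orthonormal basis of span{b, Ab, ..., A^(m-1) b} (Arnoldi-style)."""
    n = b.shape[0]
    V = np.zeros((n, m))
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(1, m):
        w = A @ V[:, j - 1]
        # Two Gram-Schmidt passes keep the basis orthonormal to round-off.
        for _ in range(2):
            w -= V[:, :j] @ (V[:, :j].T @ w)
        norm_w = np.linalg.norm(w)
        if norm_w < 1e-12:  # subspace is already invariant; stop early
            return V[:, :j]
        V[:, j] = w / norm_w
    return V

def projected_solve(A, b, V):
    """Galerkin projection: solve (V^T A V) y = V^T b, return x = V y."""
    H = V.T @ (A @ V)
    y = np.linalg.solve(H, V.T @ b)
    return V @ y

def rel_residual(A, x, b):
    return np.linalg.norm(b - A @ x) / np.linalg.norm(b)

# Toy setup (assumed, not from the paper): identity plus a 1D Laplacian,
# loosely mimicking an implicit time step of a diffusion-type PDE.
rng = np.random.default_rng(0)
n, m = 200, 20
lap = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A_seed = np.eye(n) + 0.5 * lap
# The target differs from the seed by a small diagonal perturbation.
A_target = A_seed + 1e-3 * np.diag(rng.random(n))
b = rng.standard_normal(n)

V = krylov_basis(A_seed, b, m)               # built once, from the seed only
x_seed = projected_solve(A_seed, b, V)       # seed solve in the subspace
x_target = projected_solve(A_target, b, V)   # target reuses the seed subspace

seed_res = rel_residual(A_seed, x_seed, b)
target_res = rel_residual(A_target, x_target, b)
print(f"seed residual:   {seed_res:.2e}")
print(f"target residual: {target_res:.2e}")
```

In this sketch the target solve costs one 20×20 dense solve instead of a fresh 200×200 solve. Its accuracy is limited by how far the target matrix is from the seed, so the projected solution is best read as an approximation whose residual should be checked. A practical solver would likely refine it or fall back to a full solve when the residual is too large.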

Paper Structure

This paper contains 24 sections, 54 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Conceptual diagram of our time-wise projection method. We consider all timesteps in the matrix $A^n_i$ as a seed; for each timestep, we use the projection method.
  • Figure 2: Demonstration of pre-computed Score Method for Denoising 32×32 (top), 64×64 (middle) and 128×128 (bottom).
  • Figure 3: Demonstration of generation of 128×128 celebrity images from CelebA. We sample 6 timesteps during the sampling to demonstrate the generating process.
  • Figure 4: A comparison of 32×32 images generated after an identical training time budget (4155.17s). Our method (left) produces coherent images with an average SSIM of 0.8991, whereas DDPM (right) produces truck images with an average SSIM of only 0.8517 within the same time budget.
  • Figure 5: A comparison of 64×64 images generated after an identical training time budget (6388.59s). Our method (left) produces coherent images with an average SSIM of 0.6302. In contrast, the standard DDPM (right) only produces a few recognizable images with an average SSIM of 0.4446 within the same time constraint.
  • ...and 4 more figures