Efficient Score Pre-computation for Diffusion Models via Cross-Matrix Krylov Projection
Kaikwan Lau, Andrew S. Na, Justin W. L. Wan
TL;DR
This work tackles the high computational cost of training and sampling in score-based diffusion models with two complementary techniques. First, it pre-computes accurate score information by solving the log-density Fokker-Planck equation and embeds these scores into images through a transport equation, which enables faster and more stable training via score embedding. Second, it introduces a timestep-wise cross-matrix Krylov projection strategy that builds Krylov subspaces from a seed image and reuses them for subsequent, similar images, reducing the time needed to solve multiple large linear systems. On the CIFAR-10 and CelebA datasets, the approach achieves speedups of up to 115× over DDPM baselines in single-image denoising and time reductions of up to 43.7% versus SpSolve, while maintaining or improving image quality under fixed computational budgets. Together, these methods connect numerical linear algebra with diffusion-based generative modeling to enable efficient, high-quality image generation on resource-constrained hardware.
Abstract
This paper presents a novel framework to accelerate score-based diffusion models. It first converts the standard stable diffusion model into the Fokker-Planck formulation, which requires solving large linear systems for each image. For training that involves many images, this leads to a high computational cost. The core innovation is a cross-matrix Krylov projection method that exploits mathematical similarities between matrices, using a shared subspace built from ``seed'' matrices to rapidly solve for subsequent ``target'' matrices. Our experiments show that this technique achieves a 15.8\% to 43.7\% time reduction over standard sparse solvers. Additionally, we compare our method against DDPM baselines in denoising tasks and observe a speedup of up to 115$\times$. Furthermore, under a fixed computational budget, our model produces high-quality images while DDPM fails to generate recognizable content, which suggests that our approach is a practical method for efficient generation in resource-limited settings.
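
To make the seed/target idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes the shared subspace is an Arnoldi basis of the Krylov subspace of a seed matrix, and that a target system is solved by Galerkin projection onto that basis. The tridiagonal test matrix, the diagonal perturbation, and the names `arnoldi_basis`, `projected_solve`, and `relative_residual` are illustrative assumptions; the actual Fokker-Planck discretization is not reproduced here.

```python
import numpy as np


def arnoldi_basis(A, b, m):
    """Orthonormal basis V (n x k, k <= m) of the Krylov subspace K_m(A, b)."""
    n = b.shape[0]
    V = np.zeros((n, m))
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(1, m):
        w = A @ V[:, j - 1]
        # Two Gram-Schmidt passes keep the basis numerically orthonormal.
        for _ in range(2):
            for i in range(j):
                w -= (V[:, i] @ w) * V[:, i]
        norm_w = np.linalg.norm(w)
        if norm_w < 1e-12:  # subspace is invariant; stop early
            return V[:, :j]
        V[:, j] = w / norm_w
    return V


def projected_solve(A, b, V):
    """Galerkin projection: solve (V^T A V) y = V^T b and return x = V y."""
    y = np.linalg.solve(V.T @ (A @ V), V.T @ b)
    return V @ y


def relative_residual(A, x, b):
    return np.linalg.norm(b - A @ x) / np.linalg.norm(b)


rng = np.random.default_rng(0)
n, m = 200, 20

# Hypothetical seed matrix: shifted 1D Laplacian (symmetric positive definite).
A_seed = 3.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
# Hypothetical target matrix: a small diagonal perturbation of the seed.
A_target = A_seed + 1e-3 * np.diag(rng.random(n))
b = rng.standard_normal(n)

# Build the subspace once from the seed, then reuse it for the target.
V = arnoldi_basis(A_seed, b, m)
x_seed = projected_solve(A_seed, b, V)
x_target = projected_solve(A_target, b, V)

res_seed = relative_residual(A_seed, x_seed, b)
res_target = relative_residual(A_target, x_target, b)
print(f"seed residual:   {res_seed:.2e}")
print(f"target residual: {res_target:.2e}")
```

In this toy setting each target solve costs only a small dense `m x m` system, but its accuracy is limited by how far the target matrix is from the seed. A practical scheme would presumably refine the projected solution or enlarge the subspace when the residual is too large.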
