Fixed Point Diffusion Models
Xingjian Bai, Luke Melas-Kyriazi
TL;DR
The paper tackles the inefficiency of diffusion-based image generation by introducing Fixed Point Diffusion Models (FPDM), which embed an implicit fixed-point denoising layer into a diffusion network operating in latent space. Training uses Stochastic Jacobian-Free Backpropagation to backpropagate through a sequence of fixed-point solutions across timesteps, substantially reducing parameter count and memory while maintaining or improving sample quality under constrained compute. FPDM also introduces two sampling techniques, timestep smoothing and solution reuse, which allow compute to be allocated flexibly across timesteps and speed up convergence. On ImageNet, FFHQ, CelebA-HQ, and LSUN-Church, FPDM uses 87% fewer parameters and 60% less training memory than DiT and outperforms it when sampling time or compute is limited, which makes it practical for resource-constrained generation. The work also outlines limitations and promising directions, such as scaling to larger datasets and exploring adaptive allocation policies to further exploit the fixed-point framework.
Abstract
We introduce the Fixed Point Diffusion Model (FPDM), a novel approach to image generation that integrates the concept of fixed point solving into the framework of diffusion-based generative modeling. Our approach embeds an implicit fixed point solving layer into the denoising network of a diffusion model, transforming the diffusion process into a sequence of closely-related fixed point problems. Combined with a new stochastic training method, this approach significantly reduces model size, reduces memory usage, and accelerates training. Moreover, it enables the development of two new techniques to improve sampling efficiency: reallocating computation across timesteps and reusing fixed point solutions between timesteps. We conduct extensive experiments with state-of-the-art models on ImageNet, FFHQ, CelebA-HQ, and LSUN-Church, demonstrating substantial improvements in performance and efficiency. Compared to the state-of-the-art DiT model, FPDM contains 87% fewer parameters, consumes 60% less memory during training, and improves image generation quality in situations where sampling computation or time is limited. Our code and pretrained models are available at https://lukemelas.github.io/fixed-point-diffusion-models.
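The idea of treating sampling as a sequence of closely related fixed point problems, and of reusing each solution at the next timestep, can be illustrated with a toy sketch. This is not the FPDM architecture: the function `layer` below is a hypothetical scalar contraction standing in for the implicit denoising layer, and the names `solve_fixed_point` and `run`, the tolerance, and the timestep schedule are all assumptions chosen for illustration. The sketch only shows why a warm start from the previous timestep's solution tends to need fewer iterations than a cold start.

```python
import math

def layer(x, inp, t):
    # Hypothetical stand-in for the implicit layer: a contraction in x
    # (the derivative of 0.5 * tanh is at most 0.5), conditioned on an
    # input vector and a scalar timestep t.
    return [0.5 * math.tanh(xi) + ii + 0.1 * t for xi, ii in zip(x, inp)]

def solve_fixed_point(inp, t, x0, tol=1e-6, max_iters=100):
    # Plain fixed point iteration x <- layer(x, inp, t), stopped when the
    # largest per-coordinate update falls below tol.
    # Returns the approximate solution and the number of iterations used.
    x = list(x0)
    for n in range(1, max_iters + 1):
        x_next = layer(x, inp, t)
        residual = max(abs(a - b) for a, b in zip(x_next, x))
        x = x_next
        if residual < tol:
            return x, n
    return x, max_iters

def run(timesteps, inp, reuse):
    # Solve one fixed point problem per timestep. With reuse=True, each
    # solve is initialized from the previous timestep's solution;
    # otherwise every solve starts from zeros.
    x = [0.0] * len(inp)
    total_iters = 0
    for t in timesteps:
        x0 = x if reuse else [0.0] * len(inp)
        x, n = solve_fixed_point(inp, t, x0)
        total_iters += n
    return x, total_iters

if __name__ == "__main__":
    timesteps = [i / 10 for i in range(10, 0, -1)]  # 1.0, 0.9, ..., 0.1
    inp = [0.3, -0.2]
    _, cold = run(timesteps, inp, reuse=False)
    _, warm = run(timesteps, inp, reuse=True)
    print("iterations without reuse:", cold)
    print("iterations with reuse:   ", warm)
```

Because adjacent timesteps differ only slightly in this toy, their fixed points lie close together, so the warm-started solver begins much nearer to each solution. In the paper's setting the same intuition motivates reusing fixed point solutions between timesteps, although the actual layer, solver budget, and schedule there differ from this sketch.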
