LatentCRF: Continuous CRF for Efficient Latent Diffusion
Kanchana Ranasinghe, Sadeep Jayasumana, Andreas Veit, Ayan Chakrabarti, Daniel Glasner, Michael S Ryoo, Srikumar Ramalingam, Sanjiv Kumar
TL;DR
LatentCRF presents a continuous CRF layer that operates in the latent space of Latent Diffusion Models (LDMs) and accelerates inference by replacing several U-Net iterations with a lightweight, trainable inference step. The model incorporates unary, pairwise, and higher-order energies, including a Field-of-Experts prior, and uses differentiable mean-field updates with learned conditioning to capture spatial and semantic consistency among latent vectors. Training combines a latent denoising loss with a latent-space adversarial objective, and the approach can be paired with distillation from the LDM to mimic its later steps. Together these achieve a 33% speedup with negligible loss in image quality and diversity. Compared to prior distillation and compression methods, LatentCRF better preserves diversity while maintaining high visual fidelity, and because it requires no modification to the base LDM, it is a practical add-on for accelerating diffusion-based image generation.
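The mean-field idea can be illustrated on a much simpler continuous CRF. The sketch below is not the LatentCRF layer: it assumes a purely quadratic (Gaussian) energy on a 2-D grid of latent vectors, with a unary term pulling each latent z_i toward an observed latent u_i and a fixed 4-neighbour pairwise smoothness term, and it omits the learned conditioning, the Field-of-Experts prior, and all training. For such an energy, minimising over one latent with its neighbours held fixed has a closed form (a weighted average), and applying it to every site in parallel is the synchronous mean-field (Jacobi) update. The names `energy`, `mean_field_step`, `alpha`, and `beta` are hypothetical.

```python
from typing import Iterator, List, Tuple

Vector = List[float]
Grid = List[List[Vector]]  # grid[row][col] is a latent vector with C channels


def neighbours(r: int, c: int, h: int, w: int) -> Iterator[Tuple[int, int]]:
    """Yield the 4-connected neighbours of (r, c) inside an h x w grid."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < h and 0 <= cc < w:
            yield rr, cc


def energy(z: Grid, u: Grid, alpha: float, beta: float) -> float:
    """Quadratic CRF energy (hypothetical simplification):
    E(z) = alpha * sum_i ||z_i - u_i||^2 + beta * sum_{i~j} ||z_i - z_j||^2,
    where each unordered neighbour pair i~j is counted once.
    """
    h, w = len(z), len(z[0])
    total = 0.0
    for r in range(h):
        for c in range(w):
            # Unary term: stay close to the observed latent u_i.
            total += alpha * sum((a - b) ** 2 for a, b in zip(z[r][c], u[r][c]))
            # Pairwise term: agree with neighbours (each pair counted once).
            for rr, cc in neighbours(r, c, h, w):
                if (rr, cc) > (r, c):
                    total += beta * sum(
                        (a - b) ** 2 for a, b in zip(z[r][c], z[rr][cc])
                    )
    return total


def mean_field_step(z: Grid, u: Grid, alpha: float, beta: float) -> Grid:
    """One synchronous mean-field update for the quadratic energy above.

    Setting dE/dz_i = 0 with the neighbours held fixed gives
        z_i = (alpha * u_i + beta * sum_{j in N(i)} z_j) / (alpha + beta * |N(i)|).
    With alpha > 0 and beta >= 0 the linear system is strictly diagonally
    dominant, so repeating this update converges to the unique minimiser.
    """
    h, w = len(z), len(z[0])
    out: Grid = []
    for r in range(h):
        row: List[Vector] = []
        for c in range(w):
            nbrs = list(neighbours(r, c, h, w))
            denom = alpha + beta * len(nbrs)
            row.append([
                (alpha * u[r][c][k] + beta * sum(z[rr][cc][k] for rr, cc in nbrs)) / denom
                for k in range(len(u[r][c]))
            ])
        out.append(row)
    return out


if __name__ == "__main__":
    # 2x2 grid, one channel: a noisy checkerboard as the observed latents.
    u = [[[1.0], [0.0]], [[0.0], [1.0]]]
    z = u
    for _ in range(20):
        z = mean_field_step(z, u, alpha=1.0, beta=0.5)
    print(energy(u, u, 1.0, 0.5), energy(z, u, 1.0, 0.5))
```

In this toy checkerboard the energy falls from 2.0 toward its minimum of 2/3, with the latents converging to 2/3 and 1/3. Each update has a closed form and only reads local neighbours, which illustrates why a CRF-style layer can be cheap relative to a full U-Net iteration.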
Abstract
Latent Diffusion Models (LDMs) produce high-quality, photo-realistic images; however, the latency incurred by multiple costly inference iterations can restrict their applicability. We introduce LatentCRF, a continuous Conditional Random Field (CRF) model, implemented as a neural network layer, that models the spatial and semantic relationships among the latent vectors in the LDM. By replacing some of the computationally intensive LDM inference iterations with our lightweight LatentCRF, we achieve a superior balance between quality, speed and diversity. We increase inference efficiency by 33% with no loss in image quality or diversity compared to the full LDM. LatentCRF is an easy add-on that does not require modifying the LDM.
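The add-on usage described in the abstract can be sketched as a sampling loop that calls the base LDM as an unmodified black box for the early iterations and then applies a CRF layer in place of the remaining ones. This is an illustrative assumption, not the paper's procedure: the function `generate_with_crf`, the callables `ldm_step` and `crf_layer`, and the choice to replace a contiguous block of final steps with a single CRF pass are all hypothetical, and the toy stubs stand in for a real U-Net and a trained LatentCRF.

```python
from typing import Callable, List, Tuple

Latent = List[float]  # hypothetical flat stand-in for an LDM latent tensor


def generate_with_crf(
    z_init: Latent,
    ldm_step: Callable[[Latent, int], Latent],
    crf_layer: Callable[[Latent], Latent],
    total_steps: int,
    replaced_steps: int,
) -> Tuple[Latent, int]:
    """Run the first (total_steps - replaced_steps) LDM iterations unchanged,
    then apply one CRF pass in place of the remaining late iterations.

    Returns the final latent and the number of U-Net calls made.
    """
    if not 0 <= replaced_steps <= total_steps:
        raise ValueError("replaced_steps must lie in [0, total_steps]")
    z = z_init
    unet_calls = 0
    # Timesteps run from total_steps - 1 down to replaced_steps; the LDM
    # itself is only called, never modified.
    for t in range(total_steps - 1, replaced_steps - 1, -1):
        z = ldm_step(z, t)
        unet_calls += 1
    # A single lightweight CRF pass stands in for the skipped steps.
    return crf_layer(z), unet_calls


if __name__ == "__main__":
    # Toy stubs: each "U-Net step" halves the latent; the "CRF" adds a bias.
    out, calls = generate_with_crf(
        [8.0],
        ldm_step=lambda z, t: [0.5 * v for v in z],
        crf_layer=lambda z: [v + 1.0 for v in z],
        total_steps=6,
        replaced_steps=2,
    )
    print(out, calls)  # prints [1.5] 4
```

Under this sketch the fraction of U-Net calls saved is `replaced_steps / total_steps`, and the net speedup also depends on the CRF layer's own cost; the step counts behind the reported 33% figure are not given in this section.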
