Latent Denoising Diffusion GAN: Faster sampling, Higher image quality

Luan Thanh Trinh; Tomoki Hamagami

TL;DR

This work combines latent-space diffusion with a pre-trained autoencoder to drastically speed up diffusion-based image generation while preserving quality and diversity. By removing Gaussian constraints on the latent space and introducing Weighted Learning, the model achieves fast inference with only a few denoising steps and competitive fidelity on CIFAR-10, CelebA-HQ, and LSUN-Church, outperforming prior diffusion accelerations such as DiffusionGAN and Wavelet Diffusion. The approach has three components: a VQGAN-based autoencoder, a GAN-based denoiser operating in latent space, and a carefully scheduled reconstruction loss that balances fidelity and diversity. The resulting Latent Denoising Diffusion GAN reaches state-of-the-art sampling speed among diffusion models, shows practical potential for real-time, high-fidelity image synthesis, and provides insights into latent-space design for diffusion models.

Abstract

Diffusion models are emerging as powerful solutions for generating high-fidelity and diverse images, often surpassing GANs in many settings. However, their slow inference speed hinders their potential for real-time applications. To address this, DiffusionGAN leveraged a conditional GAN to drastically reduce the number of denoising steps and speed up inference. Its successor, Wavelet Diffusion, further accelerated the process by converting data into wavelet space, thus enhancing efficiency. Nonetheless, these models still fall short of GANs in terms of speed and image quality. To bridge these gaps, this paper introduces the Latent Denoising Diffusion GAN, which employs pre-trained autoencoders to compress images into a compact latent space, significantly improving inference speed and image quality. Furthermore, we propose a Weighted Learning strategy to enhance diversity and image quality. Experimental results on the CIFAR-10, CelebA-HQ, and LSUN-Church datasets show that our model achieves state-of-the-art running speed among diffusion models. Compared to its predecessors, DiffusionGAN and Wavelet Diffusion, our model shows remarkable improvements in all evaluation metrics. Code and pre-trained checkpoints: https://github.com/thanhluantrinh/LDDGAN.git
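
To make the idea of a scheduled reconstruction loss concrete, the following is a minimal sketch of how a generator objective could mix an adversarial term with a reconstruction term whose weight changes over training. The abstract does not specify the form of the Weighted Learning schedule, so everything here is an assumption for illustration: the linear decay, its direction (from high to low), the default weights, and the function names `reconstruction_weight` and `generator_loss` are hypothetical and are not taken from the authors' code.

```python
def reconstruction_weight(step, total_steps, w_start=1.0, w_end=0.0):
    """Hypothetical schedule: linearly move the weight from w_start to w_end.

    Assumption: the weight is high early in training and lower later.
    The source text does not state the actual schedule.
    """
    if total_steps <= 1:
        return w_end
    # Fraction of training completed, clamped to [0, 1].
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    return w_start + (w_end - w_start) * frac


def generator_loss(adv_loss, rec_loss, step, total_steps):
    """Adversarial term plus a reconstruction term scaled by the schedule."""
    return adv_loss + reconstruction_weight(step, total_steps) * rec_loss


if __name__ == "__main__":
    # Print the combined loss at the start, middle, and end of a 101-step run.
    for step in (0, 50, 100):
        print(step, generator_loss(adv_loss=2.0, rec_loss=4.0, step=step, total_steps=101))
```

Under this sketch the reconstruction term dominates at the start of training and its influence fades towards the end, leaving the adversarial term to drive the final updates. Whether the authors' schedule has this shape should be checked against the paper and the released code.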
