SupResDiffGAN: a new approach for the Super-Resolution task
Dawid Kopeć, Wojciech Kozłowski, Maciej Wizerkaniuk, Dawid Krutul, Jan Kocoń, Maciej Zięba
TL;DR
SupResDiffGAN is a latent-space diffusion-GAN hybrid for single-image super-resolution. It addresses the speed-accuracy trade-off of diffusion models by operating in a compressed latent space and leveraging adversarial feedback. The approach encodes image pairs into latent codes with a pretrained VAE, uses a U-Net to denoise over the diffusion steps conditioned on the low-resolution latent, and stabilizes training with a Gaussian-noise-augmented discriminator whose noise step follows an EMA-driven schedule. Empirical results on multiple SR benchmarks show competitive LPIPS scores and markedly faster inference than traditional diffusion SR models, while approaching GAN-based methods in quality. The work demonstrates a viable path toward real-time diffusion-based SR and motivates further exploration of latent diffusion and diffusion-GAN hybrids for practical deployment.
Abstract
In this work, we present SupResDiffGAN, a novel hybrid architecture that combines the strengths of Generative Adversarial Networks (GANs) and diffusion models for super-resolution tasks. By leveraging latent space representations and reducing the number of diffusion steps, SupResDiffGAN achieves significantly faster inference times than other diffusion-based super-resolution models while maintaining competitive perceptual quality. To prevent discriminator overfitting, we propose adaptive noise corruption, ensuring a stable balance between the generator and the discriminator during training. Extensive experiments on benchmark datasets show that our approach outperforms traditional diffusion models such as SR3 and I$^2$SB in efficiency and image quality. This work bridges the performance gap between diffusion- and GAN-based methods, laying the foundation for real-time applications of diffusion models in high-resolution image generation.
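The adaptive noise corruption mentioned above can be illustrated with a minimal sketch. The abstract does not spell out the mechanism, so this is one plausible reading rather than the authors' implementation: an exponential moving average (EMA) of the discriminator's batch accuracy raises the noise step when the discriminator is winning and lowers it when it is losing. The class name `AdaptiveNoiseScheduler`, the target accuracy, the tolerance band, the step increment, and the linear beta schedule (borrowed from standard DDPM) are all assumptions made for illustration.

```python
import math
import random


class AdaptiveNoiseScheduler:
    """Hypothetical controller: picks the noise step for discriminator inputs."""

    def __init__(self, max_step=1000, target_acc=0.6, tolerance=0.05,
                 ema_decay=0.9, step_delta=10):
        # All default values are illustrative assumptions, not from the paper.
        self.max_step = max_step
        self.target_acc = target_acc
        self.tolerance = tolerance
        self.ema_decay = ema_decay
        self.step_delta = step_delta
        self.ema_acc = target_acc  # start the EMA at the target accuracy
        self.step = 0              # start with no corruption

    def update(self, batch_acc):
        """Fold one batch accuracy into the EMA and adjust the noise step."""
        self.ema_acc = (self.ema_decay * self.ema_acc
                        + (1.0 - self.ema_decay) * batch_acc)
        if self.ema_acc > self.target_acc + self.tolerance:
            # Discriminator too strong: make its inputs noisier.
            self.step = min(self.max_step, self.step + self.step_delta)
        elif self.ema_acc < self.target_acc - self.tolerance:
            # Discriminator too weak: make its inputs cleaner.
            self.step = max(0, self.step - self.step_delta)
        return self.step


def alpha_bar(step, max_step=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_i) for i < step, linear beta schedule."""
    prod = 1.0
    for i in range(step):
        beta = beta_start + (beta_end - beta_start) * i / (max_step - 1)
        prod *= 1.0 - beta
    return prod


def corrupt(values, step, rng, max_step=1000):
    """Apply forward-diffusion Gaussian noise at the given step to a flat list."""
    a = alpha_bar(step, max_step)
    signal, noise = math.sqrt(a), math.sqrt(1.0 - a)
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in values]


if __name__ == "__main__":
    rng = random.Random(0)
    scheduler = AdaptiveNoiseScheduler()
    # Simulate a discriminator that classifies every sample correctly.
    for _ in range(20):
        step = scheduler.update(batch_acc=1.0)
    noisy = corrupt([0.5, -0.25, 1.0], step, rng)
    print("noise step:", step, "corrupted sample:", noisy)
```

In a training loop, the same `step` would be applied to both real and generated inputs before they reach the discriminator, so the discriminator cannot separate them by noise level alone. The tolerance band is a design choice in this sketch that keeps the step from oscillating when the EMA hovers near the target.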
