One-Step Residual Shifting Diffusion for Image Super-Resolution via Distillation
Daniil Selikhanovych, David Li, Aleksei Leonov, Nikita Gushchin, Sergei Kushneriuk, Alexander Filippov, Evgeny Burnaev, Iaroslav Koshelev, Alexander Korotin
TL;DR
RSD tackles the computational bottleneck of diffusion-based image super-resolution by distilling a ResShift teacher into a one-step generator. It derives a tractable joint-distribution KL objective that relies on an auxiliary "fake" ResShift model trained on the student's outputs, so the student can be optimized without backpropagating through the retraining of that model; LPIPS and GAN supervision in latent space are added to improve perceptual fidelity. Empirically, RSD surpasses its teacher and is competitive with state-of-the-art diffusion-based SR methods in perceptual quality and fidelity on real-world image SR (Real-ISR) benchmarks, while requiring substantially fewer resources. This makes diffusion-based SR more practical for real-world deployment by delivering fast, high-quality SR on consumer-scale hardware.
Abstract
Diffusion models for super-resolution (SR) produce high-quality visual results but incur high computational costs. Several methods have been developed to accelerate diffusion-based SR models, but some (e.g., SinSR) fail to produce realistic perceptual details, while others (e.g., OSEDiff) may hallucinate non-existent structures. To overcome these issues, we present RSD, a new distillation method for ResShift, one of the top diffusion-based SR models. Our method trains the student network to produce images such that a new fake ResShift model trained on them coincides with the teacher model. RSD achieves single-step restoration and outperforms the teacher by a large margin. We show that our distillation method surpasses SinSR, the other distillation-based method for ResShift, making it on par with state-of-the-art diffusion-based SR distillation methods. Compared to SR methods based on pre-trained text-to-image models, RSD produces competitive perceptual quality, provides images with better alignment to the degraded input images, and requires fewer parameters and less GPU memory. We provide experimental results on various real-world and synthetic datasets, including RealSR, RealSet65, DRealSR, ImageNet, and DIV2K.
