Co-learning Single-Step Diffusion Upsampler and Downsampler with Two Discriminators and Distillation
Sohwi Kim, Tae-Kyun Kim
TL;DR
This paper tackles real-world super-resolution (SR), where degradations are unknown and diverse. It introduces a co-learning framework that jointly trains a single-step diffusion-based upsampler and a learnable diffusion-based downsampler, guided by two discriminators and cyclic distillation so that both the high-resolution (HR) and low-resolution (LR) domains are modeled. The approach achieves state-of-the-art or competitive results on Real-ISR and FFHQ face SR, with efficient single-step inference and robust handling of real degradations. The work demonstrates that diffusion-based downsampling, coupled with adversarial guidance and distillation, can bridge the gap between synthetic and real-world SR and enable practical, high-quality SR in real-time settings.
Abstract
Super-resolution (SR) aims to reconstruct high-resolution (HR) images from their low-resolution (LR) counterparts, often relying on effective downsampling to generate diverse and realistic training pairs. In this work, we propose a co-learning framework that jointly optimizes a single-step diffusion-based upsampler and a learnable downsampler, enhanced by two discriminators and a cyclic distillation strategy. Our learnable downsampler is designed to better capture realistic degradation patterns while preserving structural details in the LR domain, which is crucial for enhancing SR performance. By leveraging a diffusion-based approach, our model generates diverse LR-HR pairs during training, enabling robust learning across varying degradations. We demonstrate the effectiveness of our method on both general real-world and domain-specific face SR tasks, achieving state-of-the-art performance in both fidelity and perceptual quality. Our approach not only improves efficiency with a single inference step but also ensures high-quality image reconstruction, bridging the gap between synthetic and real-world SR scenarios.
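To make the co-learning structure concrete, the following is a minimal toy sketch of how an upsampler, a downsampler, and two discriminators could be wired together in one training objective. It is not the method of the paper: the abstract does not specify the loss terms, so this sketch substitutes plain L1 cycle consistency for the cyclic distillation and a simple "1 minus score" term for the adversarial guidance. The diffusion models are replaced by deterministic stand-ins (average pooling and nearest-neighbor repetition) on 1-D signals, and all names (`downsample`, `upsample`, `co_learning_losses`, `d_hr`, `d_lr`) are hypothetical.

```python
from typing import Callable, Dict, List

Signal = List[float]  # a 1-D list of floats stands in for an image


def downsample(hr: Signal, factor: int = 2) -> Signal:
    """Toy stand-in for the learnable downsampler (assumption: average pooling)."""
    return [sum(hr[i:i + factor]) / factor
            for i in range(0, len(hr) - factor + 1, factor)]


def upsample(lr: Signal, factor: int = 2) -> Signal:
    """Toy stand-in for the single-step upsampler (assumption: nearest-neighbor repeat)."""
    return [value for value in lr for _ in range(factor)]


def l1(a: Signal, b: Signal) -> float:
    """Mean absolute difference between two equal-length signals."""
    if len(a) != len(b):
        raise ValueError("signals must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def co_learning_losses(
    hr: Signal,
    lr_real: Signal,
    d_hr: Callable[[Signal], float],  # hypothetical HR-domain discriminator, score in [0, 1]
    d_lr: Callable[[Signal], float],  # hypothetical LR-domain discriminator, score in [0, 1]
    factor: int = 2,
) -> Dict[str, float]:
    """Compute illustrative loss terms for one unpaired (HR, real LR) sample."""
    # HR -> LR -> HR: the downsampler synthesizes an LR sample, the upsampler restores it.
    lr_fake = downsample(hr, factor)
    hr_cycle = upsample(lr_fake, factor)

    # LR -> HR -> LR: the upsampler super-resolves a real LR sample, the downsampler degrades it again.
    hr_fake = upsample(lr_real, factor)
    lr_cycle = downsample(hr_fake, factor)

    return {
        "cycle_hr": l1(hr_cycle, hr),
        "cycle_lr": l1(lr_cycle, lr_real),
        # Each generator is rewarded when its domain discriminator scores the output as real (1.0).
        "adv_hr": 1.0 - d_hr(hr_fake),
        "adv_lr": 1.0 - d_lr(lr_fake),
    }


if __name__ == "__main__":
    # Constant-score discriminators, purely for demonstration.
    losses = co_learning_losses(
        hr=[0.0, 2.0, 4.0, 6.0],
        lr_real=[2.0, 4.0],
        d_hr=lambda signal: 0.5,
        d_lr=lambda signal: 0.5,
    )
    print(losses)
```

In an actual system, the two stand-in functions would be trainable networks, the discriminators would be trained alternately with the generators, and the terms would be combined with weights that this sketch leaves unspecified.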
