GenDR: Lightning Generative Detail Restorator
Yan Wang, Shijie Zhao, Kai Chen, Kexin Zhang, Junlin Li, Li Zhang
TL;DR
GenDR tackles the mismatch between text-to-image diffusion objectives and real-world SR by using a larger latent space (SD2.1-VAE16) and a one-step diffusion process. It introduces Consistent Score Identity Distillation (CiD), which injects SR priors into score distillation, and CiDA, which adds adversarial learning and representation alignment to accelerate training and improve perceptual quality. The approach culminates in a simplified GenDR pipeline that eliminates schedulers and conditioning modules, enabling fast, reliable SR with rich details. Across synthetic and real-world benchmarks, GenDR achieves state-of-the-art performance among one-step methods and competitive results against multi-step diffusion models, with clear gains in efficiency and detail fidelity.
Abstract
Recent research applying text-to-image (T2I) diffusion models to real-world super-resolution (SR) has achieved remarkable success. However, fundamental misalignments between the T2I and SR objectives force a trade-off between inference speed and detail fidelity. Specifically, T2I tasks favor multi-step inversion to synthesize coherent outputs aligned with textual prompts, and they shrink the latent space to reduce generation complexity. In contrast, SR tasks preserve most of the information in the low-resolution input and only need to restore high-frequency details, which calls for a sufficiently large latent space and fewer inference steps. To bridge this gap, we present GenDR, a one-step diffusion model for generative detail restoration, distilled from a tailored diffusion model with a larger latent space. In particular, we train a new SD2.1-VAE16 (0.9B) with representation alignment to expand the latent space without enlarging the model. For step distillation, we propose consistent score identity distillation (CiD), which incorporates an SR-specific loss into score distillation to exploit more SR priors and align the training target with the SR task. Furthermore, we extend CiD with adversarial learning and representation alignment (CiDA) to enhance perceptual quality and accelerate training. We also streamline the pipeline for more efficient inference. Experimental results demonstrate that GenDR achieves state-of-the-art performance in both quantitative metrics and visual fidelity.
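To make the "larger latent space" argument concrete, the following sketch compares latent tensor sizes for a 4-channel and a 16-channel VAE. It assumes, as the name suggests but the abstract does not state, that SD2.1-VAE16 uses 16 latent channels and keeps the 8× spatial downsampling of the original 4-channel SD2.1 VAE. The helpers `latent_shape` and `compression_ratio` are hypothetical illustrations, not part of GenDR.

```python
# Hypothetical illustration: latent sizes for a 4-channel vs. a 16-channel VAE.
# Assumption: both VAEs downsample each spatial dimension by a factor of 8
# (true for the SD2.1 VAE; assumed here for SD2.1-VAE16).


def latent_shape(height, width, channels, factor=8):
    """Return the latent shape (C, H/f, W/f) for spatial downsampling factor f."""
    if height % factor or width % factor:
        raise ValueError("image size must be divisible by the downsampling factor")
    return (channels, height // factor, width // factor)


def compression_ratio(height, width, channels, factor=8, image_channels=3):
    """Number of RGB image values represented by each latent value."""
    c, h, w = latent_shape(height, width, channels, factor)
    return (image_channels * height * width) / (c * h * w)


if __name__ == "__main__":
    size = 512
    for name, ch in [("SD2.1 VAE (4 ch)", 4), ("VAE16 (16 ch, assumed)", 16)]:
        shape = latent_shape(size, size, ch)
        ratio = compression_ratio(size, size, ch)
        print(f"{name}: latent {shape}, {ratio:.0f}x compression")
```

Under these assumptions the ratio is 3·f²/C regardless of resolution, so moving from 4 to 16 channels lowers compression from 48× to 12× and leaves four times as many latent values available to carry the high-frequency detail that SR must preserve.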
