GenDR: Lightning Generative Detail Restorator

Yan Wang, Shijie Zhao, Kai Chen, Kexin Zhang, Junlin Li, Li Zhang

TL;DR

GenDR tackles the mismatch between text-to-image diffusion goals and real-world SR by using a larger latent space (SD2.1-VAE16) and a one-step diffusion process. It introduces Consistent Score Identity Distillation (CiD), which injects SR priors into score-based guidance, and CiDA, which extends CiD with adversarial learning and representation alignment to accelerate training and improve perceptual quality. The result is a simplified GenDR pipeline that eliminates schedulers and conditioning modules, enabling fast, reliable SR with rich details. Across synthetic and real-world benchmarks, GenDR achieves state-of-the-art performance among one-step methods and competitive results against multi-step diffusion models, with clear gains in efficiency and detail fidelity.

Abstract

Recent research applying text-to-image (T2I) diffusion models to real-world super-resolution (SR) has achieved remarkable success. However, fundamental misalignments between T2I and SR targets result in a dilemma between inference speed and detail fidelity. Specifically, T2I tasks prioritize multi-step inversion to synthesize coherent outputs aligned with textual prompts and shrink the latent space to reduce generation complexity. In contrast, SR tasks preserve most information from the low-resolution input while restoring only high-frequency details, and thus require a sufficiently large latent space and fewer inference steps. To bridge the gap, we present a one-step diffusion model for generative detail restoration, GenDR, distilled from a tailored diffusion model with a larger latent space. In detail, we train a new SD2.1-VAE16 (0.9B) via representation alignment to expand the latent space without enlarging the model size. Regarding step distillation, we propose consistent score identity distillation (CiD), which incorporates an SR task-specific loss into score distillation to leverage more SR priors and align the training target. Furthermore, we extend CiD with adversarial learning and representation alignment (CiDA) to enhance perceptual quality and accelerate training. We also polish the pipeline to achieve more efficient inference. Experimental results demonstrate that GenDR achieves state-of-the-art performance in both quantitative metrics and visual fidelity.

Paper Structure

This paper contains 13 sections, 11 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: Example low-quality input and restoration results from GenDR (left) and performance comparison among diffusion-based SR methods (right), both of which demonstrate the advanced performance of the proposed GenDR. (Zoom-in for best view.)
  • Figure 2: Motivation: divergent task objectives create a dilemma. The T2I task (generation $>$ reconstruction) bridges the large gap from the initial distribution (noise) to the target, and thus prefers multiple steps (better refinement) and a narrow latent space (less difficulty) to keep results reasonable. The SR task (reconstruction $>$ generation) restores only details from an adjacent distribution (LQ), and thus needs fewer steps and a high-dimensional latent space. We visualize the latent distribution on ImageNet-val with t-SNE.
  • Figure 3: 1024$^2$px and 512$^2$px samples produced by SD2.1-VAE16. (Zoom-in for best view.)
  • Figure 4: Illustration of the proposed CiDA training scheme for GenDR. GenDR and the base score network are initialized from SD2.1-VAE16. The real/fake score network is implemented with LoRA. The LR latent is fed into GenDR to produce the SR latent and the representation used for the REPA loss \ref{eq:repa}. After the diffusion forward pass, the noised SR and HR latents are used to compute the CiD loss \ref{eq:cid} and the GAN loss \ref{eq:adv}.
  • Figure 5: Illustration of the proposed GenDR pipeline.
  • ...and 4 more figures