Realism Control One-step Diffusion for Real-World Image Super-Resolution

Zongliang Wu, Siming Zheng, Peng-Tao Jiang, Xin Yuan

TL;DR

This paper tackles real-world image super-resolution (Real-ISR) under unknown degradations by targeting the rigidity of existing one-step diffusion (OSD) methods, which typically use a fixed timestep and lack flexible fidelity-realism control. It introduces RCOD, a modular framework comprising Latent Domain Grouping (LDG), a latent degradation metric ($M_L$) with a Degradation-Aware Sampling (DAS) strategy, and a Visual Prompt Injection Module (VPIM) that replaces text prompts with degradation-aware visual cues. The approach enables explicit control over fidelity and realism during inference while keeping training lightweight: it requires only minimal changes to the training paradigm and reuses the original training data. Empirical results show that RCOD-enhanced OSD methods outperform or closely match state-of-the-art baselines on synthetic and real-world benchmarks, on both full-reference and no-reference image quality metrics, while preserving efficiency. This demonstrates practical applicability for real-time Real-ISR with tunable perceptual quality.

Abstract

Pre-trained diffusion models have shown great potential in real-world image super-resolution (Real-ISR) tasks by enabling high-resolution reconstructions. While one-step diffusion (OSD) methods significantly improve efficiency compared to traditional multi-step approaches, they still have limitations in balancing fidelity and realism across diverse scenarios. Since OSD models for SR are usually trained or distilled with a single timestep, they lack flexible control mechanisms to adaptively prioritize these competing objectives, which multi-step methods can manage inherently by adjusting the sampling steps. To address this challenge, we propose a Realism Controlled One-step Diffusion (RCOD) framework for Real-ISR. RCOD provides a latent domain grouping strategy that enables explicit control over the fidelity-realism trade-off during the noise prediction phase, with minimal modifications to the training paradigm and using only the original training data. A degradation-aware sampling strategy is also introduced to align distillation regularization with the grouping strategy and strengthen control over the trade-off. Moreover, a visual prompt injection module replaces conventional text prompts with degradation-aware visual tokens, enhancing both restoration accuracy and semantic consistency. Our method achieves superior fidelity and perceptual quality while maintaining computational efficiency. Extensive experiments demonstrate that RCOD outperforms state-of-the-art OSD methods in both quantitative metrics and visual quality, with flexible realism control at the inference stage.
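
The grouping idea in the abstract, assigning training samples to timestep groups according to how degraded they are in the latent domain, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the score function is a simple stand-in rather than the paper's $M_L$ metric, the thresholds and timestep values are hypothetical, and the direction of the mapping (heavier degradation to larger timestep) is assumed rather than taken from the text.

```python
import bisect

def degradation_score(z_lr, z_hr):
    """Hypothetical stand-in for a latent degradation metric:
    mean absolute difference between LR and HR latents (flat lists)."""
    return sum(abs(a - b) for a, b in zip(z_lr, z_hr)) / len(z_hr)

def assign_timestep(score, thresholds, timesteps):
    """Map a degradation score to a timestep group.
    Assumed direction: heavier degradation -> larger timestep."""
    group = bisect.bisect_right(thresholds, score)
    return timesteps[group]

# Hypothetical group boundaries and per-group timesteps.
thresholds = [0.1, 0.3]
timesteps = [100, 250, 500]

# Toy latents: one HR latent, one mildly and one heavily degraded LR latent.
z_hr = [0.0, 0.5, -0.5, 1.0]
z_lr_mild = [0.05, 0.45, -0.55, 1.05]
z_lr_heavy = [1.0, -0.5, 0.5, 0.0]

t_mild = assign_timestep(degradation_score(z_lr_mild, z_hr), thresholds, timesteps)
t_heavy = assign_timestep(degradation_score(z_lr_heavy, z_hr), thresholds, timesteps)
print(t_mild, t_heavy)
```

Under this sketch, a model trained with several timestep groups could be steered at inference by choosing the timestep explicitly instead of deriving it from a score, which is the kind of fidelity-realism control the abstract describes.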

Paper Structure

This paper contains 23 sections, 6 equations, 9 figures, 8 tables.

Figures (9)

  • Figure S1: While previous one-step diffusion methods, such as S3Diff [zhang2024degradation-s3diff], yield only a single result (b), our approach offers the flexibility to produce images (c-d) with different fidelity-realism trade-offs during inference, enhancing practical applicability across different scenarios.
  • Figure S2: Realism control one-step diffusion (RCOD) training process. The left part illustrates several synthesized real-world LR images by applying diverse degradations with varying types and intensities on an HR image. (a) Existing vanilla one-step diffusion (OSD) methods for super-resolution (SR): These LR images are directly sent into the diffusion forward and reverse process; the denoising U-Net tends to learn to recover the 'average' degradation, leading to a monotonous generation ability within the latent domain. (b) Our proposed Realism Control One-Step Diffusion employs a latent domain grouping strategy. This allows for adaptive control of timesteps (denoising degrees) during the forward process according to the degradation degree in the latent domain. As a result, the denoising U-Net can acquire a more diverse generation capability based on the timestep.
  • Figure S3: Influence of different timesteps $t$ using SD-turbo.
  • Figure S4: Distribution of (a) mean value of LR and HR training images in the latent domain, (b) the $M_L$ metric in the latent domain of VAE before and after training.
  • Figure S5: Visual comparison ($\times$4) of RCOD$_\text{O}$-Real. with other methods on DRealSR data.
  • ...and 4 more figures