Realism Control One-step Diffusion for Real-World Image Super-Resolution
Zongliang Wu, Siming Zheng, Peng-Tao Jiang, Xin Yuan
TL;DR
This paper tackles Real-ISR under unknown real-world degradations by targeting the rigidity of existing one-step diffusion (OSD) methods, which typically use a fixed timestep and lack flexible fidelity-realism control. It introduces RCOD, a modular framework comprising Latent Domain Grouping (LDG), a latent degradation metric (M_L) with a Degradation-Aware Sampling (DAS) strategy, and a Visual Prompt Injection Module (VPIM) that replaces text prompts with degradation-aware visual cues. The approach enables explicit control over fidelity and realism during inference while keeping training lightweight, with minimal changes to the training paradigm and no data beyond the original training set. Empirical results show that RCOD-enhanced OSD methods outperform or closely match state-of-the-art baselines on both full-reference and no-reference image quality metrics across synthetic and real-world benchmarks, while preserving efficiency, which demonstrates practical applicability for real-time Real-ISR with tunable perceptual quality.
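The summary above does not define M_L or how samples are assigned to latent groups. Purely as an illustration, assume the metric is a mean squared distance between the low-quality (LQ) and high-quality (HQ) latents normalized by the HQ latent's energy, and that groups are formed by fixed score thresholds, each paired with its own timestep. The names `latent_degradation_score`, `assign_group`, `THRESHOLDS`, and `GROUP_TIMESTEPS`, and all numeric values, are hypothetical; this is a toy sketch, not the paper's M_L, LDG, or DAS.

```python
import numpy as np

# Hypothetical ascending score thresholds separating three groups.
THRESHOLDS = [0.1, 0.3]
# Hypothetical timestep per group (assumption: heavier degradation -> larger timestep).
GROUP_TIMESTEPS = [250, 500, 750]


def latent_degradation_score(z_lq: np.ndarray, z_hq: np.ndarray) -> float:
    """Hypothetical stand-in for M_L: MSE between LQ and HQ latents,
    normalized by the mean energy of the HQ latent."""
    mse = float(np.mean((z_lq - z_hq) ** 2))
    energy = float(np.mean(z_hq ** 2)) + 1e-8  # guard against division by zero
    return mse / energy


def assign_group(score: float, thresholds: list) -> int:
    """Return the index of the first threshold >= score; scores above
    every threshold fall into the last group (index len(thresholds))."""
    return int(np.searchsorted(thresholds, score, side="left"))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z_hq = rng.standard_normal((4, 8, 8))  # toy "HQ latent"
    for sigma in (0.1, 0.4, 1.0):
        # Simulate increasingly degraded LQ latents with additive noise.
        z_lq = z_hq + sigma * rng.standard_normal(z_hq.shape)
        s = latent_degradation_score(z_lq, z_hq)
        g = assign_group(s, THRESHOLDS)
        print(f"sigma={sigma}: score={s:.3f}, group={g}, t={GROUP_TIMESTEPS[g]}")
```

In this toy setup, a more strongly degraded input lands in a higher group and is paired with a larger timestep; whether RCOD orders its groups this way is an assumption made only for the example.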
Abstract
Pre-trained diffusion models have shown great potential in real-world image super-resolution (Real-ISR), enabling high-quality high-resolution reconstructions. While one-step diffusion (OSD) methods significantly improve efficiency compared to traditional multi-step approaches, they still struggle to balance fidelity and realism across diverse scenarios. Because OSD methods for SR are usually trained or distilled at a single timestep, they lack a flexible mechanism for adaptively prioritizing these competing objectives, whereas multi-step methods manage this trade-off inherently by adjusting the sampling steps. To address this challenge, we propose a Realism Controlled One-step Diffusion (RCOD) framework for Real-ISR. RCOD uses a latent domain grouping strategy that enables explicit control over the fidelity-realism trade-off during the noise prediction phase, with minimal modifications to the training paradigm and only the original training data. A degradation-aware sampling strategy is also introduced to align distillation regularization with the grouping strategy and strengthen control over the trade-off. Moreover, a visual prompt injection module replaces conventional text prompts with degradation-aware visual tokens, improving both restoration accuracy and semantic consistency. Our method achieves superior fidelity and perceptual quality while maintaining computational efficiency. Extensive experiments demonstrate that RCOD outperforms state-of-the-art OSD methods in both quantitative metrics and visual quality, while offering flexible realism control at inference time.
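As background on why a single fixed timestep pins down the trade-off, recall the standard epsilon-prediction identity of DDPM-style models: if z_t = sqrt(ᾱ_t) z_0 + sqrt(1 - ᾱ_t) ε, then a one-step estimate is ẑ_0 = (z_t - sqrt(1 - ᾱ_t) ε̂) / sqrt(ᾱ_t). The sketch below evaluates this identity under a linear β schedule, which is an assumption (Stable Diffusion-based SR models use their own schedule). It shows that the timestep handed to a one-step model changes how strongly the predicted noise is removed. It is generic background, not RCOD's control mechanism; `alpha_bar`, `one_step_estimate`, and `noise_removal_weight` are illustrative names.

```python
import numpy as np


def alpha_bar(t: int, T: int = 1000, beta_start: float = 1e-4, beta_end: float = 2e-2) -> float:
    """Cumulative product prod_{s<=t} (1 - beta_s) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return float(np.prod(1.0 - betas[: t + 1]))


def one_step_estimate(z_t: np.ndarray, eps_hat: np.ndarray, t: int, T: int = 1000) -> np.ndarray:
    """Solve z_t = sqrt(ab) * z_0 + sqrt(1 - ab) * eps for z_0, using predicted noise eps_hat."""
    ab = alpha_bar(t, T)
    return (z_t - np.sqrt(1.0 - ab) * eps_hat) / np.sqrt(ab)


def noise_removal_weight(t: int, T: int = 1000) -> float:
    """Coefficient sqrt(1 - ab) / sqrt(ab) that multiplies eps_hat; it grows with t."""
    ab = alpha_bar(t, T)
    return float(np.sqrt(1.0 - ab) / np.sqrt(ab))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z0 = rng.standard_normal((4, 8, 8))
    eps = rng.standard_normal((4, 8, 8))
    t = 500
    ab = alpha_bar(t)
    z_t = np.sqrt(ab) * z0 + np.sqrt(1.0 - ab) * eps
    # With a perfect noise prediction, the estimate recovers z0 up to float error.
    print(np.allclose(one_step_estimate(z_t, eps, t), z0))
    # Larger t: the same predicted noise is removed with a larger weight.
    for t in (100, 500, 900):
        print(t, round(noise_removal_weight(t), 3))
```

A model trained at one timestep always applies the same weight, so the balance between preserving the input (fidelity) and replacing it with generated content (realism) is fixed at training time; exposing the timestep as a knob is one generic way to make that balance adjustable.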
