OmniScaleSR: Unleashing Scale-Controlled Diffusion Prior for Faithful and Realistic Arbitrary-Scale Image Super-Resolution

Xinning Chai, Zhengxue Cheng, Yuhong Zhang, Hengsheng Zhang, Yingsheng Qin, Yucai Yang, Rong Xie, Li Song

TL;DR

OmniScaleSR tackles faithful and realistic arbitrary-scale image super-resolution by introducing explicit, diffusion-native scale controls (global scale injection and local scale modulation) that work alongside the implicit scale adaptation of the diffusion prior. The method employs a two-branch latent diffusion framework with multi-domain fidelity enhancements, including pixel-, pixel-to-latent-, and latent-space guidance via dual semantic prompts and SePR attention. Comprehensive experiments on bicubic and real-world degradations demonstrate superior fidelity and realism, especially at ultra-high scales, compared with state-of-the-art diffusion-based and INR-based ASSR methods. Limitations include longer inference time and potential semantic bias from prompts, suggesting directions for acceleration and more robust prompt handling. Overall, OmniScaleSR provides a scalable approach to realistic arbitrary-scale SR (Real-ASSR) that maintains high-quality reconstructions across diverse scales and degradation types.

Abstract

Arbitrary-scale super-resolution (ASSR) overcomes the limitation of traditional super-resolution (SR) methods that operate only at fixed scales (e.g., 4x), enabling a single model to handle arbitrary magnification. Most existing ASSR approaches rely on implicit neural representation (INR), but its regression-driven feature extraction and aggregation intrinsically limit the ability to synthesize fine details, leading to low realism. Recent diffusion-based realistic image super-resolution (Real-ISR) models leverage powerful pre-trained diffusion priors and show impressive results at the 4x setting. We observe that they can also achieve ASSR because the diffusion prior implicitly adapts to scale by encouraging high-realism generation. However, without explicit scale control, the diffusion process cannot be properly adjusted for different magnification levels, resulting in excessive hallucination or blurry outputs, especially at ultra-high scales. To address these issues, we propose OmniScaleSR, a diffusion-based realistic arbitrary-scale SR framework designed to achieve both high fidelity and high realism. We introduce explicit, diffusion-native scale control mechanisms that work synergistically with implicit scale adaptation, enabling scale-aware and content-aware modulation of the diffusion process. In addition, we incorporate multi-domain fidelity enhancement designs to further improve reconstruction accuracy. Extensive experiments on bicubic degradation benchmarks and real-world datasets show that OmniScaleSR surpasses state-of-the-art methods in both fidelity and perceptual realism, with particularly strong performance at large magnification factors. Code will be released at https://github.com/chaixinning/OmniScaleSR.
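The abstract does not say how the explicit scale controls are implemented. The plain-Python sketch below only illustrates the general idea of scale-aware and content-aware conditioning under our own assumptions: a global injection that adds a sinusoidal embedding of log(scale) to the diffusion timestep embedding, and a local modulation that blends generation-branch and fidelity-branch features with a per-location gate driven by the scale and a content score. The function names, the log-scale encoding, the reference scale of 4, and the sigmoid gate are all hypothetical and are not taken from OmniScaleSR.

```python
import math


def sinusoidal_embedding(value: float, dim: int, max_period: float = 10000.0) -> list[float]:
    # Standard sinusoidal encoding of a scalar (as commonly used for diffusion timesteps).
    # dim must be even: the first half holds cosines, the second half sines.
    if dim % 2:
        raise ValueError("dim must be even")
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    return [math.cos(value * f) for f in freqs] + [math.sin(value * f) for f in freqs]


def inject_global_scale(t_emb: list[float], scale: float) -> list[float]:
    # HYPOTHETICAL global scale injection: add an embedding of log(scale) to the
    # timestep embedding, so every block conditioned on t_emb also sees the scale.
    s_emb = sinusoidal_embedding(math.log(scale), len(t_emb))
    return [a + b for a, b in zip(t_emb, s_emb)]


def local_scale_modulation(
    gen_feat: list[float],
    fid_feat: list[float],
    content: list[float],
    scale: float,
    ref_scale: float = 4.0,
    sharpness: float = 1.0,
) -> list[float]:
    # HYPOTHETICAL local scale modulation: a per-location gate in (0, 1) that
    # leans toward the generation branch as the scale grows beyond ref_scale
    # and where the (assumed) content score is high; the fidelity branch
    # receives the complementary weight 1 - gate.
    out = []
    for g, f, c in zip(gen_feat, fid_feat, content):
        logit = sharpness * (math.log(scale) - math.log(ref_scale)) + c
        gate = 1.0 / (1.0 + math.exp(-logit))
        out.append(gate * g + (1.0 - gate) * f)
    return out


# Usage: condition a timestep embedding on a x5.3 scale, then fuse two toy
# feature locations at the reference scale.
t_emb = sinusoidal_embedding(500.0, dim=8)
cond = inject_global_scale(t_emb, scale=5.3)
fused = local_scale_modulation([1.0, 1.0], [0.0, 0.0], content=[0.0, 2.0], scale=4.0)
```

At the assumed reference scale with a content score of 0, the gate is exactly 0.5, so both branches contribute equally; the higher content score at the second location shifts weight toward the generation branch (about 0.88), and scales above the reference push every gate above 0.5.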

Paper Structure

This paper contains 29 sections, 13 equations, 6 figures, 8 tables.

Figures (6)

  • Figure 1: Left: Comparison of ASSR implementations in existing methods: (a) explicit INR-based scale control with regression-based optimization [chen2021learning, lee2022local, chen2023cascaded, cao2023ciaosr, song2023ope, he2024latent, gao2023implicit, kim2024arbitrary]; (b) implicit generative scale adaptation, benefiting from the pre-trained diffusion prior, in diffusion-based Real-ISR methods [lin2024diffbir, yang2025pixel, yu2024scaling, wu2024seesr, sun2024coser, qu2024xpsr, chen2025faithdiff]; (c) our OmniScaleSR, which adopts both explicit diffusion-native scale controls and implicit generative scale adaptation for high-fidelity and high-realism ASSR. Right: Visual comparison with state-of-the-art methods. 'F' and 'R' abbreviate 'Fidelity' and 'Realism'.
  • Figure 2: Overview of OmniScaleSR, which consists of a generation branch and a fidelity branch in the latent space. To enable explicit diffusion-native SR scale controls, we introduce a global scale injection mechanism (red arrow) for overall perception and a local scale modulation mechanism (purple arrow) to dynamically manage the model's generation and fidelity abilities.
  • Figure 3: Qualitative comparisons under bicubic downsampling. SR scales from top to bottom: ×5.3, ×16, and ×24.
  • Figure 4: Qualitative comparisons under real-world degradations. SR scales from top to bottom: ×5.3, ×16, and ×24.
  • Figure 5: Ablation study on key components of our method: (a) without all scale-controlled mechanisms, (b) without the local scale modulation mechanism, (c) without the global scale injection mechanism, (d) without SePR Attention, (e) baseline using tags instead of image captions as the text prompt, (f) without the pre-trained ×4 upsampler.
  • ...and 1 more figure
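Figure 1(a) refers to explicit INR-based scale control. As background, INR methods such as LIIF [chen2021learning] handle an arbitrary scale by querying a continuous decoder at the pixel centers of the target HR grid, usually together with the size of one target pixel (the "cell"). The plain-Python sketch below only constructs those queries for a non-integer scale; it contains no decoder, the helper names are hypothetical, and rounding the output size to the nearest integer is our assumption rather than a detail from OmniScaleSR.

```python
def target_size(h_lr: int, w_lr: int, scale: float) -> tuple[int, int]:
    # Assumed convention: round the scaled LR size to the nearest integer.
    return round(h_lr * scale), round(w_lr * scale)


def pixel_centers(n: int, lo: float = -1.0, hi: float = 1.0) -> list[float]:
    # Centers of n equal-width cells tiling [lo, hi].
    r = (hi - lo) / (2 * n)
    return [lo + r + 2 * r * i for i in range(n)]


def query_grid(h_lr: int, w_lr: int, scale: float):
    # Build the (y, x) query coordinates of the HR grid and the cell size
    # (height and width of one HR pixel in the [-1, 1] coordinate system).
    h, w = target_size(h_lr, w_lr, scale)
    ys, xs = pixel_centers(h), pixel_centers(w)
    coords = [(y, x) for y in ys for x in xs]
    cell = (2.0 / h, 2.0 / w)
    return (h, w), coords, cell


# Usage: a 10x12 LR input at a non-integer scale of 5.3 yields a 53x64 HR grid.
(h, w), coords, cell = query_grid(10, 12, 5.3)
```

Because the scale enters only through the query grid and the cell size, the same decoder can in principle serve any magnification; this is the explicit, regression-based scale control that Figure 1(a) contrasts with the implicit adaptation of diffusion priors.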