Bidirectional Reward-Guided Diffusion for Real-World Image Super-Resolution

Zihao Fan, Xin Lu, Yidi Liu, Jie Huang, Dong Li, Xueyang Fu, Zheng-Jun Zha

TL;DR

Bird-SR introduces a bidirectional reward-guided diffusion framework for real-world image super-resolution, jointly leveraging synthetic LR–HR pairs and real LR images. By applying relative rewards during forward noise-injection on synthetic data and reward-guided reverse diffusion with semantic alignment on real data, it achieves improved perceptual quality while preserving structural fidelity. A dynamic fidelity–perception weighting across diffusion steps further stabilizes training and balances early structural guidance with late-stage perceptual refinement. Experiments on multiple real-world SR benchmarks demonstrate superior perceptual realism and robustness against distribution shifts, highlighting the practical potential of trajectory-level, reward-guided diffusion for real-world restoration tasks.

Abstract

Diffusion-based super-resolution can synthesize rich details, but models trained on synthetic paired data often fail on real-world LR images due to distribution shifts. We propose Bird-SR, a bidirectional reward-guided diffusion framework that formulates super-resolution as trajectory-level preference optimization via reward feedback learning (ReFL), jointly leveraging synthetic LR–HR pairs and real-world LR images. Because structural fidelity is easily degraded by ReFL, the model is directly optimized on synthetic pairs at early diffusion steps; this also helps preserve structure for real-world inputs, since the distribution gap between synthetic and real data is smaller at the structural level. For perceptual enhancement, quality-guided rewards are applied at later sampling steps to both synthetic and real LR images. To mitigate reward hacking, the rewards for synthetic results are formulated in a relative advantage space bounded by their clean counterparts, while real-world optimization is regularized via a semantic alignment constraint. Furthermore, to balance structural and perceptual learning, we adopt a dynamic fidelity–perception weighting strategy that emphasizes structure preservation at early diffusion steps and progressively shifts focus toward perceptual optimization at later steps. Extensive experiments on real-world SR benchmarks demonstrate that Bird-SR consistently outperforms state-of-the-art methods in perceptual quality while preserving structural consistency, validating its effectiveness for real-world super-resolution.
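The abstract gives no formulas for the step-dependent weighting or the relative reward, so the following is only a minimal sketch under stated assumptions. It assumes a linear schedule in which `t = T` is the earliest (noisiest) sampling step, and a relative reward defined as the reward of the SR output minus that of its clean HR counterpart, capped at zero. The function names, the linear schedule, and the capping rule are all hypothetical and are not taken from the paper.

```python
import numpy as np

def fidelity_perception_weights(t, T):
    """Hypothetical linear schedule (an assumption, not the paper's formula).

    t = T is the earliest (noisiest) sampling step, t = 0 the last one.
    Fidelity dominates early, perception dominates late.
    """
    w_fid = t / T
    w_per = 1.0 - w_fid
    return w_fid, w_per

def relative_reward(r_sr, r_hr):
    """Assumed relative reward: SR reward minus clean-HR reward, capped at 0.

    Capping means the SR output gains nothing by scoring above its clean
    counterpart, which is one simple way to discourage reward hacking.
    """
    diff = np.asarray(r_sr, dtype=float) - np.asarray(r_hr, dtype=float)
    return np.minimum(diff, 0.0)

def combined_loss(fid_loss, r_sr, r_hr, t, T):
    """Weighted sum of a fidelity loss and the negated mean relative reward."""
    w_fid, w_per = fidelity_perception_weights(t, T)
    return w_fid * fid_loss - w_per * float(np.mean(relative_reward(r_sr, r_hr)))

if __name__ == "__main__":
    # Early step: the loss is driven almost entirely by the fidelity term.
    print(combined_loss(fid_loss=2.0, r_sr=[0.5], r_hr=[0.7], t=900, T=1000))
    # Late step: the loss is driven almost entirely by the reward term.
    print(combined_loss(fid_loss=2.0, r_sr=[0.5], r_hr=[0.7], t=100, T=1000))
```

Under this sketch, an SR output whose reward already matches or exceeds that of the clean image contributes zero reward gradient, so only the fidelity term remains active for that sample.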

Paper Structure

This paper contains 34 sections, 17 equations, 13 figures, 8 tables, and 1 algorithm.

Figures (13)

  • Figure 2: Overview of Bird-SR, a bidirectional reward-guided diffusion framework for real-world super-resolution. For synthetic low-resolution data, predefined noise is injected into clean images $x_0^{1:n}$, and intermediate predictions $\hat{x}_0^{1:n}$ are obtained via closed-form single-step interpolation. For real-world low-resolution data, sampling starts from pure noise $x_T^{1:n}$ and only the final timestep along the reverse diffusion trajectory is optimized, with a reference model output $\tilde{x}_0^{1:n}$ introduced to enforce semantic alignment.
  • Figure 3: Qualitative comparisons with state-of-the-art Real-ISR methods. Our method performs best in terms of image realism and detail generation, especially in preserving fine structures and restoring text details. More visual results can be found in the Appendix.
  • Figure 4: Visualization of the ablation for the four variants.
  • Figure 5: Different distortion–perception weighting.
  • Figure 6: Visualization of LBP texture features. The LBP texture results show that, compared with real-world data, the synthetic LR data carries additional information in its high-frequency components. This causes an input distribution shift that particularly hinders the recovery of fine-grained details.
  • ...and 8 more figures
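
The Figure 2 caption mentions injecting predefined noise into clean images and obtaining an intermediate prediction in closed form from a single step. The captions do not state which diffusion parameterization Bird-SR uses, so the sketch below assumes the standard DDPM noise-prediction form, $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$; a flow-matching backbone would use a different interpolation. The function names and the `alpha_bar_t` argument are illustrative assumptions.

```python
import numpy as np

def inject_noise(x0, eps, alpha_bar_t):
    """Forward noise injection under the assumed DDPM parameterization."""
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

def predict_x0(x_t, eps_hat, alpha_bar_t):
    """Closed-form single-step estimate of the clean image.

    Inverts the forward equation using the predicted noise eps_hat, so no
    iterative sampling is needed to obtain an image-space prediction.
    """
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_bar_t)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = rng.uniform(-1.0, 1.0, size=(2, 3, 8, 8))   # toy batch of "clean" images
    eps = rng.standard_normal(x0.shape)              # the predefined noise
    x_t = inject_noise(x0, eps, alpha_bar_t=0.5)

    # With a perfect noise prediction the clean image is recovered exactly
    # (up to floating-point error); a real network only approximates eps.
    x0_hat = predict_x0(x_t, eps_hat=eps, alpha_bar_t=0.5)
    print(np.allclose(x0_hat, x0))
```

Because the prediction is a single algebraic step, a reward model can score it at any intermediate timestep without running the full reverse trajectory, which is presumably why this path is used for synthetic data.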