Bidirectional Reward-Guided Diffusion for Real-World Image Super-Resolution
Zihao Fan, Xin Lu, Yidi Liu, Jie Huang, Dong Li, Xueyang Fu, Zheng-Jun Zha
TL;DR
Bird-SR introduces a bidirectional reward-guided diffusion framework for real-world image super-resolution that jointly leverages synthetic LR–HR pairs and real LR images. It applies relative rewards during forward noise injection on synthetic data, and reward-guided reverse diffusion with semantic alignment on real data, improving perceptual quality while preserving structural fidelity. A dynamic fidelity–perception weighting across diffusion steps further stabilizes training, balancing structural guidance at early steps with perceptual refinement at later ones. Experiments on multiple real-world SR benchmarks show superior perceptual realism and robustness to distribution shifts, highlighting the practical potential of trajectory-level, reward-guided diffusion for real-world restoration tasks.
Abstract
Diffusion-based super-resolution can synthesize rich details, but models trained on synthetic paired data often fail on real-world LR images due to distribution shifts. We propose Bird-SR, a bidirectional reward-guided diffusion framework that formulates super-resolution as trajectory-level preference optimization via reward feedback learning (ReFL), jointly leveraging synthetic LR–HR pairs and real-world LR images. Because structural fidelity is easily degraded under ReFL, the model is directly optimized on synthetic pairs at early diffusion steps; since the distribution gap between synthetic and real data is smaller at the structural level, this also helps preserve structure for real-world inputs. For perceptual enhancement, quality-guided rewards are applied at later sampling steps to both synthetic and real LR images. To mitigate reward hacking, rewards for synthetic results are formulated in a relative advantage space bounded by their clean HR counterparts, while real-world optimization is regularized by a semantic alignment constraint. Furthermore, to balance structural and perceptual learning, we adopt a dynamic fidelity–perception weighting strategy that emphasizes structure preservation at early stages and progressively shifts the focus toward perceptual optimization at later diffusion steps. Extensive experiments on real-world SR benchmarks demonstrate that Bird-SR consistently outperforms state-of-the-art methods in perceptual quality while preserving structural consistency, validating its effectiveness for real-world super-resolution.
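The abstract describes the per-step objective only in words. Below is a minimal Python sketch, under stated assumptions, of how its three ingredients could combine at a single sampling step: a fidelity–perception weight that grows across the trajectory, a relative advantage for synthetic pairs bounded by the clean HR reward, and a semantic alignment penalty for real inputs. Everything here is hypothetical and not taken from the paper: the function names, the power-law schedule in `perception_weight` with exponent `gamma`, capping the advantage at zero, using cosine similarity of feature vectors as the alignment measure, and the weight `lam`. The scalar rewards and feature vectors stand in for the outputs of a reward model and a semantic encoder, which the abstract does not specify.

```python
import math
from typing import Sequence


def perception_weight(step: int, num_steps: int, gamma: float = 2.0) -> float:
    """Hypothetical schedule for the weight on the perceptual term.

    step 0 is the first (noisiest) sampling step, num_steps - 1 the last.
    The weight rises monotonically from 0 to 1, so early steps emphasize
    fidelity (structure) and late steps emphasize perception.
    """
    if num_steps < 2:
        return 1.0
    progress = step / (num_steps - 1)
    return progress ** gamma


def relative_advantage(reward_sr: float, reward_hr: float) -> float:
    """Hypothetical relative reward for a synthetic pair.

    The SR reward is measured against its clean HR counterpart and capped
    at 0, so scoring above the HR image earns nothing extra (an assumed
    guard against reward hacking).
    """
    return min(reward_sr - reward_hr, 0.0)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def semantic_alignment_penalty(feat_sr: Sequence[float], feat_lr: Sequence[float]) -> float:
    """Hypothetical penalty: 0 when SR and LR features point the same way."""
    return 1.0 - cosine_similarity(feat_sr, feat_lr)


def synthetic_step_loss(step: int, num_steps: int, fidelity_loss: float,
                        reward_sr: float, reward_hr: float) -> float:
    """Weighted sum of a fidelity loss and the negated relative advantage."""
    w_p = perception_weight(step, num_steps)
    return (1.0 - w_p) * fidelity_loss - w_p * relative_advantage(reward_sr, reward_hr)


def real_step_loss(step: int, num_steps: int, reward_sr: float,
                   feat_sr: Sequence[float], feat_lr: Sequence[float],
                   lam: float = 0.5) -> float:
    """Real LR inputs have no HR target, so only the (weighted) reward term
    plus the semantic alignment regularizer apply."""
    w_p = perception_weight(step, num_steps)
    return w_p * (-reward_sr + lam * semantic_alignment_penalty(feat_sr, feat_lr))


if __name__ == "__main__":
    steps = 10
    for t in (0, 5, 9):
        print(t, round(perception_weight(t, steps), 3),
              round(synthetic_step_loss(t, steps, 0.3, 0.5, 0.8), 3))
```

In this sketch the real-data loss vanishes at the earliest steps because its weight is tied to the same schedule, which mirrors the abstract's split: structure is learned from synthetic pairs early, and quality rewards act on both data sources later. A linear or cosine ramp would be an equally plausible choice of schedule.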
