RFSR: Improving ISR Diffusion Models via Reward Feedback Learning

Xiaopeng Sun, Qinwei Lin, Yu Gao, Yujie Zhong, Chengjian Feng, Dengjie Li, Zheng Zhao, Jie Hu, Lin Ma

TL;DR

RFSR improves diffusion-based image super-resolution (ISR) models by fine-tuning them with reward feedback learning under a timestep-aware training regime. It applies a low-frequency structure constraint in the early denoising steps and reward-driven optimization in the later steps, with Gram-KL regularization to curb reward hacking. The approach schedules several loss terms across timesteps, including $\mathcal{L}_{dwt_{ll}}$, $\mathcal{L}_{reward}$, and $\mathcal{L}_{gram-kl}$, and reports substantial gains in perceptual and aesthetic metrics on synthetic and real-world ISR benchmarks. The method is plug-and-play for existing diffusion-based ISR models; its noted limitations are a reliance on pre-trained diffusion backbones and on reward models.

Abstract

Generative diffusion models (DM) have been extensively utilized in image super-resolution (ISR). Most of the existing methods adopt the denoising loss from DDPMs for model optimization. We posit that introducing reward feedback learning to finetune the existing models can further improve the quality of the generated images. In this paper, we propose a timestep-aware training strategy with reward feedback learning. Specifically, in the initial denoising stages of ISR diffusion, we apply low-frequency constraints to super-resolution (SR) images to maintain structural stability. In the later denoising stages, we use reward feedback learning to improve the perceptual and aesthetic quality of the SR images. In addition, we incorporate Gram-KL regularization to alleviate stylization caused by reward hacking. Our method can be integrated into any diffusion-based ISR model in a plug-and-play manner. Experiments show that ISR diffusion models, when fine-tuned with our method, significantly improve the perceptual and aesthetic quality of SR images, achieving excellent subjective results. Code: https://github.com/sxpro/RFSR
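
The timestep-aware strategy described in the abstract can be sketched as a simple loss schedule. This is an illustrative sketch only, not the authors' implementation: the function names, the `early_steps` threshold, the weight `lam`, and the specific form of the Gram-based regularizer (a KL divergence between softmax-normalized Gram matrices) are all assumptions made here for illustration, since this summary does not give the exact definitions.

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a (channels, height, width) feature map, scaled by its size."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T / (c * h * w)

def softmax(x):
    """Numerically stable softmax over a 1D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def gram_kl(feat_sr, feat_ref):
    """Assumed regularizer: KL(p_ref || p_sr) between softmax-normalized Gram matrices."""
    p = softmax(gram_matrix(feat_ref).ravel())
    q = softmax(gram_matrix(feat_sr).ravel())
    return float(np.sum(p * (np.log(p) - np.log(q))))

def scheduled_loss(step, early_steps, ll_loss, reward, gram_kl_value, lam=1.0):
    """Select loss terms by denoising step index (0 is the first denoising step).

    Early steps: only the low-frequency structure term is used.
    Later steps: maximize the reward (minimize its negative) plus the regularizer.
    """
    if step < early_steps:
        return ll_loss
    return -reward + lam * gram_kl_value
```

With hypothetical values, `scheduled_loss(0, 5, 0.3, 0.8, 0.1)` returns the low-frequency term alone, while a later step such as `scheduled_loss(7, 5, 0.3, 0.8, 0.1)` combines the negative reward with the regularizer. Penalizing divergence in Gram statistics is one plausible way to discourage the stylization drift that the abstract attributes to reward hacking.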

Paper Structure

This paper contains 16 sections, 7 equations, 8 figures, 5 tables.

Figures (8)

  • Figure 1: (a) The first row shows the progressive denoising of the image over the iterative process; the next two rows show the low-frequency and high-frequency components obtained by applying the DWT at each stage. Once the low-frequency components stabilize, they change little, while the high-frequency components become increasingly complex. (b) At smaller steps, both the low-frequency and high-frequency information are close to the ground truth (GT). As the number of steps increases, these frequency components gradually diverge from the GT. This observation motivates maintaining the structural stability of the SR images in the early steps and encouraging the ISR diffusion model to generate more perceptually pleasing and detailed textures in later steps.
  • Figure 2: Visualization for Reward Hacking. The direct application of reward feedback learning significantly improves the perceptual metrics (e.g., CLIPIQA) of SR images, but leads to reward hacking, resulting in progressively degrading image quality. The subjective manifestation of this issue is that SR images tend to adopt a specific stylization and generate strange lines.
  • Figure 3: Overview of our method.
  • Figure 4: Visual comparison of state-of-the-art ISR diffusion models and their counterparts trained with our RFSR. Each row, from top to bottom, displays the results of bicubic interpolation, the original ISR model, the ISR model trained with our RFSR, and the GT image. Please zoom in for a better view.
  • Figure 5: Effectiveness of Timestep-Aware Training. An excessively large $st_1$ interval causes image blurring, while an overly large $st_2$ interval induces image stylization.
  • ...and 3 more figures
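
The low-/high-frequency split referred to in the Figure 1 caption can be illustrated with a one-level 2D discrete wavelet transform. The sketch below assumes a Haar wavelet and an L1 distance on the low-low band; this summary does not state which wavelet or distance the paper uses, and the names `haar_dwt2` and `ll_l1` are hypothetical.

```python
import numpy as np

def haar_dwt2(img):
    """One-level orthonormal 2D Haar DWT of an array with even height and width.

    Returns the low-frequency band and a tuple of three high-frequency bands.
    Sub-band naming conventions vary between libraries.
    """
    a = img[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # local average: coarse structure
    lh = (a + b - c - d) / 2.0  # difference between rows
    hl = (a - b + c - d) / 2.0  # difference between columns
    hh = (a - b - c + d) / 2.0  # diagonal difference
    return ll, (lh, hl, hh)

def ll_l1(sr, gt):
    """Assumed low-frequency constraint: mean absolute difference of the LL bands."""
    ll_sr, _ = haar_dwt2(sr)
    ll_gt, _ = haar_dwt2(gt)
    return float(np.mean(np.abs(ll_sr - ll_gt)))
```

For a constant image, all three high-frequency bands are zero and only the low-frequency band carries information, which is why a constraint on that band alone targets overall structure rather than texture.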