DiffusionReward: Enhancing Blind Face Restoration through Reward Feedback Learning

Bin Wu; Wei Wang; Yahui Liu; Zixiang Li; Yao Zhao

TL;DR

DiffusionReward brings Reward Feedback Learning (ReFL) to blind face restoration (BFR): a Face Reward Model (FRM) trained on human preferences provides gradient feedback during diffusion denoising. The method combines a dynamically updated FRM with a structural consistency constraint and a weight regularization term, so that facial detail improves while identity is preserved, and the continual FRM updates mitigate reward hacking. Experiments on synthetic and real-world datasets show gains over state-of-the-art methods in perceptual quality and identity fidelity across diffusion-based BFR baselines. The approach offers a principled way to align restoration outputs with human preferences, with practical implications for high-fidelity, identity-preserving face restoration in the wild.

Abstract

Reward Feedback Learning (ReFL) has recently shown great potential in aligning model outputs with human preferences across various generative tasks. In this work, we introduce a ReFL framework, named DiffusionReward, to the Blind Face Restoration (BFR) task for the first time. DiffusionReward addresses two limitations of diffusion-based methods, which often fail to generate realistic facial details and exhibit poor identity consistency. The core of our framework is the Face Reward Model (FRM), which is trained on carefully annotated data. Its feedback signals steer the optimization of the restoration network. In particular, our ReFL framework incorporates a gradient flow into the denoising process of off-the-shelf face restoration methods to guide the update of model parameters. The guiding gradient is jointly determined by three components: (i) the FRM, which ensures the perceptual quality of the restored faces; (ii) a regularization term, which acts as a safeguard to preserve generative diversity; and (iii) a structural consistency constraint, which maintains facial fidelity. Furthermore, the FRM is dynamically optimized throughout the process. This not only keeps the restoration network aligned with the real face manifold, but also prevents reward hacking. Experiments on synthetic and wild datasets demonstrate that our method outperforms state-of-the-art methods, significantly improving identity consistency and facial details. The source code, data, and models are available at: https://github.com/01NeuralNinja/DiffusionReward.
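
The three-part guiding signal described in the abstract can be pictured as a single scalar objective whose gradient updates the restoration network. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `refl_objective`, the squared-error form of both penalties, the `structure_reference` input, and the weights `lambda_struct` and `lambda_reg` are all hypothetical choices made for illustration. Images and parameters are represented as flat lists of floats to keep the example dependency-free.

```python
from typing import Callable, Sequence


def refl_objective(
    restored: Sequence[float],
    structure_reference: Sequence[float],
    weights: Sequence[float],
    pretrained_weights: Sequence[float],
    reward_fn: Callable[[Sequence[float]], float],
    lambda_struct: float = 1.0,  # hypothetical weight, not from the paper
    lambda_reg: float = 0.1,     # hypothetical weight, not from the paper
) -> float:
    """Return a scalar loss to minimize (illustrative only).

    The loss combines:
      (i)   a reward term: higher reward_fn(restored) lowers the loss;
      (ii)  a regularization term: here assumed to be the squared distance
            between current and pretrained parameters;
      (iii) a structural consistency term: here assumed to be the mean
            squared error against a structure reference.
    """
    if len(restored) != len(structure_reference):
        raise ValueError("restored and structure_reference must match in length")
    if len(weights) != len(pretrained_weights):
        raise ValueError("weights and pretrained_weights must match in length")

    # (i) Maximizing the reward is expressed as minimizing its negative.
    reward_term = -reward_fn(restored)

    # (iii) Penalize structural drift from the reference (assumed MSE).
    struct_term = sum(
        (a - b) ** 2 for a, b in zip(restored, structure_reference)
    ) / len(restored)

    # (ii) Penalize drift of the parameters from their pretrained values.
    reg_term = sum((w - w0) ** 2 for w, w0 in zip(weights, pretrained_weights))

    return reward_term + lambda_struct * struct_term + lambda_reg * reg_term
```

In a real training loop, `reward_fn` would stand in for the FRM and the loss would be differentiated with respect to the restoration network's parameters; the dynamic FRM updates mentioned in the abstract would happen in a separate step that this sketch does not model.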
