Visual Style Prompt Learning Using Diffusion Models for Blind Face Restoration
Wanglong Lu, Jikai Wang, Tao Wang, Kaihao Zhang, Xianta Jiang, Hanli Zhao
TL;DR
This work addresses blind face restoration with a diffusion-based visual style prompt learning framework that operates in the latent space $\mathcal{W}^+$ of a pre-trained StyleGAN. A diffusion-based style prompt module generates high-quality latent cues $\boldsymbol{w}^0$ that serve as visual prompts. Within a restoration auto-encoder, these prompts are combined with a StyleGAN facial feature bank and style-modulated aggregation transformation (SMART) layers to produce the restored output $\mathbf{I}_{out}$. Training proceeds in two stages: the style encoder and code diffuser are first learned jointly with diffusion, LPIPS, and identity losses, and the restoration network is then trained with adversarial objectives. The method reports superior perceptual quality on synthetic and real-world data and benefits downstream tasks such as landmark detection and emotion recognition. The approach offers a practical, interpretable way to leverage generative priors for restoration, with potential extensions to textual prompts and video-based applications.
Abstract
Blind face restoration aims to recover high-quality facial images from various unidentified sources of degradation, posing significant challenges due to the minimal information retrievable from the degraded images. Prior knowledge-based methods, leveraging geometric priors and facial features, have led to advancements in face restoration but often fall short of capturing fine details. To address this, we introduce a visual style prompt learning framework that utilizes diffusion probabilistic models to explicitly generate visual prompts within the latent space of pre-trained generative models. These prompts are designed to guide the restoration process. To fully utilize the visual prompts and enhance the extraction of informative and rich patterns, we introduce a style-modulated aggregation transformation layer. Extensive experiments and applications demonstrate the superiority of our method in achieving high-quality blind face restoration. The source code is available at https://github.com/LonglongaaaGo/VSPBFR.
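To make the idea of running a diffusion model over latent codes more concrete, the following is a minimal sketch of the standard DDPM forward (noising) step applied to a flattened latent vector standing in for $\boldsymbol{w}^0$, together with its algebraic inverse given a noise estimate. It is not the authors' implementation: the linear beta schedule, the 1000 steps, the toy dimension, and the function names `linear_alpha_bars`, `add_noise`, and `predict_w0` are all assumptions made for illustration, and the summary above does not say whether the code diffuser predicts the noise or the clean code directly.

```python
import math
import random


def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Cumulative products alpha_bar_t for a linear beta schedule (assumed)."""
    alpha_bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def add_noise(w0, eps, alpha_bar):
    """Forward process: w_t = sqrt(alpha_bar) * w0 + sqrt(1 - alpha_bar) * eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * w + b * e for w, e in zip(w0, eps)]


def predict_w0(wt, eps_hat, alpha_bar):
    """Invert the forward process given a noise estimate eps_hat."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - b * e) / a for x, e in zip(wt, eps_hat)]


rng = random.Random(0)
dim = 8  # toy size; a real W+ code is far larger
w0 = [rng.gauss(0.0, 1.0) for _ in range(dim)]   # stand-in for a clean latent code
eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # Gaussian noise

alpha_bars = linear_alpha_bars(1000)
t = 500
wt = add_noise(w0, eps, alpha_bars[t])

# With the true noise as the estimate, the clean code is recovered up to
# floating-point error; a trained denoiser would supply eps_hat instead.
w0_rec = predict_w0(wt, eps, alpha_bars[t])
max_err = max(abs(a - b) for a, b in zip(w0, w0_rec))
```

In a trained system the noise estimate would come from a learned network conditioned on features of the degraded face, and sampling would iterate from pure noise down to step 0; the sketch only shows the closed-form relationship between a clean code and its noised version.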
