
PFStorer: Personalized Face Restoration and Super-Resolution

Tuomas Varanka, Tapani Toivonen, Soumya Tripathy, Guoying Zhao, Erman Acar

TL;DR

PFStorer presents a principled approach to personalized face restoration by injecting identity-specific priors into a strong base diffusion restoration model through trainable adapters. It preserves base priors with a learnable gamma balance and employs a generative regularizer to prevent identity leakage from low-quality inputs, enabling faithful, high-fidelity restoration across multiple identities and degradations. The method adopts an alignment-free training pipeline and synthetic noise modeling to simulate real-world conditions, achieving superior identity preservation in both quantitative metrics and a user study. The work demonstrates practical impact for identity-faithful restoration in real-world imagery, while acknowledging limitations related to reference-image bias, computational cost, and diffusion-model artifacts.
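The TL;DR mentions synthetic noise modeling to simulate real-world conditions, but this summary does not spell out the pipeline. As a rough illustration only, the sketch below implements the classical degradation model $y = (x \otimes k)\downarrow_s + n$ that is commonly used to synthesize low-quality training inputs: a Gaussian blur, subsampling by a factor $s$, then additive Gaussian noise. The function names, the grayscale-only input, and all parameter values are assumptions and not PFStorer's actual settings. Practical real-world pipelines typically add further steps, such as JPEG compression and randomized or repeated degradations.

```python
import numpy as np


def gaussian_kernel_1d(sigma: float) -> np.ndarray:
    """Normalized 1-D Gaussian kernel with radius ceil(3 * sigma)."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur of an (H, W) image with reflect padding."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="reflect")
    # Filter rows, then columns; "valid" mode trims the padding back off.
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)


def degrade(hq, scale=4, sigma=2.0, noise_std=0.05, rng=None):
    """Hypothetical LQ synthesis: y = (x * k) subsampled by `scale`, plus noise."""
    rng = np.random.default_rng() if rng is None else rng
    lq = blur(hq, sigma)[::scale, ::scale]               # blur, then downsample
    lq = lq + rng.normal(0.0, noise_std, size=lq.shape)  # additive Gaussian noise
    return np.clip(lq, 0.0, 1.0)                         # keep values in [0, 1]


# Example: a random stand-in for a 128 x 128 high-quality crop.
rng = np.random.default_rng(0)
hq = rng.random((128, 128))
lq = degrade(hq, scale=4, rng=rng)
print(lq.shape)  # (32, 32)
```

Sweeping `sigma`, `scale`, and `noise_std` upward yields progressively harsher inputs, loosely mirroring the increasing degradation levels shown in Figure 2.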

Abstract

Recent developments in face restoration have achieved remarkable results in producing high-quality and lifelike outputs. These stunning results, however, often fail to be faithful to the identity of the person, as the models lack the necessary context. In this paper, we explore the potential of personalized face restoration with diffusion models. In our approach, a restoration model is personalized using a few images of the identity, leading to restoration tailored to that identity while retaining fine-grained details. By using independent trainable blocks for personalization, the rich prior of a base restoration model can be exploited to its fullest. To prevent the model from relying on identity cues left in the conditioning low-quality images, a generative regularizer is employed. With a learnable parameter, the model learns to balance the details generated from the input image against the degree of personalization. Moreover, we improve the training pipeline of face restoration models to enable an alignment-free approach. We showcase the robust capabilities of our approach in several real-world scenarios with multiple identities, demonstrating our method's ability to generate fine-grained details with faithful restoration. In a user study evaluating the perceptual quality and faithfulness of the generated details, our method was voted best 61% of the time, compared to 25% for the second-best method.
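The abstract describes independent trainable personalization blocks whose contribution is balanced against the base model by a learnable parameter. The Figure 3 caption adds that each personalization block applies cross-attention with reference-image features $F^i_{Ref}$ and that a per-block adapter vector $\gamma^i$ sets the balance. The exact combination rule is not given in this summary, so the sketch below assumes a gated residual, $F^i + \gamma^i \odot \mathrm{Attn}(F^i, F^i_{Ref})$, built on plain single-head scaled dot-product attention in NumPy. It omits the text embedding that the caption also mentions. The weight names, shapes, and the zero initialization of $\gamma$ are illustrative assumptions. With $\gamma = 0$ the block reduces exactly to the base model's features, which is one way such a gate can preserve the base prior.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(feats, ref_feats, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    feats:     (N, C) flattened UNet features F^i, used as queries
    ref_feats: (M, C) reference-image features F^i_Ref, used as keys/values
    """
    q = feats @ w_q
    k = ref_feats @ w_k
    v = ref_feats @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (N, M) similarity matrix
    return softmax(scores, axis=-1) @ v       # (N, C) attended values


def personalized_block(feats, ref_feats, params, gamma):
    """Hypothetical gated residual: base features + gamma * personalization."""
    delta = cross_attention(feats, ref_feats, params["w_q"], params["w_k"], params["w_v"])
    return feats + gamma * delta              # gamma: (C,), broadcast over tokens


# Example: 16 feature tokens, 32 reference tokens, 8 channels.
rng = np.random.default_rng(0)
C = 8
F = rng.standard_normal((16, C))
F_ref = rng.standard_normal((32, C))
params = {name: 0.1 * rng.standard_normal((C, C)) for name in ("w_q", "w_k", "w_v")}
gamma = np.zeros(C)                           # zero-initialized gate (assumption)
out = personalized_block(F, F_ref, params, gamma)
print(np.allclose(out, F))                    # True: identical to the base features
```

In a setup like this, only `params` and `gamma` would be trained while the base model stays frozen; as `gamma` grows away from zero, the reference-driven term contributes more to the output.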

Paper Structure

This paper contains 48 sections, 10 equations, 23 figures, and 6 tables.

Figures (23)

  • Figure 1: Imagine wanting to restore a photo of yourself, only for the resulting image to not be you, but someone else! By utilizing a few high-quality reference images, we can faithfully restore images with fine-grained details. Best viewed by zooming in.
  • Figure 2: Results under increasing levels of degradation. a) With only minor degradation, both the base and the personalized model are capable of restoration. b) The base model incorrectly restores fine-grained details such as the nose and skin texture. c) More identity details, such as the eyes and facial hair, are lost. d) The base model outputs a completely different identity, while the personalized model retains details of the identity, even if the semantics are not entirely correct due to the extremely low-quality input image. Best viewed by zooming in.
  • Figure 3: (Left) PFStorer restores an image with a diffusion process conditioned on the LQ image and the reference image. Base Model blocks are visualized in green and Personalization blocks in purple. Stable Diffusion [stablediffusion] is used to extract features $F^i_{Ref}$ from the reference image. During training, the reference image is randomly sampled from a set of reference images at each iteration. During inference, no reference images are required, as the identity is learned in the personalization blocks as a neural representation. (Right) The $i$-th UNet block, containing the Base Model Block [stablesr] and the Personalization Block [vico]. The Base Model Blocks contain the standard Stable Diffusion blocks with SFT (spatial feature transformation) [sft] blocks from StableSR [stablesr]. After the Base Model Block, the intermediate features $F^i$ go to a trainable Personalization Block, which contains cross-attention between the text embedding and the reference image features $F^i_{Ref}$. A learnable adapter vector $\gamma^i$ balances the contribution between the base model and personalization.
  • Figure 4: 20× super-resolution of a low-quality image. Images larger than $512 \times 512$ are super-resolved using the tiling approach from StableSR [stablesr]. Image edited from Vecteezy.com.
  • Figure 5: Qualitative comparison with state-of-the-art restoration models on real-world images. Images from Wikimedia Commons.
  • ...and 18 more figures
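Figure 4 relies on a tiling approach from StableSR to super-resolve images larger than $512 \times 512$. The sketch below shows the general idea under simplifying assumptions: the input is assumed to be already upsampled to the target resolution (so each tile maps to a same-sized output tile), tiles overlap by a fixed margin, and overlapping outputs are blended by uniform averaging. `tile_starts`, `tiled_apply`, and the default tile and overlap sizes are hypothetical; StableSR's own implementation may differ in where tiling happens and how overlaps are weighted.

```python
import numpy as np


def tile_starts(length: int, tile: int, overlap: int) -> list[int]:
    """Start offsets of overlapping tiles that together cover [0, length)."""
    assert 0 <= overlap < tile, "overlap must be smaller than the tile size"
    if length <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the border
    return starts


def tiled_apply(img: np.ndarray, fn, tile: int = 512, overlap: int = 64) -> np.ndarray:
    """Run `fn` on overlapping tiles and average the outputs where tiles overlap.

    `fn` maps an (h, w[, c]) tile to an output of the same shape, e.g. a
    restoration model applied to an already-upsampled image.
    """
    h, w = img.shape[:2]
    out = np.zeros(img.shape, dtype=np.float64)
    # One weight per pixel; trailing singleton axes let it broadcast over channels.
    weight = np.zeros((h, w) + (1,) * (img.ndim - 2), dtype=np.float64)
    for y in tile_starts(h, tile, overlap):
        for x in tile_starts(w, tile, overlap):
            patch = img[y:y + tile, x:x + tile]
            out[y:y + tile, x:x + tile] += fn(patch)
            weight[y:y + tile, x:x + tile] += 1.0
    return out / weight


print(tile_starts(1000, 512, 64))  # [0, 448, 488]
```

Uniform averaging can leave visible seams where neighboring tiles disagree; a common refinement is to weight each tile with a window that decays toward its edges (for example, a Gaussian) before accumulating.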