
HeadsUp! High-Fidelity Portrait Image Super-Resolution

Renjie Li, Zihao Zhu, Xiaoyu Wang, Zhengzhong Tu

TL;DR

...

Abstract

Portrait pictures, which typically feature both human subjects and natural backgrounds, are among the most prevalent forms of photography on social media. Existing image super-resolution (ISR) techniques generally focus either on generic real-world images or on strictly aligned facial images (i.e., face super-resolution). In practice, portrait photos are handled by blending separate models: a face-specialist model handles the face region, and a general model processes the rest. However, because the two models are trained with different recipes, these blending approaches inevitably introduce boundary artifacts around the facial regions, and human perception is particularly sensitive to facial fidelity. To overcome these limitations, we study the portrait image super-resolution (PortraitISR) problem and propose HeadsUp, a single-step diffusion model that seamlessly restores and upscales portrait images in an end-to-end manner. Specifically, we build our model on top of a single-step diffusion model and develop a face supervision mechanism that guides the model to focus on the facial region. We then integrate a reference-based mechanism to aid identity restoration, reducing facial ambiguity when restoring low-quality faces. Additionally, we have built PortraitSR-4K, a high-quality 4K portrait ISR dataset, to support model training and benchmarking on portrait images. Extensive experiments show that HeadsUp achieves state-of-the-art performance on the PortraitISR task while maintaining comparable or higher performance on both general image and aligned face datasets.
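The abstract does not give the form of the face supervision. As a minimal sketch, assuming that supervision amounts to up-weighting reconstruction error inside a face mask, a face-aware pixel loss could look like the following. The function name `face_weighted_l1` and the `face_weight` value are hypothetical and are not the authors' loss.

```python
import numpy as np


def face_weighted_l1(pred, target, face_mask, face_weight=4.0):
    """Hypothetical face-aware L1 loss (an illustration, not the paper's formula).

    pred, target: float arrays of shape (H, W, C).
    face_mask:    array of shape (H, W); 1 inside the face region, 0 elsewhere.
    face_weight:  hypothetical extra weight applied to face pixels.
    """
    # Per-pixel L1 error, averaged over color channels -> shape (H, W).
    per_pixel = np.abs(pred - target).mean(axis=-1)
    # Background pixels get weight 1, face pixels get 1 + face_weight.
    weights = 1.0 + face_weight * face_mask
    # Weighted mean so the loss scale does not depend on face size.
    return float((weights * per_pixel).sum() / weights.sum())


if __name__ == "__main__":
    target = np.zeros((2, 2, 3))
    mask = np.zeros((2, 2))
    mask[0, 0] = 1.0  # top-left pixel belongs to the face

    pred_face_err = target.copy()
    pred_face_err[0, 0, :] = 1.0  # error on the face pixel
    pred_bg_err = target.copy()
    pred_bg_err[1, 1, :] = 1.0  # same-size error on a background pixel

    print(face_weighted_l1(pred_face_err, target, mask))  # 0.625
    print(face_weighted_l1(pred_bg_err, target, mask))    # 0.125
```

In this toy case an error of the same size costs five times as much on a face pixel as on a background pixel (0.625 vs. 0.125), whereas a plain mean would score both at 0.25.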

Paper Structure

This paper contains 42 sections, 6 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: Different approaches to the portrait image super-resolution (ISR) task: (a) general ISR models such as wu2024one may produce unnatural faces when applied to portrait photos because they lack face-specific supervision; (b) introducing an extra face ISR expert zhou2022towards can generate a more natural face, but the blending procedure introduces inconsistent boundaries; (c) our portrait ISR approach, HeadsUp, generates a natural portrait photo without inconsistent boundaries around faces by using an all-in-one, face-aware restoration model.
  • Figure 2: Pipeline of HeadsUp. Starting from a pre-trained latent diffusion model, we add a LoRA adapter to the VAE encoder and the denoising network. HeadsUp takes as input an LQ image and an optional reference, and performs a single denoising step to produce an HQ image. In the training stage, we employ face-specific losses to improve facial restoration quality.
  • Figure 3: Qualitative Results. While general ISR approaches achieve good overall quality, they cannot produce high-fidelity faces. The blending approaches produce better face fidelity but suffer from a border effect that causes inconsistency between the face and other regions.
  • Figure 4: Ablation Studies. Using only the identity loss improves identity preservation but leads to over-smoothed, blurred faces.
  • Figure 5: Visualization of some portrait images sampled from PortraitSR-4K.
  • ...and 3 more figures
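Figure 2 states that HeadsUp adds a LoRA adapter to the VAE encoder and the denoising network, but the page gives no configuration. The sketch below shows the generic LoRA construction on a single frozen linear layer in NumPy. The class name `LoRALinear` and the `rank`, `alpha`, and initialization choices are assumptions for illustration, not the paper's settings.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (generic LoRA sketch).

    Computes y = x W^T + (alpha / rank) * x A^T B^T, where W is frozen and
    only A (rank x in) and B (out x rank) would be trained.
    """

    def __init__(self, weight, rank=2, alpha=2.0, seed=0):
        rng = np.random.default_rng(seed)
        out_features, in_features = weight.shape
        self.weight = weight  # frozen pre-trained weight, shape (out, in)
        # Down-projection: small random init (assumed scale).
        self.A = rng.normal(0.0, 0.01, size=(rank, in_features))
        # Up-projection: zero init, so the adapter starts as a no-op.
        self.B = np.zeros((out_features, rank))
        self.scale = alpha / rank

    def __call__(self, x):
        base = x @ self.weight.T              # frozen path, shape (n, out)
        update = (x @ self.A.T) @ self.B.T    # low-rank path, shape (n, out)
        return base + self.scale * update

    def merged_weight(self):
        # Fold the low-rank update into a single dense weight for inference.
        return self.weight + self.scale * (self.B @ self.A)


if __name__ == "__main__":
    W = np.arange(12, dtype=float).reshape(3, 4)
    layer = LoRALinear(W)
    x = np.ones((5, 4))
    print(np.allclose(layer(x), x @ W.T))  # True: zero-initialized B
```

Zero-initializing `B` makes the adapted layer reproduce the pre-trained output exactly at the start of fine-tuning, and `merged_weight` shows that the low-rank update can be folded into the frozen weight after training, so the single denoising step need not pay for a separate adapter branch at inference time.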