Analysis and Benchmarking of Extending Blind Face Image Restoration to Videos

Zhouxia Wang, Jiawei Zhang, Xintao Wang, Tianshui Chen, Ying Shan, Wenping Wang, Ping Luo

TL;DR

A Temporal Consistency Network (TCN), combined with alignment smoothing, is proposed to reduce jitter and flicker in restored face videos. TCN is a flexible component that can be plugged into state-of-the-art face image restoration algorithms while keeping the result as close as possible to the quality of image-based restoration.

Abstract

Recent progress in blind face restoration has produced high-quality restored results for static images. However, efforts to extend these advancements to video scenarios have been minimal, partly because of the absence of benchmarks that allow for a comprehensive and fair comparison. In this work, we first present a fair evaluation benchmark: we introduce a Real-world Low-Quality Face Video benchmark (RFV-LQ), evaluate several leading image-based face restoration algorithms, and conduct a thorough, systematic analysis of the benefits and challenges of extending blind face image restoration algorithms to degraded face videos. Our analysis identifies several key issues, which fall into two main categories: significant jitters in facial components and noise-shape flickering between frames. To address these issues, we propose a Temporal Consistency Network (TCN) combined with alignment smoothing to reduce jitters and flickers in restored videos. TCN is a flexible component that can be seamlessly plugged into the most advanced face image restoration algorithms while keeping the result as close as possible to the quality of image-based restoration. Extensive experiments evaluate the effectiveness and efficiency of the proposed TCN and alignment smoothing operation. Project page: https://wzhouxiff.github.io/projects/FIR2FVR/FIR2FVR.

Paper Structure

This paper contains 17 sections, 9 equations, 10 figures, 3 tables.

Figures (10)

  • Figure 1: Qualitative comparisons between video-based and image-based methods. Existing state-of-the-art image-based face restoration algorithms, such as GFP-GAN wang2021towards, RestoreFormer++ wang2023restoreformer++, and CodeFormer zhou2022towards, achieve higher restoration quality than existing video-based methods, such as EDVR wang2019edvr, EDVR-GAN wang2019edvr, and BasicVSR-GAN chan2021basicvsr, especially in recovering teeth with rich details, glasses with complete structure, and eyes with a natural look. Corresponding videos are in the supplementary materials.
  • Figure 2: FID vs. Component FIDs. While the scores of image-based methods (including GFP-GAN wang2021towards, RestoreFormer++ wang2023restoreformer++, and CodeFormer zhou2022towards) and video-based methods implemented with GAN loss (such as BasicVSR-GAN gu2022vqfr) are close in terms of FID (blue circle), the advantages of image-based methods become significant when measured with Component FIDs, especially in terms of FID-Mouth (red circle). This is more consistent with the qualitative results (see the high-quality restored teeth of image-based methods in Fig. 1), which indicates that Component FIDs are more accurate than FID.
  • Figure 3: Visualization of jitters in facial components. (b)-(g) show vertical slices (taken at the red vertical line in (a)) from every frame, concatenated along time, for the results of existing video-based and image-based face restoration methods. Compared to the relatively smooth slices in the results of EDVR wang2019edvr, the restored results of the video-based methods with GAN loss gan and of the image-based methods exhibit noticeable jitters. Corresponding videos are in the supplementary materials.
  • Figure 4: Visualization of noise-shape flickers. The first row shows the restored results of existing video-based and image-based face restoration algorithms, while the second row shows their corresponding warping error maps lai2018learning with respect to the previous frame. Although the restored results of video-based methods with GAN loss gan and of image-based methods contain more details (the results of image-based methods look more natural and contain fewer artifacts), they exhibit noticeable noise-shape flickers compared to EDVR wang2019edvr without GAN loss. Corresponding videos are in the supplementary materials.
  • Figure 5: High-Quality Face Images in FFHQ ffhq and VFHQ xie2022vfhq. Both FFHQ ffhq and VFHQ xie2022vfhq provide high-quality face images beneficial for research in face image/video restoration. However, due to the inherent motion blur in videos, face images extracted from VFHQ xie2022vfhq display noticeable blur (indicated by the red arrow in the final face image). This can adversely affect blind face restoration when using VFHQ xie2022vfhq as the training dataset.
  • ...and 5 more figures
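
The slice visualization described in the Figure 3 caption (take the same vertical line from every frame and concatenate the slices along time) can be sketched as follows. This is only an illustrative sketch, not the authors' code: the function name `temporal_slice`, the nested-list grayscale frame layout, and the toy clip are assumptions made for the example.

```python
from typing import List, Sequence

# Assumed layout: one grayscale frame, indexed as frame[row][column].
Frame = Sequence[Sequence[float]]


def temporal_slice(frames: Sequence[Frame], column: int) -> List[List[float]]:
    """Concatenate one pixel column from every frame along the time axis.

    Returns an H x T image in which entry [y][t] equals frames[t][y][column],
    so the horizontal axis of the result is time.
    """
    if not frames:
        raise ValueError("need at least one frame")
    height = len(frames[0])
    width = len(frames[0][0])
    if not 0 <= column < width:
        raise ValueError("column is outside the frame width")
    return [[frame[y][column] for frame in frames] for y in range(height)]


# Toy clip (hypothetical data): a bright pixel in column 1 that moves
# down by one row per frame.
clip = [
    [[0, 9, 0], [0, 0, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 9, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 0, 0], [0, 9, 0]],
]
slice_image = temporal_slice(clip, column=1)
```

In such a slice image, content that moves steadily traces a smooth line across the time axis (here, a diagonal), whereas frame-to-frame jitter would appear as a jagged trace.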