Rank-based No-reference Quality Assessment for Face Swapping

Xinghui Zhou, Wenbo Zhou, Tianyi Wei, Shen Chen, Taiping Yao, Shouhong Ding, Weiming Zhang, Nenghai Yu

TL;DR

This work tackles the absence of reliable no-reference quality assessment for face-swapped images by introducing a rank-based NR-IQA framework. It builds a large-scale dataset of manipulated faces and derives millions of ranking pseudo-labels from multi-attribute consistency (expression, lighting, pose) and perceptual similarity, learned through a Siamese network with a margin-aware ranking loss $L(x_1,x_2;\theta)$. The method delivers both coarse- and fine-grained quality judgments and demonstrates state-of-the-art alignment with human judgments, while also serving as a loss term to improve existing face-swapping models. The approach enables robust, reference-free evaluation and improved generation quality in practical face-swapping applications.

Abstract

Face swapping has become a prominent research area in computer vision and image processing due to rapid technological advancements. Most face-swapping methods measure quality using several distances between the manipulated image and the source or target image, which presumes that suitable reference face images are known. A gap therefore remains in accurately assessing face-swap quality in reference-free scenarios. In this study, we present a novel no-reference image quality assessment (NR-IQA) method specifically designed for face swapping. It addresses this gap by constructing a comprehensive large-scale dataset, implementing a method for ranking image quality based on multiple facial attributes, and incorporating a Siamese network based on interpretable qualitative comparisons. Our model demonstrates state-of-the-art performance in the quality assessment of swapped faces, providing both coarse- and fine-grained quality assessments. Enhanced by this metric, an improved face-swapping model achieves better results with respect to expressions and poses. Extensive experiments confirm the superiority of our method over existing general no-reference image quality assessment metrics and the latest facial image quality assessment metric, making it well suited for evaluating face-swapped images in real-world scenarios.
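The pairwise training signal described above can be illustrated with a generic hinge-style margin ranking loss. This is a minimal sketch, not the paper's exact loss: the precise form of $L(x_1,x_2;\theta)$ is not given here, so the hinge formulation, the `margin` value, and the linear `toy_quality_score` standing in for one Siamese branch are all assumptions made for illustration.

```python
def margin_ranking_loss(score_hi, score_lo, margin=1.0):
    """Generic hinge ranking loss (assumed form, not taken from the paper).

    The loss is zero once the image labeled as higher quality scores
    at least `margin` above the image labeled as lower quality.
    """
    return max(0.0, margin - (score_hi - score_lo))


def toy_quality_score(features, weights):
    """Hypothetical shared scorer: both Siamese branches use the same weights."""
    return sum(f * w for f, w in zip(features, weights))


# Illustrative numbers only: two feature vectors scored with shared weights.
weights = [0.5, -0.25]
better = [2.0, 0.0]  # features of the image the pseudo-label ranks higher
worse = [1.0, 2.0]   # features of the image the pseudo-label ranks lower

s_better = toy_quality_score(better, weights)
s_worse = toy_quality_score(worse, weights)
loss = margin_ranking_loss(s_better, s_worse, margin=1.0)
```

Because both inputs pass through the same scorer, the loss depends only on the score difference, which is why training needs relative rankings rather than absolute quality labels.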

Paper Structure (16 sections, 19 equations, 5 figures, 7 tables)


Figures (5)

  • Figure 1: Example of using the proposed quality assessment metric as an additional loss constraint to improve face-swapping quality. 'w/o QL' refers to training the swapping model with the original face-swapping loss function, while 'w/ QL' refers to adding the proposed metric as an additional loss during training. Introducing the quality metric helps to maintain the gaze and expression of the target face while ensuring identity consistency.
  • Figure 2: Illustration of three existing types of face-swapping-related metrics and our metric. (a) Full-reference quality assessment (FR-IQA) methods require the source and target faces in order to produce an adequate evaluation. (b) No-reference quality assessment (NR-IQA) methods require only the swapped faces, which makes it difficult to assess the distortion caused by the identity embedding specific to face swapping. (c) Face IQA (FIQA) methods evaluate the identity of a face based on a face recognition task and cannot accurately evaluate synthetic images. (d) Our rank-based, no-reference quality assessment method ranks multiple swapped faces with the same target and different sources. By training on consistency-based ranking rules that reward attribute preservation, we obtain a new NR-IQA method.
  • Figure 3: The pipeline of rank-based label generation. It consists of three components: 3D face reconstruction for the expression and lighting vectors, pose vector estimation, and label generation.
  • Figure 4: Comparison with human judgment on images from DFGC-VRA [peng2023dfgc]. Relative to other approaches, our method demonstrates greater consistency with human evaluations in assessing the image quality of fake faces. "NR-IQA" refers to MUSIQ [ke2021musiq], "Face IQA" denotes SDD-FIQA [ou2021sdd], and "Generate IQA" represents KNN-GIQA [gu2020giqa].
  • Figure 5: Qualitative comparison on FaceForensics++ [rossler2019faceforensics++]. Our model is capable of achieving precise face swapping while preserving target attributes such as expressions and poses. More results can be found in the supplementary materials.
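
The rank-based label generation outlined in the Figure 2(d) and Figure 3 captions can be sketched as follows. This is an illustrative approximation under stated assumptions, not the paper's pipeline: the attribute vectors are taken as given (in the paper they come from 3D face reconstruction and pose estimation), and the Euclidean distance, the equal weighting of attributes, and the names `attribute_distance` and `ranking_pairs` are hypothetical choices. Perceptual similarity, which the TL;DR also mentions, is omitted.

```python
import math
from itertools import combinations


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def attribute_distance(swapped, target, keys=("expression", "lighting", "pose")):
    """Assumed consistency measure: unweighted sum of per-attribute distances."""
    return sum(l2(swapped[k], target[k]) for k in keys)


def ranking_pairs(swapped_faces, target):
    """Return (better, worse) pairs among swaps that share the same target.

    A swap whose attributes stay closer to the target is ranked higher.
    Ties produce no pair.
    """
    dists = {name: attribute_distance(attrs, target)
             for name, attrs in swapped_faces.items()}
    pairs = []
    for a, b in combinations(sorted(dists), 2):
        if dists[a] == dists[b]:
            continue
        pairs.append((a, b) if dists[a] < dists[b] else (b, a))
    return pairs


# Made-up attribute vectors for one target and three swaps from different sources.
target = {"expression": [0.0, 0.0], "lighting": [1.0], "pose": [0.0, 0.0, 0.0]}
swaps = {
    "swap_a": {"expression": [0.0, 0.0], "lighting": [1.0], "pose": [0.0, 0.0, 0.0]},
    "swap_b": {"expression": [3.0, 4.0], "lighting": [1.0], "pose": [0.0, 0.0, 0.0]},
    "swap_c": {"expression": [0.0, 0.0], "lighting": [2.0], "pose": [0.0, 0.0, 0.0]},
}
pairs = ranking_pairs(swaps, target)
```

Each resulting pair can serve as one pseudo-label for pairwise ranking training, with the first element treated as the higher-quality image.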