Towards mitigating uncann(eye)ness in face swaps via gaze-centric loss terms

Ethan Wilson, Frederick Shic, Sophie Jörg, Eakta Jain

TL;DR

The paper investigates why face swaps evoke uncanny perceptions, identifying eye-region fidelity as a critical factor typically underemphasized by standard reconstruction losses. It introduces a modular gaze-centric loss that leverages a pretrained gaze estimator to constrain gaze reconstruction during training, alongside an eye-region pixel/mask loss. Empirical results show significant improvements in gaze accuracy over baselines and demonstrate that gaze-focused training can alter how viewers use eyes to detect deepfakes, though uncanniness effects remain nuanced. Practically, the approach is compatible with existing pipelines like DeepFaceLab and holds promise for enhancing gaze realism and improving gaze-based deepfake detection datasets, while highlighting ethical considerations and limitations in reducing perceived uncanniness. The work lays groundwork for gaze-aware synthesis and points to future gains with higher-resolution gaze predictors and broader perceptual testing.

Abstract

Advances in face swapping have enabled the automatic generation of highly realistic faces. Yet viewers perceive face swaps differently from real faces, with key differences in viewing behavior around the eyes. Face swapping algorithms generally place no emphasis on the eyes, relying on pixel or feature matching losses that consider the entire face to guide the training process. We further investigate viewer perception of face swaps, focusing our analysis on the presence of an uncanny valley effect. We additionally propose a novel loss equation for the training of face swapping models, leveraging a pretrained gaze estimation network to directly improve representation of the eyes. We confirm that face swaps do elicit uncanny responses from viewers. Our proposed improvements significantly reduce viewing angle errors between face swaps and their source material. Our method additionally reduces the prevalence of the eyes as a deciding factor when viewers perform deepfake detection tasks. Our findings have implications for face swapping in special effects, digital avatars, privacy mechanisms, and more; negative responses from users could limit effectiveness in these applications. Our gaze improvements are a first step towards alleviating negative viewer perceptions via a targeted approach.
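
The idea of a gaze-centric loss term can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a pretrained gaze estimator has already produced a (pitch, yaw) pair in radians for each source frame and each face swap, and it assumes a particular angle-to-vector convention. The function names and the `gaze_weight` parameter are hypothetical. A real training loss would be written with differentiable tensor operations in a deep learning framework; plain Python is used here only to show the arithmetic of the viewing angle error.

```python
import math

def gaze_to_vector(pitch, yaw):
    """Convert gaze angles (radians) to a unit 3D direction vector.

    Assumed convention (not taken from the paper): pitch is up/down,
    yaw is left/right, and (0, 0) looks straight along the -z axis.
    """
    x = -math.cos(pitch) * math.sin(yaw)
    y = -math.sin(pitch)
    z = -math.cos(pitch) * math.cos(yaw)
    return (x, y, z)

def angular_error_deg(gaze_a, gaze_b):
    """Angle in degrees between two gaze directions given as (pitch, yaw)."""
    va = gaze_to_vector(*gaze_a)
    vb = gaze_to_vector(*gaze_b)
    dot = sum(a * b for a, b in zip(va, vb))
    # Clamp to guard against floating-point drift outside acos's domain.
    dot = max(-1.0, min(1.0, dot))
    return math.degrees(math.acos(dot))

def gaze_loss(source_gazes, swap_gazes):
    """Mean angular error over paired source and face-swap gaze estimates."""
    errors = [angular_error_deg(s, t) for s, t in zip(source_gazes, swap_gazes)]
    return sum(errors) / len(errors)

def total_loss(reconstruction_loss, source_gazes, swap_gazes, gaze_weight=1.0):
    """Add a weighted gaze term to an existing whole-face reconstruction loss.

    `gaze_weight` is a hypothetical hyperparameter for this sketch.
    """
    return reconstruction_loss + gaze_weight * gaze_loss(source_gazes, swap_gazes)
```

Because the gaze term is simply added to whatever reconstruction loss a pipeline already uses, a sketch like this stays modular: the whole-face loss is untouched, and the weight controls how strongly the eyes are emphasized.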

Paper Structure (33 sections, 7 equations, 7 figures, 3 tables)

Figures (7)

  • Figure 1: A selection of stimuli from FF++ DFD used to evaluate uncanniness. Top = face swaps; Bottom = real faces.
  • Figure 2: Histogram showing the distribution of participant responses between original videos and face swapped videos. A higher score represents a higher overall uncanniness measurement.
  • Figure 3: Illustration of DFL's LIAE architecture. The pathway taken to create the resulting face swap is displayed in red. Note that $z^{AB}_{char}$ is concatenated with a copy of itself to reconstruct the character face, and $z^{AB}_{orig}$ is concatenated with itself to produce the face swap result.
  • Figure 4: Design diagram of the steps to compute the gaze reconstruction loss.
  • Figure 5: Visual comparison of face swaps produced by the baseline DFL method, DFL with eyes and mouth priority loss (em), and DFL with our proposed loss (gaze). Both improvements over the baseline reduce gaze angle error.
  • ...and 2 more figures