Exposing DeepFake Videos By Detecting Face Warping Artifacts

Yuezun Li, Siwei Lyu

TL;DR

This work targets DeepFake video detection by exploiting artifacts from affine face warping, a byproduct of synthesizing fixed-size face images. A CNN-based detector is trained using negative examples generated via simple image processing to simulate warping artifacts, enabling data-efficient learning without requiring DeepFake-generated negatives. The method, evaluated on UADFV and DeepfakeTIMIT, achieves state-of-the-art AUC scores—particularly with ResNet50—demonstrating robust detection across diverse sources. The approach offers practical, scalable detection that can generalize beyond a single DeepFake source, with future work aimed at improving compression robustness and exploring specialized network designs.

Abstract

In this work, we describe a new deep-learning-based method that can effectively distinguish AI-generated fake videos (referred to as DeepFake videos hereafter) from real videos. Our method is based on the observation that current DeepFake algorithms can only generate images of limited resolution, which must then be warped to match the original faces in the source video. Such transforms leave distinctive artifacts in the resulting DeepFake videos, and we show that they can be effectively captured by convolutional neural networks (CNNs). Compared to previous methods, which use a large number of real and DeepFake-generated images to train a CNN classifier, our method does not need DeepFake-generated images as negative training examples, since we target the artifacts of affine face warping as the distinctive feature to distinguish real and fake images. The advantages of our method are two-fold: (1) Such artifacts can be simulated directly by applying simple image processing operations to an image to turn it into a negative example. Since training a DeepFake model to generate negative examples is time-consuming and resource-demanding, our method saves considerable time and resources in training data collection; (2) Since such artifacts generally exist in DeepFake videos from different sources, our method is more robust than others. Our method is evaluated on two DeepFake video datasets for its effectiveness in practice.

Paper Structure

This paper contains 8 sections, 8 figures, 1 table.

Figures (8)

  • Figure 1: Overview of the DeepFake production pipeline. (a) An image of the source. (b) The green box is the detected face area. (c) The red points are face landmarks. (d) A transform matrix is computed to warp the face area in (e) to the normalized region (f). (g) Synthesized face image from the neural network. (h) Synthesized face warped back using the same transform matrix. (i) Post-processing, including boundary smoothing, applied to the composite image. (j) The final synthesized image.
  • Figure 2: Overview of negative data generation. (a) The original image. (b) Aligned faces at different scales. One scale from (b) is picked at random and Gaussian-blurred to give (c), which is then affine-warped back to give (d).
  • Figure 3: Illustration of face shape augmentation of negative examples. (a) The aligned and blurred face, which is then affine-warped back to give (b). (c, d) Post-processing that refines the shape of the face area: in (c) the whole warped face is retained, and in (d) only the face area inside the polygon is retained.
  • Figure 4: Performance of each CNN model on all frames of UADFV (Yang et al., 2018).
  • Figure 5: Performance of each CNN model on each video of UADFV (Yang et al., 2018).
  • ...and 3 more figures
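
The negative-example generation summarized in the Figure 2 caption (rescale the face, apply Gaussian blur, affine-warp it back) can be sketched roughly as follows. This is a minimal NumPy-only illustration, not the authors' implementation. It assumes a grayscale image, an axis-aligned face box in place of a landmark-based alignment, and a pure scaling as the affine transform. The function names, the `box` layout, and the `scale` and `sigma` defaults are all hypothetical.

```python
import numpy as np


def gaussian_blur(img, size=5, sigma=1.0):
    """Separable Gaussian blur of a 2-D float image, using edge padding."""
    ax = np.arange(size) - (size - 1) / 2.0
    kernel = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    # Blur along rows, then along columns; "valid" restores the input shape.
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)


def warp_affine(img, matrix, out_shape):
    """Affine warp with bilinear sampling.

    `matrix` is a 2x3 array mapping OUTPUT (x, y) coordinates to INPUT coordinates.
    """
    h, w = out_shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    src_x = np.clip(matrix[0, 0] * xs + matrix[0, 1] * ys + matrix[0, 2], 0, img.shape[1] - 1)
    src_y = np.clip(matrix[1, 0] * xs + matrix[1, 1] * ys + matrix[1, 2], 0, img.shape[0] - 1)
    x0 = np.floor(src_x).astype(int)
    y0 = np.floor(src_y).astype(int)
    x1 = np.minimum(x0 + 1, img.shape[1] - 1)
    y1 = np.minimum(y0 + 1, img.shape[0] - 1)
    fx, fy = src_x - x0, src_y - y0
    top = img[y0, x0] * (1 - fx) + img[y0, x1] * fx
    bottom = img[y1, x0] * (1 - fx) + img[y1, x1] * fx
    return top * (1 - fy) + bottom * fy


def simulate_warp_artifact(image, box, scale=0.5, sigma=1.0):
    """Return a copy of `image` whose face box carries simulated warping artifacts.

    `box` is a hypothetical (top, left, height, width) face rectangle.
    """
    t, l, h, w = box
    face = image[t:t + h, l:l + w].astype(np.float64)
    sh, sw = max(2, int(h * scale)), max(2, int(w * scale))
    # Step 1: align the face to a smaller scale (here: plain downscaling).
    to_small = np.array([[(w - 1) / (sw - 1), 0.0, 0.0],
                         [0.0, (h - 1) / (sh - 1), 0.0]])
    small = warp_affine(face, to_small, (sh, sw))
    # Step 2: smooth the low-resolution face.
    small = gaussian_blur(small, size=5, sigma=sigma)
    # Step 3: affine-warp the face back to its original size and paste it in.
    to_full = np.array([[(sw - 1) / (w - 1), 0.0, 0.0],
                        [0.0, (sh - 1) / (h - 1), 0.0]])
    out = image.astype(np.float64).copy()
    out[t:t + h, l:l + w] = warp_affine(small, to_full, (h, w))
    return out


rng = np.random.default_rng(0)
real = rng.random((64, 64))          # stand-in for a real grayscale frame
fake_like = simulate_warp_artifact(real, box=(16, 16, 32, 32))
```

Under these assumptions, `fake_like` matches `real` outside the box, while the region inside has lost high-frequency detail. That resolution mismatch between the face and its surroundings is the kind of cue the abstract says a CNN can pick up. Drawing `scale` at random per sample would mirror the random choice of scale mentioned in the Figure 2 caption.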