Table of Contents
Fetching ...

DiffFake: Exposing Deepfakes using Differential Anomaly Detection

Sotirios Stamnas, Victor Sanchez

TL;DR

This paper introduces DiffFake, a deepfake detector that treats detection as anomaly detection by learning natural changes between two real images of the same person. A backbone trained with mask-based pseudo-deepfake augmentation produces generalizable face embeddings, which are then paired and fed into a Gaussian Mixture Model-based anomaly detector. Across multiple benchmarks, including cross-manipulation and cross-dataset settings, DiffFake achieves competitive or superior AUC scores, demonstrating strong generalization and robustness to quality variations. The work highlights the effectiveness of combining differential features from image pairs with anomaly detection to address generalization gaps in deepfake detection. It also identifies limitations with fully synthetic or text-to-video generated content and outlines directions for extending the approach to these scenarios.

Abstract

Traditional deepfake detectors have dealt with the detection problem as a binary classification task. This approach can achieve satisfactory results in cases where samples of a given deepfake generation technique have been seen during training, but can easily fail with deepfakes generated by other techniques. In this paper, we propose DiffFake, a novel deepfake detector that approaches the detection problem as an anomaly detection task. Specifically, DiffFake learns natural changes that occur between two facial images of the same person by leveraging a differential anomaly detection framework. This is done by combining pairs of deep face embeddings and using them to train an anomaly detection model. We further propose to train a feature extractor on pseudo-deepfakes with global and local artifacts, to extract meaningful and generalizable features that can then be used to train the anomaly detection model. We perform extensive experiments on five different deepfake datasets and show that our method can match and sometimes even exceed the performance of state-of-the-art competitors.

DiffFake: Exposing Deepfakes using Differential Anomaly Detection

TL;DR

This paper introduces DiffFake, a deepfake detector that treats detection as anomaly detection by learning natural changes between two real images of the same person. A backbone trained with mask-based pseudo-deepfake augmentation produces generalizable face embeddings, which are then paired and fed into a Gaussian Mixture Model-based anomaly detector. Across multiple benchmarks, including cross-manipulation and cross-dataset settings, DiffFake achieves competitive or superior AUC scores, demonstrating strong generalization and robustness to quality variations. The work highlights the effectiveness of combining differential features from image pairs with anomaly detection to address generalization gaps in deepfake detection. It also identifies limitations with fully synthetic or text-to-video generated content and outlines directions for extending the approach to these scenarios.

Abstract

Traditional deepfake detectors have dealt with the detection problem as a binary classification task. This approach can achieve satisfactory results in cases where samples of a given deepfake generation technique have been seen during training, but can easily fail with deepfakes generated by other techniques. In this paper, we propose DiffFake, a novel deepfake detector that approaches the detection problem as an anomaly detection task. Specifically, DiffFake learns natural changes that occur between two facial images of the same person by leveraging a differential anomaly detection framework. This is done by combining pairs of deep face embeddings and using them to train an anomaly detection model. We further propose to train a feature extractor on pseudo-deepfakes with global and local artifacts, to extract meaningful and generalizable features that can then be used to train the anomaly detection model. We perform extensive experiments on five different deepfake datasets and show that our method can match and sometimes even exceed the performance of state-of-the-art competitors.

Paper Structure

This paper contains 12 sections, 4 equations, 3 figures, 9 tables.

Figures (3)

  • Figure 1: Example of changes that occur between frames of real and fake videos. (a) corresponds to frames from a real video that exhibit a natural change between the two head poses. (b) corresponds to a deepfake video and exhibits illumination inconsistency on the left side of the face. (c) corresponds to a deepfake video and exhibits facial boundary inconsistency around the chin and jawline regions.
  • Figure 2: Visualisation of DiffFake. We input pairs of images, corresponding to the same person, to a pre-trained deep neural network to extract the corresponding deep face embeddings. The embeddings are then combined into a single vector, which is then given as input to an ADM trained only on pairs of real images. Finally, the ADM recognizes the initial input as either real or fake.
  • Figure 3: Overview of generating blended images through our proposed method. The second column shows the facial landmarks that are used for the mask generation scheme. The third column shows the resulting masks (after elastic deformation and Gaussian blurring). Finally, the last column showcases the generated pseudo-deepfakes, which contain a variety of artifacts in different parts of the face region.