
Deepfake detection by exploiting surface anomalies: the SurFake approach

Andrea Ciamarra, Roberto Caldelli, Federico Becattini, Lorenzo Seidenari, Alberto Del Bimbo

TL;DR

SurFake addresses deepfake detection by exploiting acquisition-time surface geometry through the Global Surface Descriptor (GSD). The GSD is derived from the per-pixel surface frames $\mathbf{F}(i)=[\mathbf{n}(i),\mathbf{t}(i),\mathbf{b}(i)]$ predicted by UpRightNet in a local representation $\mathbf{F}^c(i)$ and a global representation $\mathbf{F}^g(i)$, which are aligned via $\mathbf{f}_z^g(i)$ to estimate camera orientation. The method fuses RGB face crops with the 3-channel GSD to form a 6-channel input and trains CNNs to distinguish real from fake content, achieving around 0.75 accuracy with the GSD alone and modest improvements when RGB is combined with it (RGB+GSD). Experiments on FaceForensics++ across five forgery types show that the GSD captures discriminative geometric cues, with ROC-AUC values near 0.85 for several backbones, supporting the generality of the approach. The work highlights the practical value of incorporating acquisition geometry into deepfake detection as a signal complementary to traditional RGB-based methods, and points to future exploration of larger crops and additional geometric features.
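The TL;DR mentions aligning the local and global surface frames through $\mathbf{f}_z^g(i)$ to estimate camera orientation. One way to read that step, offered here as an assumption rather than as the exact UpRightNet solver: if $\mathbf{u}$ is the world up direction expressed in camera coordinates, then for every pixel the z-components of the global frame vectors satisfy $\mathbf{f}_z^g(i) = \mathbf{F}^c(i)^\top \mathbf{u}$, so $\mathbf{u}$ can be recovered by stacking these equations into a least-squares problem. The sketch uses plain unconstrained least squares followed by normalization; the function names and the synthetic demo data are hypothetical.

```python
import numpy as np


def estimate_up_vector(frames_c: np.ndarray, fz_g: np.ndarray) -> np.ndarray:
    """Least-squares estimate of the up direction u in camera coordinates.

    frames_c: (N, 3, 3); the columns of frames_c[i] are the frame vectors
              n(i), t(i), b(i) of pixel i in the local frame F^c(i).
    fz_g:     (N, 3); z-components of the same vectors in the global frame,
              i.e. f_z^g(i).
    Assumed model (not taken from the paper): F^c(i)^T u = f_z^g(i).
    """
    # Row k of F^c(i)^T is the k-th frame vector: one linear equation per row.
    a = np.transpose(frames_c, (0, 2, 1)).reshape(-1, 3)  # (3N, 3)
    b = fz_g.reshape(-1)                                   # (3N,)
    # Unconstrained least squares, then projection onto the unit sphere
    # (a simplification of a properly constrained or weighted solver).
    u, *_ = np.linalg.lstsq(a, b, rcond=None)
    return u / np.linalg.norm(u)


def random_rotation(rng: np.random.Generator) -> np.ndarray:
    """Demo helper (hypothetical): random 3x3 rotation via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))   # fix the column signs of Q
    if np.linalg.det(q) < 0:      # ensure a proper rotation (det = +1)
        q[:, 0] = -q[:, 0]
    return q


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = np.stack([random_rotation(rng) for _ in range(100)])
    u_true = np.array([0.1, 0.9, 0.3]) / np.linalg.norm([0.1, 0.9, 0.3])
    fz = frames.transpose(0, 2, 1) @ u_true  # synthesize f_z^g(i)
    print(np.allclose(estimate_up_vector(frames, fz), u_true))  # True
```

With noise-free synthetic frames the estimate matches the true up vector; on real network predictions a weighted or norm-constrained solver would be the more faithful choice.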

Abstract

The ever-increasing use of synthetically generated content in many sectors of everyday life, media information above all, creates a strong need for deepfake detection tools that can curb the proliferation of altered messages. Manipulated content, in particular images and videos, is usually identified by looking for inconsistencies and/or anomalies introduced specifically by the fake generation process. The scientific literature offers different techniques that exploit diverse ad-hoc features to highlight possible modifications. In this paper, we investigate how deepfake creation can alter the characteristics that the whole scene had at acquisition time. In particular, when an image (or video) is captured, the overall geometry of the scene (e.g., surfaces) and the acquisition process (e.g., illumination) determine a unique environment that is directly reflected in the image pixel values; the deepfake generation process may alter all of these intrinsic relations. By analyzing the characteristics of the surfaces depicted in the image, it is possible to obtain a descriptor that can be used to train a CNN for deepfake detection; we refer to this approach as SurFake. Experiments on the FF++ dataset, covering different kinds of deepfake forgeries and diverse deep learning models, confirm that this feature can discriminate between pristine and altered images; they also show that combining it with visual data yields some improvement in detection accuracy.
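The abstract states that the surface descriptor is used to train a CNN and can be combined with visual data. The Figure 2 caption describes that fusion: the GSD values are scaled to $[0,255]$ to form an RGB-like image and concatenated with the RGB face crop along the channel axis, giving the 6-channel input mentioned in the TL;DR. The sketch below assumes per-map min-max normalization (the exact scaling is not specified here) and uses hypothetical helper names.

```python
import numpy as np


def gsd_to_image(gsd: np.ndarray) -> np.ndarray:
    """Map a float (H, W, 3) descriptor to uint8 values in [0, 255].

    Assumption: per-map min-max normalization; the authors' exact scaling
    may differ.
    """
    lo, hi = float(gsd.min()), float(gsd.max())
    scaled = (gsd - lo) / (hi - lo) if hi > lo else np.zeros_like(gsd, dtype=float)
    return np.round(scaled * 255.0).astype(np.uint8)


def build_six_channel_input(face_rgb: np.ndarray, gsd: np.ndarray) -> np.ndarray:
    """Concatenate a uint8 (H, W, 3) face crop with its scaled GSD along the
    last (channel) axis, giving a uint8 (H, W, 6) array."""
    if face_rgb.shape != gsd.shape or face_rgb.shape[-1] != 3:
        raise ValueError("face crop and GSD must both have shape (H, W, 3)")
    return np.concatenate([face_rgb, gsd_to_image(gsd)], axis=-1)


if __name__ == "__main__":
    face = np.arange(48, dtype=np.uint8).reshape(4, 4, 3)
    gsd = np.linspace(-1.0, 1.0, 48).reshape(4, 4, 3)
    print(build_six_channel_input(face, gsd).shape)  # (4, 4, 6)
```

A CNN consuming this array needs a first convolution with 6 input channels; how the authors adapted each backbone is not described in this summary.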

Paper Structure

This paper contains 15 sections, 6 figures, and 2 tables.

Figures (6)

  • Figure 1: An example of surface anomalies found in fake images. From left to right: the RGB (Red-Green-Blue) face, our proposed GSD (Global Surface Descriptor) feature, and the logarithm of the GSD, shown here for the sake of visualization to highlight the artifacts introduced by the manipulation.
  • Figure 2: Pipeline of SurFake for deepfake detection. After extracting the face crop from the image, we generate its Global Surface Descriptor (GSD) through UpRightNet xian2019uprightnet and scale the generated vector values to $[0,255]$ to obtain an RGB image. We then concatenate the face crop and the GSD feature along the last (channel) dimension and feed the result to a classifier, which is trained to distinguish whether the content is real or fake.
  • Figure 3: Sample frames (first row), the corresponding Global Surface Descriptors (second row), and $\log(\mathrm{GSD})$ (third row) for real content and each of the 5 forgery types in FF++, from left to right: Real, DF, F2F, FSH, FS, NT rossler2019faceforensics++. The third row highlights how sensitive the GSD is to forgeries.
  • Figure 4: t-SNE van2008visualizing plots of the GSD feature activations for real and fake test-set samples, for each forgery type (MobileNetV2 architecture). Only a subset of samples is plotted for the sake of visibility.
  • Figure 5: ROC curves of the GSD features for real and fake samples, using MobileNetV2 as the classifier. The Area Under the Curve (AUC) is also reported for each forgery.
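Figure 5 reports the AUC for each forgery. To make explicit what that number summarizes, here is a minimal sketch computing the AUC as the probability that a randomly chosen fake sample receives a higher score than a randomly chosen real one, with ties counted as one half (the normalized Mann-Whitney U statistic). Treating "fake" as the positive class with label 1 is an assumption, and `roc_auc` is a hypothetical helper, not the paper's evaluation code.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparisons (normalized Mann-Whitney U).

    labels: 1 for fake (positive class), 0 for real.
    scores: classifier outputs, higher meaning "more likely fake".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one real and one fake sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # fake ranked above real
            elif p == n:
                wins += 0.5   # ties count half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The quadratic pairwise loop is chosen for readability; for large test sets a rank-based formulation after sorting the scores gives the same value in O(n log n).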