Deepfake detection by exploiting surface anomalies: the SurFake approach
Andrea Ciamarra, Roberto Caldelli, Federico Becattini, Lorenzo Seidenari, Alberto Del Bimbo
TL;DR
SurFake addresses deepfake detection by exploiting acquisition-time surface geometry through the Global Surface Descriptor (GSD), derived from UpRightNet per-pixel surface frames $\mathbf{F}(i)=[\mathbf{n}(i),\mathbf{t}(i),\mathbf{b}(i)]$ with local/global representations $\mathbf{F}^c(i)$ and $\mathbf{F}^g(i)$ and alignment via $\mathbf{f}_z^g(i)$ to estimate camera orientation. The method fuses RGB face crops with a 3-channel GSD to form a 6-channel input and trains CNNs to distinguish real from fake content, achieving around 0.75 accuracy with GSD alone and modest improvements when RGB is combined (RGB+GSD). Experiments on FaceForensics++ across five forgery types demonstrate that GSD captures discriminative geometric cues, with ROC-AUC values near 0.85 for several backbones, validating the approach's generality. The work highlights the practical value of incorporating acquisition geometry into deepfake detection, offering a complementary signal to traditional RGB-based methods and guiding future exploration of larger crops and additional geometric features.
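The RGB+GSD fusion mentioned above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `fuse_rgb_gsd`, the 224×224 crop size, the channel-last layout, the [0, 1] scaling of the RGB values, and the random stand-in data are all assumptions made for illustration. The only detail taken from the text is that a 3-channel RGB crop and a 3-channel GSD map are combined into one 6-channel input.

```python
import numpy as np


def fuse_rgb_gsd(rgb: np.ndarray, gsd: np.ndarray) -> np.ndarray:
    """Stack an RGB crop and a 3-channel GSD map into a 6-channel array.

    Hypothetical helper: both inputs are assumed to be (H, W, 3) arrays
    covering the same face crop.
    """
    if rgb.ndim != 3 or rgb.shape[-1] != 3:
        raise ValueError("rgb must have shape (H, W, 3)")
    if gsd.shape != rgb.shape:
        raise ValueError("gsd must have the same shape as rgb")

    # Assumption: RGB arrives as uint8 and is scaled to [0, 1].
    rgb_f = rgb.astype(np.float32) / 255.0
    # Assumption: the GSD is already a float map and is used as-is.
    gsd_f = gsd.astype(np.float32)

    # Concatenate along the channel axis: 3 RGB + 3 GSD = 6 channels.
    return np.concatenate([rgb_f, gsd_f], axis=-1)


# Stand-in data; a real pipeline would use a face crop and its GSD.
rng = np.random.default_rng(0)
rgb_crop = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)
gsd_map = rng.uniform(-1.0, 1.0, size=(224, 224, 3)).astype(np.float32)

fused = fuse_rgb_gsd(rgb_crop, gsd_map)
print(fused.shape)
```

A CNN consuming such an input would need its first convolution to accept six channels instead of three; how the backbones are adapted is not specified in this section.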
Abstract
The ever-increasing use of synthetically generated content in different sectors of everyday life, above all in media and information, creates a strong need for deepfake detection tools to prevent the proliferation of altered messages. Manipulated content, in particular images and videos, is typically identified by looking for inconsistencies and/or anomalies introduced by the fake generation process. Different techniques exist in the scientific literature that exploit diverse ad-hoc features to highlight possible modifications. In this paper, we investigate how deepfake creation affects the characteristics that the whole scene had at the time of acquisition. In particular, when an image or video is captured, the overall geometry of the scene (e.g. surfaces) and the acquisition process (e.g. illumination) determine a unique environment that is directly represented by the image pixel values; all these intrinsic relations may be altered by the deepfake generation process. By analyzing the characteristics of the surfaces depicted in the image, it is possible to obtain a descriptor that can be used to train a CNN for deepfake detection: we refer to this approach as SurFake. Experimental results on the FF++ dataset, for different kinds of deepfake forgeries and diverse deep learning models, confirm that such a feature can be adopted to discriminate between pristine and altered images; furthermore, the experiments show that it can also be combined with visual data to provide a certain improvement in detection accuracy.
