Improving Video Deepfake Detection: A DCT-Based Approach with Patch-Level Analysis
Luca Guarnera, Salvatore Manganello, Sebastiano Battiato
TL;DR
The paper tackles robust, real-time deepfake detection by introducing a DCT-based feature extractor that uses the Beta components of the 63 AC coefficients from 8×8 block DCTs applied to I-frames. Through patch-level analysis of several regions (entire frame, background, face, eyes, nose, mouth, and face frame), the authors identify the eyes and mouth as the most discriminative areas, achieving high AUC scores and showing that a fully analytical, explainable approach can rival deep-learning baselines. Evaluations on FaceForensics++ and Celeb-DF (v2) show competitive performance while emphasizing computational efficiency and interpretability, which could enable real-time deployment. The work contributes a fast, patch-aware, DCT-based detection framework that complements existing deepfake detectors with a transparent, frequency-domain perspective.
Abstract
A new algorithm for the detection of deepfakes in digital videos is presented. Only the I-frames are extracted, which allows faster computation and analysis than approaches described in the literature. To identify the discriminating regions within individual video frames, the entire frame, background, face, eyes, nose, mouth, and face frame were analyzed separately. From the Discrete Cosine Transform (DCT), the Beta components were extracted from the AC coefficients and used as input to standard classifiers. Experimental results show that the eye and mouth regions are the most discriminative and are able to determine the nature of the video under analysis.
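The feature extraction step described above can be illustrated with a minimal sketch. It assumes that each AC coefficient of the 8×8 block DCT is modeled with a zero-centered Laplacian distribution across the blocks of a patch, so its scale can be estimated as β = σ/√2, where σ is the standard deviation of that coefficient over all blocks. This estimator, the row-major coefficient ordering, the grayscale input, and the names `dct2_block` and `beta_features` are illustrative assumptions, not details confirmed by the abstract.

```python
import math

N = 8  # block size used by the 8x8 block DCT

# Orthonormal DCT-II basis matrix: C[k][n] is basis function k at sample n.
C = [[(math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N))
      * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
      for n in range(N)]
     for k in range(N)]


def dct2_block(block):
    """2-D DCT of one 8x8 block, computed as C * block * C^T."""
    tmp = [[sum(C[k][n] * block[n][j] for n in range(N)) for j in range(N)]
           for k in range(N)]
    return [[sum(tmp[k][n] * C[l][n] for n in range(N)) for l in range(N)]
            for k in range(N)]


def beta_features(gray):
    """Return 63 beta values for a grayscale patch given as a list of rows.

    Assumption: beta = sigma / sqrt(2), the Laplacian scale implied by the
    standard deviation of each AC coefficient across all 8x8 blocks.
    Coefficients are listed in row-major order (not JPEG zigzag order).
    """
    height = len(gray) - len(gray) % N      # crop to a multiple of 8
    width = len(gray[0]) - len(gray[0]) % N
    if height == 0 or width == 0:
        raise ValueError("patch must contain at least one 8x8 block")

    # samples[i] collects coefficient i (row-major) from every block.
    samples = [[] for _ in range(N * N)]
    for r in range(0, height, N):
        for c in range(0, width, N):
            block = [row[c:c + N] for row in gray[r:r + N]]
            coeffs = dct2_block(block)
            for k in range(N):
                for l in range(N):
                    samples[k * N + l].append(coeffs[k][l])

    betas = []
    for values in samples[1:]:              # index 0 is the DC term; skip it
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        betas.append(math.sqrt(var) / math.sqrt(2.0))
    return betas
```

In a pipeline like the one the abstract outlines, `beta_features` would be called once per region (for example the eye or mouth patch cropped from an I-frame), and the resulting 63-element vector would be passed to a standard classifier. Region cropping and I-frame extraction are outside the scope of this sketch.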
