Improving Video Deepfake Detection: A DCT-Based Approach with Patch-Level Analysis

Luca Guarnera, Salvatore Manganello, Sebastiano Battiato

TL;DR

The paper tackles robust, real-time deepfake detection with a DCT-based feature extractor that uses the Beta ($\beta$) components of the 63 AC coefficients from 8x8 block DCTs applied to I-frames. Patch-level analysis over the set of regions $A$ (entire frame, background, face, eyes, nose, mouth, and face frame) identifies the eyes and mouth as the most discriminative areas, with high AUC scores, indicating that a fully analytical, explainable approach can rival deep-learning baselines. Evaluations on FaceForensics++ and Celeb-DF (v2) show competitive performance; because only I-frames are processed and the features are analytical, the method is computationally cheap and interpretable, which makes real-time deployment plausible. The work contributes a fast, patch-aware, DCT-based detection framework that complements existing deepfake detectors with a transparent, frequency-domain perspective.

Abstract

A new algorithm for the detection of deepfakes in digital videos is presented. The I-frames were extracted in order to provide faster computation and analysis than approaches described in the literature. To identify the discriminating regions within individual video frames, the entire frame, background, face, eyes, nose, mouth, and face frame were analyzed separately. From the Discrete Cosine Transform (DCT), the Beta components were extracted from the AC coefficients and used as input to standard classifiers. Experimental results show that the eye and mouth regions are those most discriminative and able to determine the nature of the video under analysis.
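
The feature-extraction step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that each AC coefficient is modeled with a zero-centered Laplacian distribution across the 8x8 blocks of a patch and that its Beta component is the Laplacian scale, $\beta = \sigma / \sqrt{2}$, where $\sigma$ is the standard deviation of that coefficient. The abstract does not state this definition, so treat it as an assumption. The names `dct2_block` and `beta_features` are hypothetical, the input is a grayscale patch given as a list of rows, and the coefficients are returned in row-major rather than zigzag order.

```python
import math
import random

N = 8  # block size of the block DCT

# Orthonormal DCT-II basis: C[k][n] is the k-th cosine sampled at position n.
C = [
    [
        (math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N))
        * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
        for n in range(N)
    ]
    for k in range(N)
]


def dct2_block(block):
    """2-D DCT-II of one 8x8 block, computed as C * block * C^T."""
    tmp = [[sum(C[k][n] * block[n][j] for n in range(N)) for j in range(N)]
           for k in range(N)]
    return [[sum(tmp[k][n] * C[l][n] for n in range(N)) for l in range(N)]
            for k in range(N)]


def beta_features(patch):
    """Return 63 beta values (one per AC coefficient) for a grayscale patch.

    Assumption: beta = sigma / sqrt(2), the scale of a Laplacian fitted to
    each AC coefficient over all non-overlapping 8x8 blocks of the patch.
    """
    height = len(patch) - len(patch) % N      # crop to a multiple of 8
    width = len(patch[0]) - len(patch[0]) % N
    if height == 0 or width == 0:
        raise ValueError("patch must be at least 8x8 pixels")

    # samples[i] collects coefficient i (row-major index) across all blocks.
    samples = [[] for _ in range(N * N)]
    for r in range(0, height, N):
        for c in range(0, width, N):
            block = [row[c:c + N] for row in patch[r:r + N]]
            coeffs = dct2_block(block)
            for k in range(N):
                for l in range(N):
                    samples[k * N + l].append(coeffs[k][l])

    betas = []
    for values in samples[1:]:                # index 0 is the DC term; skip it
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        betas.append(math.sqrt(variance / 2.0))
    return betas


# Usage on a synthetic 32x32 patch (random pixels, for illustration only).
random.seed(0)
patch = [[random.randint(0, 255) for _ in range(32)] for _ in range(32)]
betas = beta_features(patch)
print(len(betas))  # 63
```

In a full pipeline, a vector like this would be computed for each region of each I-frame and then combined into a per-video feature vector. How the per-frame vectors are combined is not specified in this summary.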

Paper Structure (4 sections, 5 equations, 4 figures, 1 table)

Figures (4)

  • Figure 1: Videos in the FaceForensics++ dataset manipulated with (a) FaceSwap, (b) Face2Face, (c) FaceShifter, (d) DeepFakes, (e) DeepFake Detection, (f) NeuralTextures.
  • Figure 2: Examples of real (a) and manipulated (b) videos in the Celeb-DF dataset (v2).
  • Figure 3: Proposed approach: (a) For each patch in $A$ of the I-frames of the video $V$, the DCT is computed and the $\beta$ components of the 63 AC coefficients are extracted. (b) The final feature vectors $p^{V_a}$ of video $V$ are used by the various classifiers to solve the real vs. deepfake task and to identify the most discriminative regions.
  • Figure 4: Values of AUC metric (%) across classifiers and regions under analysis.
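
The per-region comparison summarized in Figure 4 relies on the AUC metric. As a hedged sketch of how such a comparison could be computed, the block below evaluates AUC through its rank interpretation: the probability that a randomly chosen deepfake receives a higher classifier score than a randomly chosen real video, with ties counted as one half. The function name `auc`, the region names, and the toy scores are hypothetical and do not come from the paper.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for deepfake, 0 for real. scores: higher means "more fake".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy classifier scores for two hypothetical regions (illustrative only).
labels = [0, 0, 1, 1]
scores_by_region = {
    "region_a": [0.10, 0.40, 0.35, 0.80],
    "region_b": [0.60, 0.30, 0.50, 0.40],
}

auc_by_region = {name: auc(labels, s) for name, s in scores_by_region.items()}
best_region = max(auc_by_region, key=auc_by_region.get)
print(auc_by_region)  # {'region_a': 0.75, 'region_b': 0.5}
print(best_region)    # region_a
```

Repeating this for every classifier and every region in $A$ would give a grid of values of the kind Figure 4 reports.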