Deepfake Detection and the Impact of Limited Computing Capabilities

Paloma Cantero-Arjona, Alfonso Sánchez-Macián

TL;DR

The paper tackles deepfake detection under limited computing resources by comparing a computationally intensive (2+1)D 3DCNN, which proved impractical, with a Video Vision Transformer (ViViT) using the Factorised Encoder architecture (Model 2). Across five datasets (UADFV, DeepfakeTIMIT LQ/HQ, DFDC, FaceForensics++), it shows that frame sampling and face extraction pipelines can keep compute under control, and that ViViT results can be improved meaningfully (up to 67.56% precision on certain targets) through architecture scaling, dropout, and learning-rate scheduling. Even with these optimizations, full automation remains out of reach under strict resource limits, so the paper positions ViViT as a viable first-pass filter to guide manual fact-checking. The study highlights the trade-off between model complexity and available hardware, and points to biometric indicators and other resource-aware approaches as promising directions for future work.

Abstract

The rapid development of technology and artificial intelligence is making deepfakes increasingly sophisticated and difficult to identify. To ensure the accuracy of information and to limit misinformation and mass manipulation, it is of paramount importance to develop artificial intelligence models that can detect forged videos generically. This work addresses the detection of deepfakes across various existing datasets in a scenario with limited computing resources. The goal is to analyze the applicability of different deep learning techniques under these restrictions and to explore possible approaches to enhance their efficiency.

Paper Structure

This paper contains 12 sections, 30 figures, and 7 tables.

Figures (30)

  • Figure 1: Exp. 3DCNN - UADFV
  • Figure 2: Exp. 3DCNN - UADFV
  • Figure 3: Exp. 3DCNN - DFTimit LQ
  • Figure 4: Exp. 3DCNN - DFTimit LQ
  • Figure 5: Exp. 3DCNN - DFTimit HQ
  • ...and 25 more figures