When Deepfake Detection Meets Graph Neural Network: A Unified and Lightweight Learning Framework

Haoyu Liu; Chaoyu Gong; Mengke He; Jiate Li; Kai Han; Siqiang Luo

TL;DR

SSTGNN is a unified Spatial-Spectral-Temporal Graph Neural Network for deepfake video detection. It models videos as patch-level graphs and applies learnable spectral filters on the normalized graph Laplacian $L = I - D^{-1/2} A D^{-1/2}$, using the eigen-decomposition $L = U \operatorname{diag}(\lambda) U^\top$. It captures spatial and temporal inconsistencies via negative edges and uses a dual GAT backbone to fuse spatial and temporal signals, yielding a compact model that achieves state-of-the-art performance with up to $42\times$ fewer parameters than state-of-the-art models. The method generalizes well in both in-domain and cross-domain settings across diverse benchmarks, and its training cost, inference cost, and memory usage suit resource-constrained deployment. Interpretability analyses indicate that SSTGNN relies on frame-level spectral cues and localized attention to detect subtle forgery artifacts, giving a principled, graph-based view of manipulation traces. Overall, SSTGNN provides a scalable, interpretable, and efficient framework for robust deepfake detection, with potential extensions to broader video forensics tasks.
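To make the spectral step concrete, the following NumPy sketch builds the normalized Laplacian $L = I - D^{-1/2} A D^{-1/2}$, computes its eigen-decomposition, and applies a filter of the form $U \operatorname{diag}(g(\lambda)) U^\top x$. The polynomial response $g(\lambda) = \sum_k \theta_k \lambda^k$ and the names `normalized_laplacian`, `spectral_filter`, and `theta` are assumptions chosen for illustration, not the parameterization used by SSTGNN. The sketch also assumes a symmetric adjacency matrix with non-negative weights, so it does not cover the negative edges mentioned above.

```python
import numpy as np


def normalized_laplacian(adj):
    """Return L = I - D^{-1/2} A D^{-1/2} for a symmetric, non-negative adjacency."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    # Isolated nodes have degree 0; use 0 for their D^{-1/2} entry.
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    return np.eye(adj.shape[0]) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]


def spectral_filter(adj, x, theta):
    """Apply U diag(g(lambda)) U^T x, with g an illustrative polynomial in lambda."""
    lap = normalized_laplacian(adj)
    # eigh is appropriate because the normalized Laplacian is symmetric.
    lam, U = np.linalg.eigh(lap)
    # Hypothetical filter response: g(lambda) = sum_k theta[k] * lambda**k.
    g = np.polynomial.polynomial.polyval(lam, np.asarray(theta, dtype=float))
    # Project onto the eigenbasis, rescale each frequency, project back.
    return U @ (g[:, None] * (U.T @ x))


if __name__ == "__main__":
    # A 4-node path graph with 2 feature channels per node.
    adj = np.array(
        [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]],
        dtype=float,
    )
    x = np.arange(8, dtype=float).reshape(4, 2)
    y = spectral_filter(adj, x, theta=[1.0, -0.5])  # g(lambda) = 1 - 0.5 * lambda
    print(y.shape)
```

With `theta=[0.0, 1.0]` the filter reduces to multiplication by $L$ itself, which is a convenient sanity check. A dense eigen-decomposition costs $O(n^3)$ in the number of nodes, so this form is only practical for small graphs.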

Abstract

The proliferation of generative video models has made detecting AI-generated and manipulated videos an urgent challenge. Existing detection approaches often fail to generalize across diverse manipulation types because they rely on isolated spatial, temporal, or spectral information, and they typically require large models to perform well. This paper introduces SSTGNN, a lightweight Spatial-Spectral-Temporal Graph Neural Network framework that represents videos as structured graphs, enabling joint reasoning over spatial inconsistencies, temporal artifacts, and spectral distortions. SSTGNN incorporates learnable spectral filters and spatial-temporal differential modeling into a unified graph-based architecture, capturing subtle manipulation traces more effectively. Extensive experiments on diverse benchmark datasets demonstrate that SSTGNN not only achieves superior performance in both in-domain and cross-domain settings, but is also efficient in computation and resource usage. Remarkably, SSTGNN accomplishes these results with up to $42\times$ fewer parameters than state-of-the-art models, making it highly lightweight and resource-friendly for real-world deployment.
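As a rough illustration of representing a video as a structured graph, the sketch below treats each patch in each frame as a node, links spatially adjacent patches within a frame, and links the same patch position across consecutive frames. This connectivity rule, the unit edge weights, and the names `patch_graph`, `grid_h`, and `grid_w` are assumptions made for the example; the text here does not specify how SSTGNN actually builds its graph, and the sketch omits node features and negative edges.

```python
import numpy as np


def patch_graph(num_frames, grid_h, grid_w):
    """Build a symmetric adjacency over num_frames * grid_h * grid_w patch nodes.

    Assumed connectivity (illustrative only):
      - spatial edges between 4-neighbour patches in the same frame
      - temporal edges between the same patch position in consecutive frames
    """
    per_frame = grid_h * grid_w
    n = num_frames * per_frame
    adj = np.zeros((n, n))

    def node(t, r, c):
        # Flatten (frame, row, column) into a single node index.
        return t * per_frame + r * grid_w + c

    for t in range(num_frames):
        for r in range(grid_h):
            for c in range(grid_w):
                i = node(t, r, c)
                neighbours = []
                if c + 1 < grid_w:
                    neighbours.append(node(t, r, c + 1))  # right neighbour
                if r + 1 < grid_h:
                    neighbours.append(node(t, r + 1, c))  # lower neighbour
                if t + 1 < num_frames:
                    neighbours.append(node(t + 1, r, c))  # same patch, next frame
                for j in neighbours:
                    adj[i, j] = adj[j, i] = 1.0
    return adj


if __name__ == "__main__":
    adj = patch_graph(num_frames=2, grid_h=2, grid_w=2)
    print(adj.shape, int(adj.sum() // 2))  # node-by-node shape and undirected edge count
```

An adjacency matrix of this kind is the input a Laplacian-based spectral filter would operate on; a real pipeline would likely use a sparse representation, since the dense matrix grows quadratically with the number of patches.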
