Compressed Deepfake Video Detection Based on 3D Spatiotemporal Trajectories
Zongmei Chen, Xin Liao, Xiaoshuai Wu, Yanxiang Chen
TL;DR
This work tackles deepfake detection under real-world video compression. It introduces a framework that builds 3D spatiotemporal features from robust landmark localization and decouples facial expressions from head motion. The resulting phase-space motion trajectories are analyzed with a lightweight Transformer, and the outputs are consolidated via Dempster-Shafer evidence fusion. The approach is robust to compression, remains competitive on uncompressed data, and outperforms several state-of-the-art methods on multiple public benchmarks. It targets practical deployment, combining high efficiency with resilience to head-pose and lighting variations. Overall, it advances compressed-video deepfake detection through 3D motion modeling and global temporal analysis, with potential applications on social platforms and in security settings.
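Dempster-Shafer evidence fusion is named above but not detailed. As a hedged illustration, the sketch below applies the standard Dempster rule of combination to two hypothetical evidence sources (for example, two classifier branches) over the frame {real, fake}. How the paper actually converts its model outputs into masses is not stated here, so the `combine` helper and the mass values are assumptions for illustration only.

```python
from itertools import product

# Frame of discernment for a binary real/fake decision.
REAL = frozenset({"real"})
FAKE = frozenset({"fake"})
THETA = REAL | FAKE  # mass on THETA expresses "uncommitted" belief


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    m1, m2: dicts mapping frozenset hypotheses to masses that sum to 1.
    Returns the fused, renormalized mass function.
    """
    fused = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Compatible evidence: credit the intersection.
            fused[inter] = fused.get(inter, 0.0) + wa * wb
        else:
            # Contradictory evidence: accumulate the conflict K.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize by 1 - K so the fused masses sum to 1.
    return {h: w / (1.0 - conflict) for h, w in fused.items()}


# Hypothetical masses from two independent evidence sources.
m1 = {FAKE: 0.6, REAL: 0.1, THETA: 0.3}
m2 = {FAKE: 0.5, REAL: 0.2, THETA: 0.3}
fused = combine(m1, m2)
```

With these illustrative masses the conflict is K = 0.12 + 0.05 = 0.17, and the fused mass on "fake" is 0.63 / 0.83 ≈ 0.759. Keeping explicit mass on THETA lets a low-confidence source abstain rather than cast a forced vote.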
Abstract
The misuse of deepfake technology by malicious actors poses a potential threat to nations, societies, and individuals. However, existing deepfake detection methods primarily target uncompressed videos, relying on cues such as noise characteristics, local textures, or frequency statistics. When applied to compressed videos, their detection performance degrades, making them less suitable for real-world scenarios. In this paper, we propose a deepfake video detection method based on 3D spatiotemporal trajectories. Specifically, we use a robust 3D model to construct spatiotemporal motion features, integrating feature details from both 2D and 3D frames to mitigate the influence of large head rotation angles or insufficient lighting. Furthermore, we separate facial expressions from head movements and design a sequential analysis method based on phase-space motion trajectories to explore the feature differences between genuine and fake faces in deepfake videos. We conduct extensive experiments on several compressed deepfake benchmarks to validate the performance of the proposed method. The robustness of the designed features is verified by measuring the consistency of the facial-landmark distributions before and after video compression. Our method yields satisfactory results and shows potential for practical applications.
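The abstract does not spell out how expressions are separated from head movements or how the phase-space trajectories are formed. One common way to do both, shown here purely as an assumed sketch and not as the authors' method, is to remove per-frame rigid head motion by Kabsch alignment of the 3D landmarks to a reference shape, then treat the residual coordinates and their finite-difference velocities as points in a (position, velocity) phase space. The helper names `rigid_align`, `decouple`, and `phase_space`, the choice of reference shape, and the default frame rate are hypothetical; NumPy is assumed to be available.

```python
import numpy as np


def rigid_align(src, ref):
    """Kabsch alignment: rotate and translate src (N, 3) onto ref (N, 3)."""
    mu_s, mu_r = src.mean(axis=0), ref.mean(axis=0)
    a, b = src - mu_s, ref - mu_r
    h = a.T @ b  # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T  # proper rotation (det = +1)
    return a @ r.T + mu_r


def decouple(landmarks, ref=None):
    """Remove rigid head motion from a (T, N, 3) landmark sequence.

    Each frame is aligned to ref (default: the first frame), so what
    remains is treated here as the non-rigid (expression) component.
    """
    ref = landmarks[0] if ref is None else ref
    return np.stack([rigid_align(frame, ref) for frame in landmarks])


def phase_space(signal, fps=30.0):
    """Map a (T, D) signal to (T-1, 2D) phase-space points [x_t, dx/dt]."""
    velocity = np.diff(signal, axis=0) * fps  # finite-difference velocity
    return np.concatenate([signal[1:], velocity], axis=1)
```

Because the Kabsch step removes only rotation and translation, any motion left in the phase-space trajectory comes from non-rigid deformation, which is what this sketch treats as the expression component. A sequence of pure head rotations therefore collapses to a stationary point with zero velocity.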
