Learning Expressive And Generalizable Motion Features For Face Forgery Detection
Jingyi Zhang, Peng Zhang, Jingjing Wang, Di Xie, Shiliang Pu
TL;DR
The paper tackles the vulnerability of frame-based face forgery detectors by leveraging sequence-level motion cues. It introduces a Motion Consistency Block (MCB) to encode inter-frame motion coherence and an auxiliary Anomaly Detection (AD) block to regularize motion-feature learning, combining the outputs as $F^{S} + F^{M} + F^{MM}$. On the FF++, DFO, and Celeb-DF benchmarks, the approach achieves strong cross-domain generalization and state-of-the-art results, notably improving performance under unseen manipulations and distortions. The work demonstrates that targeted motion modeling plus anomaly-driven regularization yields robust, transferable forgery detection suitable for real-world deployment.
Abstract
Previous face forgery detection methods mainly focus on appearance features, which are vulnerable to sophisticated manipulation. Because most current face manipulation methods generate fake faces from a single frame and do not take inter-frame consistency and coordination into account, artifacts across frame sequences are more effective cues for face forgery detection. However, current sequence-based face forgery detection methods apply general video classification networks directly, which discards the specific, discriminative motion information relevant to face manipulation detection. To this end, we propose an effective sequence-based forgery detection framework built on an existing video classification method. To make the motion features more expressive for manipulation detection, we replace the original motion feature module with a motion consistency block. To make the learned features more generalizable, we add an auxiliary anomaly detection block. With these two purpose-built improvements, a general video classification network achieves promising results on three popular face forgery datasets.
