DeepShield: Fortifying Deepfake Video Detection with Local and Global Forgery Analysis
Yinqi Cai, Jichang Li, Zhaolun Li, Weikai Chen, Rushi Lan, Xi Xie, Xiaonan Luo, Guanbin Li
TL;DR
DeepShield tackles the cross-domain generalization gap in deepfake video detection by jointly leveraging local patch-level cues and global forgery representations. It extends CLIP-ViT with two components, Local Patch Guidance (LPG) and Global Forgery Diversification (GFD). LPG uses Spatiotemporal Artifact Modeling (SAM) to generate labeled local data for patch-level supervision, while within GFD, Domain Feature Augmentation (DFA) and Boundary-Expanding Feature Generation (BFG) diversify global features. The training objective combines patch-level supervision with cross-entropy and supervised contrastive losses, formalized as $\mathcal{L}^{\text{overall}} = \omega \mathcal{L}_{\text{LPG}} + \mathcal{L}_{\text{GFD}}$, where $\omega$ weights the local term. The video-level global representation is the mean of the per-frame class-token features over $T$ frames, $f_v = \frac{1}{T} \sum_{t=1}^T f^{\text{cls}}_{v,t}$. Experiments on FF++ HQ and unseen datasets show that DeepShield achieves superior cross-dataset and cross-manipulation performance, demonstrating strong generalization and potential for robust real-world deepfake detection.
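The two formulas in the TL;DR can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy feature vectors, and the loss values are all hypothetical, and in the actual method the per-frame features would come from the CLIP-ViT encoder rather than hand-written lists.

```python
from typing import List, Sequence


def video_representation(cls_feats: Sequence[Sequence[float]]) -> List[float]:
    """Mean-pool per-frame class-token features: f_v = (1/T) * sum_t f_cls[v, t]."""
    num_frames = len(cls_feats)
    if num_frames == 0:
        raise ValueError("need at least one frame")
    dim = len(cls_feats[0])
    # Average each feature dimension over the T frames of the video.
    return [sum(frame[d] for frame in cls_feats) / num_frames for d in range(dim)]


def overall_loss(loss_lpg: float, loss_gfd: float, omega: float) -> float:
    """Combine the two terms: L_overall = omega * L_LPG + L_GFD."""
    return omega * loss_lpg + loss_gfd


# Toy input (hypothetical): T = 3 frames, each with a 2-dimensional feature.
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
f_v = video_representation(frames)

# Placeholder loss values and weight, chosen only to show the arithmetic.
total = overall_loss(loss_lpg=0.5, loss_gfd=1.0, omega=2.0)

print(f_v)    # [3.0, 4.0]
print(total)  # 2.0
```

Here `omega` only scales the local (LPG) term, so setting it to zero would reduce the objective to the global (GFD) term alone.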
Abstract
Recent advances in deep generative models have made it easier to manipulate face videos, raising significant concerns about their potential misuse for fraud and misinformation. Existing detectors often perform well in in-domain scenarios but fail to generalize across diverse manipulation techniques due to their reliance on forgery-specific artifacts. In this work, we introduce DeepShield, a novel deepfake detection framework that balances local sensitivity and global generalization to improve robustness across unseen forgeries. DeepShield enhances the CLIP-ViT encoder through two key components: Local Patch Guidance (LPG) and Global Forgery Diversification (GFD). LPG applies spatiotemporal artifact modeling and patch-wise supervision to capture fine-grained inconsistencies often overlooked by global models. GFD introduces domain feature augmentation, leveraging domain-bridging and boundary-expanding feature generation to synthesize diverse forgeries, mitigating overfitting and enhancing cross-domain adaptability. By integrating local and global forgery analysis, DeepShield outperforms state-of-the-art methods in cross-dataset and cross-manipulation evaluations, achieving superior robustness against unseen deepfake attacks.
