Leveraging Unlabeled Data from Unknown Sources via Dual-Path Guidance for Deepfake Face Detection
Zhiqiang Yang, Renshuai Tao, Chunjie Zhang, Guodong Yang, Xiaolong Zheng, Yao Zhao
TL;DR
The paper tackles deepfake detection in a realistic setting where large volumes of unlabeled data arrive from unknown sources. In this setting, traditional supervised methods struggle because of subtle domain shifts and because real and fake faces share the same semantics. It introduces DPGNet, a dual-path framework that combines text-guided cross-domain alignment with curriculum-driven pseudo-label generation to exploit unlabeled data while preserving knowledge learned from labeled data. Key components include learnable text prompts aligned via CLIP, cross-domain feature enhancement, a dynamic curriculum for pseudo-labeling, and latent-space augmentation with cross-domain distillation, all trained end-to-end. Experiments on datasets such as FF++, DFDC, CelebDF, and DF40 show that DPGNet consistently outperforms state-of-the-art baselines in cross-dataset and cross-method evaluations, indicating robust generalization to unseen forgery techniques and practical scalability for real-world deployment.
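The text-guided alignment idea can be illustrated with CLIP-style scoring: an image embedding is compared by cosine similarity against one text embedding per class ("real", "fake"), and a cross-entropy loss pulls the image toward the text embedding of its class. The sketch below is a minimal stdlib-only illustration under stated assumptions, not DPGNet's implementation. The per-class text embeddings are taken as given vectors (in a prompt-learning setup they would come from passing learnable prompt tokens through CLIP's text encoder, a step omitted here), and the helper names and the temperature of 0.07 are hypothetical choices.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embeddings."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def prompt_logits(image_emb: Sequence[float],
                  class_text_embs: Sequence[Sequence[float]],
                  temperature: float = 0.07) -> List[float]:
    """One logit per class: image-text cosine similarity divided by a temperature.

    Assumption: class_text_embs[k] is the (already encoded) text embedding
    for class k, e.g. index 0 = "real", index 1 = "fake".
    """
    return [cosine(image_emb, t) / temperature for t in class_text_embs]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max logit before exponentiating)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def alignment_loss(image_emb: Sequence[float],
                   class_text_embs: Sequence[Sequence[float]],
                   label: int,
                   temperature: float = 0.07) -> float:
    """Cross-entropy that is small when the image is closest to its own class text."""
    probs = softmax(prompt_logits(image_emb, class_text_embs, temperature))
    return -math.log(probs[label])


if __name__ == "__main__":
    # Toy 3-d embeddings (hypothetical): class 0 = "real", class 1 = "fake".
    text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    image = [0.1, 0.9, 0.0]
    print(softmax(prompt_logits(image, text_embs)))
    print(alignment_loss(image, text_embs, label=1))
```

In a trainable version, gradients of this loss would flow into the image encoder and, through the text encoder, into the learnable prompt parameters, which is what lets the text side act as a shared anchor for images from different forgery domains.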
Abstract
Existing deepfake detection methods rely heavily on static labeled datasets. However, with the proliferation of generative models, real-world scenarios are flooded with massive amounts of unlabeled fake face data from unknown sources. This presents a critical dilemma: detectors trained solely on existing data fail to generalize, while manually labeling this new stream is infeasible because the fakes are highly realistic. A more fundamental challenge is that, unlike typical unsupervised learning tasks where categories are clearly defined, real and fake faces share the same semantics, which degrades the performance of traditional unsupervised strategies. There is therefore an urgent need for a new paradigm designed specifically to exploit such unlabeled data. Accordingly, this paper proposes a dual-path guided network (DPGNet) to address two key challenges: (1) bridging the domain gaps between faces generated by different generative models; and (2) utilizing unlabeled image samples. The method comprises two core modules: text-guided cross-domain alignment, which uses learnable prompts to unify visual and textual embeddings into a domain-invariant feature space; and curriculum-driven pseudo-label generation, which dynamically exploits unlabeled samples. Extensive experiments on multiple mainstream datasets show that DPGNet significantly outperforms existing techniques, highlighting its effectiveness in exploiting unlabeled data to counter deepfakes.
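Curriculum-driven pseudo-labeling can be pictured as a confidence threshold that starts strict and relaxes over training, so the model first learns from the unlabeled samples it is most certain about and gradually admits harder ones. The following is a generic sketch of that idea, assuming a linear schedule. The schedule shape, the 0.95 to 0.70 range, and the function names are hypothetical and are not taken from the paper, whose exact curriculum is not described in this abstract.

```python
from typing import List, Sequence, Tuple


def curriculum_threshold(epoch: int, total_epochs: int,
                         start: float = 0.95, end: float = 0.70) -> float:
    """Linearly relax the confidence threshold from `start` (epoch 0) to `end` (last epoch).

    Hypothetical schedule: a linear ramp is only one possible curriculum.
    """
    if total_epochs <= 1:
        # A single-epoch run has no schedule; use the final threshold.
        return end
    progress = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return start + (end - start) * progress


def select_pseudo_labels(fake_probs: Sequence[float],
                         threshold: float) -> List[Tuple[int, int]]:
    """Return (index, pseudo_label) pairs for confident samples; 1 = fake, 0 = real.

    A sample is kept only if the model assigns at least `threshold` probability
    to one class; all other samples stay unlabeled for this round.
    """
    if not 0.5 <= threshold <= 1.0:
        # Below 0.5 the "fake" and "real" acceptance regions would overlap.
        raise ValueError("threshold must lie in [0.5, 1.0]")
    selected = []
    for i, p in enumerate(fake_probs):
        if p >= threshold:
            selected.append((i, 1))
        elif p <= 1.0 - threshold:
            selected.append((i, 0))
    return selected


if __name__ == "__main__":
    # Hypothetical fake-probabilities predicted for five unlabeled faces.
    probs = [0.98, 0.03, 0.80, 0.55, 0.25]
    for epoch in (0, 4, 9):
        tau = curriculum_threshold(epoch, total_epochs=10)
        print(epoch, round(tau, 3), select_pseudo_labels(probs, tau))
```

With these toy probabilities, the first epoch keeps only samples 0 and 1, while the last epoch also admits samples 2 and 4; sample 3 stays unlabeled throughout because the model is close to chance on it.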
