Generalize Your Face Forgery Detectors: An Insertable Adaptation Module Is All You Need
Xiaotian Si, Linghui Li, Liwei Zhang, Ziduo Guo, Kaiguo Yuan, Bingyu Li, Xiaoyong Li
TL;DR
This work addresses the challenge of generalizing face forgery detectors to unseen forgeries by introducing an insertable test-time adaptation module that operates on unlabeled target data without altering the base detector. The module combines a memory bank, a learnable class prototype-based classifier, and a nearest feature calibrator, and adapts predictions through self-training and nearest-neighbor consistency. Key contributions include a memory-bank mechanism with entropy-based filtering, a multi-transform prototype classifier, and a nearest-feature calibration with a consistency loss. Evaluated across multiple forgery benchmarks, the module yields notable gains in AUC and cross-dataset generalization. The approach is plug-and-play: it improves the robustness of various detectors and offers practical benefits for real-world deployment of face forgery detection systems.
Abstract
A plethora of face forgery detectors exist to tackle facial deepfake risks. However, their practical application is hindered by the challenge of generalizing to forgeries unseen during the training stage. To this end, we introduce an insertable adaptation module that can adapt a trained off-the-shelf detector using only online unlabeled test data, without requiring modifications to its architecture or training process. Specifically, we first present a learnable class prototype-based classifier that generates predictions from the revised features and prototypes, enabling effective handling of various forgery clues and domain gaps during online testing. Additionally, we propose a nearest feature calibrator to further improve prediction accuracy and reduce the impact of noisy pseudo-labels during self-training. Experiments across multiple datasets show that our module achieves superior generalization compared to state-of-the-art methods. Moreover, it functions as a plug-and-play component that can be combined with various detectors to enhance overall performance.
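
As a rough illustration of the ideas summarized above, the following minimal sketch shows an entropy-filtered memory bank feeding a prototype-based classifier. It is not the authors' implementation. The class and function names, the entropy threshold, the bank capacity, the softmax temperature, and the use of cosine similarity are all assumptions made for illustration. In particular, prototypes here are plain class means of stored features, swapped in for the learnable prototypes described in the abstract, and the nearest feature calibrator and consistency loss are omitted.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def softmax(scores, temperature=0.1):
    """Numerically stable softmax; the temperature value is an assumption."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class EntropyFilteredMemoryBank:
    """Hypothetical memory bank that keeps only confident test features."""

    def __init__(self, max_entropy=0.3, capacity=64):
        self.max_entropy = max_entropy  # assumed threshold, not from the paper
        self.capacity = capacity        # assumed size, not from the paper
        self.items = []                 # list of (feature, pseudo_label)

    def maybe_add(self, feature, probs):
        """Store the feature with its pseudo-label if the prediction is confident."""
        if entropy(probs) > self.max_entropy:
            return False
        label = max(range(len(probs)), key=probs.__getitem__)
        self.items.append((list(feature), label))
        if len(self.items) > self.capacity:
            self.items.pop(0)  # drop the oldest entry
        return True

    def prototypes(self, num_classes, dim):
        """Mean feature per class; None for classes with no stored features."""
        sums = [[0.0] * dim for _ in range(num_classes)]
        counts = [0] * num_classes
        for feature, label in self.items:
            counts[label] += 1
            for i, value in enumerate(feature):
                sums[label][i] += value
        return [
            [v / c for v in s] if c else None
            for s, c in zip(sums, counts)
        ]


def prototype_predict(feature, prototypes, temperature=0.1):
    """Class probabilities from cosine similarity to each prototype."""
    scores = [
        cosine(feature, p) if p is not None else -1.0
        for p in prototypes
    ]
    return softmax(scores, temperature)


# Usage with toy 2-D features; class 0 = real, class 1 = fake (assumed order).
bank = EntropyFilteredMemoryBank(max_entropy=0.3)
bank.maybe_add([1.0, 0.0], [0.98, 0.02])  # confident, stored
bank.maybe_add([0.0, 1.0], [0.03, 0.97])  # confident, stored
bank.maybe_add([0.5, 0.5], [0.50, 0.50])  # high entropy, rejected
protos = bank.prototypes(num_classes=2, dim=2)
probs = prototype_predict([0.9, 0.1], protos)
```

In this sketch the base detector is never modified: only its output features and probabilities are consumed, which is one way to read the "insertable" property. A fuller version would update the prototypes by gradient steps on pseudo-labels and calibrate each prediction against its nearest stored features.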
