A Large-scale Universal Evaluation Benchmark For Face Forgery Detection
Yijun Bei, Hengrui Lou, Jinsong Geng, Erteng Liu, Lechao Cheng, Jie Song, Mingli Song, Zunlei Feng
TL;DR
This work introduces DeepFaceGen, the first large-scale, versatile benchmark for face forgery detection that jointly covers localized editing and full-image generation across image and video modalities. By assembling over 350k forged images and 423k forged videos produced with 34 generation techniques, along with authentic samples, the authors evaluate 13 mainstream forgery detectors and analyze their performance, generalization, and feature representations. Key findings include the importance of detailed feature extraction and high-frequency cues, the greater challenge posed by localized edits, and contrasting generalization patterns between full-image and localized forgery data. The dataset and analyses aim to accelerate robust, generalizable forgery detection and to guide future directions, such as dynamic benchmarking and self-evolving defense strategies, that can keep pace with rapid advances in generative technologies.
Abstract
With the rapid development of AI-generated content (AIGC) technology, it has become possible to produce realistic fake facial images and videos that deceive human visual perception. Consequently, various face forgery detection techniques have been proposed to identify such fake facial content. However, evaluating the effectiveness and generalizability of these detection techniques remains a significant challenge. To address this, we construct a large-scale evaluation benchmark called DeepFaceGen, aimed at quantitatively assessing the effectiveness of face forgery detection and facilitating the iterative development of forgery detection technology. DeepFaceGen consists of 776,990 real face image/video samples and 773,812 face forgery image/video samples, the latter generated using 34 mainstream face generation techniques. During construction, we carefully consider important factors such as content diversity, fairness across ethnicities, and the availability of comprehensive labels to ensure the versatility and convenience of DeepFaceGen. We then use DeepFaceGen to evaluate and analyze the performance of 13 mainstream face forgery detection techniques from various perspectives. Through extensive experimental analysis, we derive significant findings and propose potential directions for future research. The code and dataset for DeepFaceGen are available at https://github.com/HengruiLou/DeepFaceGen.
