1M-Deepfakes Detection Challenge
Zhixi Cai, Abhinav Dhall, Shreya Ghosh, Munawar Hayat, Dimitrios Kollias, Kalin Stefanov, Usman Tariq
TL;DR
This work targets the detection and localization of deepfakes in long, multi-subject audio-visual content by building on the large-scale AV-Deepfake1M dataset and launching the 1M-Deepfakes Detection Challenge. It defines two tasks, detection (binary real/fake classification) and temporal localization (identifying manipulated intervals), with an evaluation protocol that combines $AP$, $AR$, and $AUC$ into a final score $S$ for benchmarking. The paper reports dataset scale, partitioning, and baseline results, and highlights a public evaluation server that supports ongoing comparison across teams. By emphasizing localization and cross-modal content, the work aims to enable more resilient detection of realistic yet subtly manipulated media across languages and contexts.
Abstract
The detection and localization of deepfake content, particularly when small fake segments are seamlessly mixed into real videos, remains a significant challenge in digital media security. Based on the recently released AV-Deepfake1M dataset, which contains more than 1 million manipulated videos across more than 2,000 subjects, we introduce the 1M-Deepfakes Detection Challenge. The challenge is designed to engage the research community in developing advanced methods for detecting and localizing deepfake manipulations within this large-scale, highly realistic audio-visual dataset. Participants can access the AV-Deepfake1M dataset and are required to submit their inference results, which are evaluated with the metrics of the detection or localization task. The methodologies developed through the challenge will contribute to next-generation deepfake detection and localization systems. Evaluation scripts, baseline models, and accompanying code will be available at https://github.com/ControlNet/AV-Deepfake1M.
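To make the two kinds of evaluation concrete, the following is a minimal sketch of the quantities that typically underlie such metrics: temporal intersection-over-union (IoU) between a predicted and a ground-truth fake segment, which AP and AR for localization are commonly built on, and a rank-based AUC for video-level detection. It is an illustration under stated assumptions, not the challenge's official evaluation code. The function names, the `(start, end)` segment format in seconds, and the label convention (1 = fake, 0 = real) are hypothetical; the exact thresholds and the way the metrics are combined into a final score are defined by the evaluation scripts in the repository above.

```python
from typing import List, Tuple

# Assumed segment format: (start_seconds, end_seconds). Hypothetical, for illustration only.
Segment = Tuple[float, float]


def temporal_iou(pred: Segment, truth: Segment) -> float:
    """Intersection-over-union of two 1-D time intervals."""
    intersection = max(0.0, min(pred[1], truth[1]) - max(pred[0], truth[0]))
    union = (pred[1] - pred[0]) + (truth[1] - truth[0]) - intersection
    return intersection / union if union > 0 else 0.0


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """Rank-based AUC: the probability that a randomly chosen fake video
    (label 1) receives a higher score than a randomly chosen real video
    (label 0), with ties counted as one half."""
    fake_scores = [s for s, y in zip(scores, labels) if y == 1]
    real_scores = [s for s, y in zip(scores, labels) if y == 0]
    if not fake_scores or not real_scores:
        raise ValueError("AUC needs at least one fake and one real sample")
    wins = sum(
        1.0 if f > r else 0.5 if f == r else 0.0
        for f in fake_scores
        for r in real_scores
    )
    return wins / (len(fake_scores) * len(real_scores))


if __name__ == "__main__":
    # A predicted fake segment that overlaps half of the true one.
    print(temporal_iou((1.0, 3.0), (2.0, 4.0)))
    # Four videos with fakeness scores and ground-truth labels.
    print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

In a localization metric, a predicted segment would usually count as a true positive when its IoU with a ground-truth segment exceeds a chosen threshold. The quadratic pairwise AUC above is chosen for readability; a sorted-rank implementation would be preferable at the scale of this dataset.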
