1M-Deepfakes Detection Challenge

Zhixi Cai; Abhinav Dhall; Shreya Ghosh; Munawar Hayat; Dimitrios Kollias; Kalin Stefanov; Usman Tariq

TL;DR

This work targets the detection and localization of deepfakes in long, multi-subject audio-visual content by building on the large AV-Deepfake1M dataset and launching the 1M-Deepfakes Detection Challenge. It defines two tasks: detection (binary real/fake classification) and temporal localization (identifying the manipulated intervals). The evaluation protocol combines $AP$, $AR$, and $AUC$ into a final score $S$ for benchmarking. The paper reports on dataset scale, partitioning, and baseline results, and highlights a public evaluation server that supports ongoing comparison across teams. By emphasizing localization and cross-modal content, the work aims to support more resilient detection of realistic yet subtly manipulated media across languages and contexts.
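As a rough illustration of the quantities named in the TL;DR, the sketch below computes two standard building blocks: the temporal intersection-over-union (IoU) that $AP$ and $AR$ for localization are conventionally thresholded on, and a rank-based $AUC$ for binary detection. This is a minimal sketch under assumptions, not the challenge's evaluation script: the `(start, end)` interval representation and the function names `temporal_iou` and `roc_auc` are hypothetical, and the way the challenge combines these metrics into $S$ is not given in this summary, so it is not reproduced here.

```python
from typing import Sequence, Tuple

# Hypothetical representation: an interval is (start_sec, end_sec).
Interval = Tuple[float, float]


def temporal_iou(pred: Interval, truth: Interval) -> float:
    """Intersection-over-union of two time intervals."""
    inter = max(0.0, min(pred[1], truth[1]) - max(pred[0], truth[0]))
    union = (pred[1] - pred[0]) + (truth[1] - truth[0]) - inter
    return inter / union if union > 0 else 0.0


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a fake (label 1) outscores a real (label 0).

    Ties count as half a win. This is the O(n^2) pairwise definition,
    chosen for clarity rather than speed.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one real and one fake example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # A predicted fake segment that half-overlaps the true one.
    print(temporal_iou((1.0, 3.0), (2.0, 4.0)))
    # Four videos: two fake (1), two real (0), with fakeness scores.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

In a localization setting, a predicted segment would typically count as a true positive only when its IoU with a ground-truth segment exceeds a chosen threshold; the thresholds used by the challenge are not stated in this summary.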

Abstract

The detection and localization of deepfake content, particularly when small fake segments are seamlessly mixed with real videos, remains a significant challenge in the field of digital media security. Based on the recently released AV-Deepfake1M dataset, which contains more than 1 million manipulated videos across more than 2,000 subjects, we introduce the 1M-Deepfakes Detection Challenge. This challenge is designed to engage the research community in developing advanced methods for detecting and localizing deepfake manipulations within this large-scale, highly realistic audio-visual dataset. Participants can access the AV-Deepfake1M dataset and are required to submit their inference results, which are evaluated with the metrics for the detection or localization task. The methodologies developed through the challenge will contribute to the development of next-generation deepfake detection and localization systems. Evaluation scripts, baseline models, and accompanying code will be available at https://github.com/ControlNet/AV-Deepfake1M.

Paper Structure (14 sections, 1 equation, 4 figures)

Introduction
Related Work
Challenge Description
Dataset
Data Partitioning
Evaluation Metrics
Deepfake Detection
Deepfake Temporal Localization
Benchmark
Challenge Participation Details
Challenge Track 1 (Detection/Classification)
Challenge Track 2 (Temporal Localization)
Research Impact
Conclusion and Future Directions

Figures (4)

Figure 1: Comparison of related datasets with AV-Deepfake1M. This figure illustrates a comparison of AV-Deepfake1M with other accessible datasets, highlighting the number of subjects and the quantity of real versus fake videos. The figure is reproduced from the AV-Deepfake1M paper.
Figure 2: Data partitioning in AV-Deepfake1M. (a) The count of subjects within the train, validation, and test sets. (b) The count of videos present in the train, validation, and test sets. The figure is adapted from the AV-Deepfake1M paper.
Figure 3: Temporal deepfake localization benchmark. This figure compares the performance of state-of-the-art methods on the AV-Deepfake1M dataset.
Figure 4: Deepfake detection benchmark. This figure compares the performance of state-of-the-art methods on the AV-Deepfake1M dataset across various evaluation protocols.
