1M-Deepfakes Detection Challenge

Zhixi Cai, Abhinav Dhall, Shreya Ghosh, Munawar Hayat, Dimitrios Kollias, Kalin Stefanov, Usman Tariq

TL;DR

This work targets the detection and localization of deepfakes in long, multi-subject audio-visual content by leveraging the large AV-Deepfake1M dataset and launching the 1M-Deepfakes Detection Challenge. It defines two tasks—detection (binary real/fake) and temporal localization (identifying manipulated intervals)—with a rigorously designed evaluation protocol that combines $AP$, $AR$, and $AUC$ into a final score $S$ for robust benchmarking. The paper reports on dataset scale, partitioning, and baseline results, and highlights a public evaluation server to support ongoing, cross-team progress. By emphasizing localization and cross-modal content, the work enables more resilient detection of realistic yet subtly manipulated media across languages and contexts.
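As a rough illustration of the quantities behind the two tasks, the sketch below computes a temporal intersection-over-union (IoU) between a predicted and a ground-truth fake interval, and a pairwise ROC $AUC$ for video-level real/fake scores. The function names, the `(start, end)` segment format in seconds, and the use of IoU for matching predicted intervals are assumptions made for this example. It does not reproduce the challenge's evaluation scripts or the definition of the final score $S$.

```python
from typing import List, Tuple

# Hypothetical segment format: (start_seconds, end_seconds).
Segment = Tuple[float, float]


def temporal_iou(pred: Segment, truth: Segment) -> float:
    """Intersection-over-union of two time intervals."""
    inter = max(0.0, min(pred[1], truth[1]) - max(pred[0], truth[0]))
    union = (pred[1] - pred[0]) + (truth[1] - truth[0]) - inter
    return inter / union if union > 0 else 0.0


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """AUC as the fraction of (fake, real) pairs in which the fake video
    (label 1) receives the higher score; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one real and one fake sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Predicted fake interval vs. ground-truth fake interval.
    print(temporal_iou((1.0, 3.0), (2.0, 4.0)))
    # Video-level fake scores with labels (1 = fake, 0 = real).
    print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

The pairwise AUC is quadratic in the number of videos, so it only suits small examples. For a dataset of this size a rank-based implementation would be the more practical choice.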

Abstract

The detection and localization of deepfake content, particularly when small fake segments are seamlessly mixed with real videos, remains a significant challenge in the field of digital media security. Based on the recently released AV-Deepfake1M dataset, which contains more than 1 million manipulated videos across more than 2,000 subjects, we introduce the 1M-Deepfakes Detection Challenge. This challenge is designed to engage the research community in developing advanced methods for detecting and localizing deepfake manipulations within this large-scale, highly realistic audio-visual dataset. Participants can access the AV-Deepfake1M dataset and are required to submit their inference results, which are evaluated with the metrics for the detection or localization task. The methodologies developed through the challenge will contribute to next-generation deepfake detection and localization systems. Evaluation scripts, baseline models, and accompanying code will be available at https://github.com/ControlNet/AV-Deepfake1M.

Paper Structure

This paper contains 14 sections, 1 equation, and 4 figures.

Figures (4)

  • Figure 1: Comparison of related datasets with AV-Deepfake1M. This figure illustrates a comparison of AV-Deepfake1M with other accessible datasets, highlighting the number of subjects and the quantity of real versus fake videos. The figure is reproduced from the AV-Deepfake1M paper.
  • Figure 2: Data partitioning in AV-Deepfake1M. (a) The count of subjects within the train, validation, and test sets. (b) The count of videos present in the train, validation, and test sets. The figure is adapted from the AV-Deepfake1M paper.
  • Figure 3: Temporal deepfake localization benchmark. This figure compares the performance of state-of-the-art methods on the AV-Deepfake1M dataset.
  • Figure 4: Deepfake Detection Benchmark. Comparison of state-of-the-art method performance on the AV-Deepfake1M dataset across various evaluation protocols.