SynthForensics: A Multi-Generator Benchmark for Detecting Synthetic Video Deepfakes
Roberto Leotta, Salvatore Alfio Sambataro, Claudio Vittorio Ragaglia, Mirko Casu, Yuri Petralia, Francesco Guarnera, Luca Guarnera, Sebastiano Battiato
TL;DR
SynthForensics introduces the first human-centric benchmark for purely synthetic video deepfakes, using a paired-source protocol across five open-source T2V models to produce 6,815 high-quality samples, each provided in four versions (raw, lossless, light compression, and heavy compression). The study shows that current detectors degrade sharply on synthetic content, both in zero-shot settings and under compression. Fine-tuning and generator-based training yield strong forward generalization within synthetic domains but poor backward transfer to legacy manipulation-based benchmarks. The work provides extensive metadata, prompts, and rigorous validation pipelines to support reproducibility and further research into robust, generalizable detection methods. It highlights the need for detectors tailored to generation artifacts, and for contextualized evaluation, to safeguard multimedia authenticity in an era of accessible high-fidelity synthesis.
Abstract
The landscape of synthetic media has been irrevocably altered by text-to-video (T2V) models, whose outputs are rapidly approaching indistinguishability from reality. Critically, this technology is no longer confined to large-scale labs; the proliferation of efficient, open-source generators is democratizing the ability to create high-fidelity synthetic content on consumer-grade hardware. This makes existing face-centric and manipulation-based benchmarks obsolete. To address this urgent threat, we introduce SynthForensics, to the best of our knowledge the first human-centric benchmark for detecting purely synthetic video deepfakes. The benchmark comprises 6,815 unique videos from five architecturally distinct, state-of-the-art open-source T2V models. Its construction was underpinned by a meticulous two-stage, human-in-the-loop validation to ensure high semantic and visual quality. Each video is provided in four versions (raw, lossless, light compression, and heavy compression) to enable real-world robustness testing. Experiments demonstrate that state-of-the-art detectors are fragile and generalize poorly when evaluated on this new domain: we observe a mean performance drop of $29.19\%$ AUC, with some methods performing worse than random chance, and top models losing over 30 points under heavy compression. We further investigate whether training on SynthForensics mitigates these performance gaps; doing so achieves robust generalization to unseen generators ($93.81\%$ AUC), though at the cost of reduced backward compatibility with traditional manipulation-based deepfakes. The complete dataset and all generation metadata, including the specific prompts and inference parameters for every video, will be made publicly available at [link anonymized for review].
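To make the reported quantities concrete, the following is a minimal sketch of how a detector's AUC and a mean AUC drop across detectors could be computed. It is not the authors' evaluation code: the function names, the detector names, and all numbers are hypothetical toy values, and the drop is assumed here to be a difference in percentage points averaged over detectors.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random fake (label 1) receives a
    higher score than a random real (label 0); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc_drop(source_auc, target_auc):
    """Mean per-detector AUC drop in percentage points (assumed definition)."""
    drops = [100.0 * (source_auc[name] - target_auc[name]) for name in source_auc]
    return sum(drops) / len(drops)


# Toy scores for six videos (1 = synthetic, 0 = real); values are made up.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(f"AUC: {roc_auc(labels, scores):.3f}")

# A detector whose scores are inverted falls below 0.5, i.e. worse than chance.
inverted = [1.0 - s for s in scores]
print(f"AUC (inverted scores): {roc_auc(labels, inverted):.3f}")

# Hypothetical detectors evaluated on a legacy benchmark and on a new domain.
legacy = {"det_a": 0.95, "det_b": 0.90}
new_domain = {"det_a": 0.70, "det_b": 0.55}
print(f"Mean drop: {mean_auc_drop(legacy, new_domain):.2f} points")
```

An AUC below 0.5 means the detector ranks real videos as more likely fake than synthetic ones, which is the sense in which a method can perform worse than random chance.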
