STOPA: A Database of Systematic VariaTion Of DeePfake Audio for Open-Set Source Tracing and Attribution
Anton Firc, Manasi Chhibber, Jagabandhu Mishra, Vishwanath Pratap Singh, Tomi Kinnunen, Kamil Malinka
TL;DR
This paper introduces STOPA, a systematically varied, metadata-rich dataset designed for open-world source tracing and attribution of deepfake audio. STOPA enables controlled variation across multiple acoustic models (AMs) and vocoder models (VMs) using the VCTK corpus and an ASVspoof2019 LA-like protocol to support scalable, attack-disjoint evaluation. The authors frame source tracing as an open-world detection task and report baseline zero-shot attribution experiments, which reveal substantial challenges: high equal error rates (EERs) and non-discriminative embedding spaces. By providing a public benchmark and code, STOPA aims to spur development of dynamic attack-signature databases and robust attribution methods for forensic and transparency applications.
Abstract
A key research area in deepfake speech detection is source tracing: determining the origin of synthesised utterances. Approaches may involve identifying the acoustic model (AM), vocoder model (VM), or other generation-specific parameters. However, progress is limited by the lack of a dedicated, systematically curated dataset. To address this, we introduce STOPA, a systematically varied and metadata-rich dataset for deepfake speech source tracing, covering 8 AMs, 6 VMs, and diverse parameter settings across 700k samples from 13 distinct synthesisers. Unlike existing datasets, which often feature limited variation or sparse metadata, STOPA provides a systematically controlled framework covering a broader range of generative factors, such as the choice of the vocoder model, acoustic model, or pretrained weights, ensuring higher attribution reliability. This control improves attribution accuracy, aiding forensic analysis, deepfake detection, and generative model transparency.
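For readers unfamiliar with the equal error rate (EER) used to score open-set attribution trials, the following is a minimal, generic sketch of how an EER can be computed from trial scores. It is not the authors' evaluation code. The function name `compute_eer` is hypothetical, and the sketch assumes that each trial yields a similarity score (for example, between an utterance embedding and an attack signature) where a higher value means "same source".

```python
import numpy as np


def compute_eer(target_scores, nontarget_scores):
    """Return (eer, threshold) for scores where higher means 'same source'.

    Hypothetical helper for illustration only; a simple threshold sweep
    without interpolation between operating points.
    """
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)

    # Candidate thresholds: every distinct observed score, sorted ascending.
    thresholds = np.unique(np.concatenate([target_scores, nontarget_scores]))

    # False rejection rate: fraction of target trials scored below threshold.
    frr = np.array([(target_scores < t).mean() for t in thresholds])
    # False acceptance rate: fraction of non-target trials at or above it.
    far = np.array([(nontarget_scores >= t).mean() for t in thresholds])

    # The EER is taken where the two error rates are closest to equal.
    idx = int(np.argmin(np.abs(far - frr)))
    return float((far[idx] + frr[idx]) / 2.0), float(thresholds[idx])


if __name__ == "__main__":
    # Toy scores, invented purely to exercise the function.
    eer, thr = compute_eer([0.9, 0.8, 0.4], [0.1, 0.2, 0.5])
    print(f"EER = {eer:.3f} at threshold {thr:.2f}")
```

An EER near 0 means target and non-target trials are well separated. An EER near 0.5 means the scores carry almost no source information, which is the regime that "non-discriminative embedding spaces" points to.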
