Spoof Diarization: "What Spoofed When" in Partially Spoofed Audio

Lin Zhang; Xin Wang; Erica Cooper; Mireia Diez; Federico Landini; Nicholas Evans; Junichi Yamagishi

TL;DR

This work defines Spoof Diarization for the Partial Spoof scenario, aiming to locate spoofed regions and cluster them by spoofing method to enable traceability. It introduces the Countermeasure-Condition Clustering (3C) model with separate CM-dia and CM-loc branches, and a Label-based CM-constraint to condition diarization on localization, evaluated via Spoof Jaccard error rate (JI_bona and JER_spoof) on PartialSpoof. Experiments reveal substantial task complexity even in constrained, single-speaker settings and open-set conditions, and demonstrate how labeling schemes and CM integration influence performance. The study provides a concrete benchmark, insights into CM training strategies, and open-source code to foster further research in spoof traceability for forensic and security applications.

Abstract

This paper defines Spoof Diarization as a novel task in the Partial Spoof (PS) scenario. It aims to determine what spoofed when, which includes not only locating spoof regions but also clustering them according to different spoofing methods. As a pioneering study in spoof diarization, we focus on defining the task, establishing evaluation metrics, and proposing a benchmark model, namely the Countermeasure-Condition Clustering (3C) model. Utilizing this model, we first explore how to effectively train countermeasures to support spoof diarization using three labeling schemes. We then utilize spoof localization predictions to enhance the diarization performance. This first study reveals the high complexity of the task, even in restricted scenarios where only a single speaker per audio file and an oracle number of spoofing methods are considered. Our code is available at https://github.com/nii-yamagishilab/PartialSpoof.
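
To make the "what spoofed when" output and a Jaccard-style error concrete, the following is a minimal frame-level sketch, not the paper's exact metric definition. It assumes frame-level label sequences, a hypothetical bona fide label `"b"`, hypothetical cluster names such as `"c0"`, and a brute-force search over one-to-one mappings between reference spoofing methods and predicted clusters (practical only for a small number of methods). The per-class error is one minus the Jaccard index, as in the Jaccard error rate used for speaker diarization.

```python
from itertools import permutations

BONA_FIDE = "b"  # hypothetical label for bona fide frames


def jaccard_error(ref, hyp, ref_class, hyp_cluster):
    """Return 1 - |intersection| / |union| of the frames carrying the two labels."""
    ref_frames = {i for i, r in enumerate(ref) if r == ref_class}
    hyp_frames = {i for i, h in enumerate(hyp) if h == hyp_cluster}
    union = ref_frames | hyp_frames
    if not union:
        return 0.0
    return 1.0 - len(ref_frames & hyp_frames) / len(union)


def spoof_jer(ref, hyp):
    """Mean Jaccard error over reference spoof classes under the best one-to-one mapping."""
    ref_classes = sorted({r for r in ref if r != BONA_FIDE})
    hyp_clusters = sorted({h for h in hyp if h != BONA_FIDE})
    if not ref_classes:
        return 0.0
    # Pad with None so that every reference class has a partner;
    # a class mapped to None matches no frames and scores an error of 1.0.
    padding = [None] * max(0, len(ref_classes) - len(hyp_clusters))
    candidates = hyp_clusters + padding
    best_total = min(
        sum(jaccard_error(ref, hyp, c, k) for c, k in zip(ref_classes, mapping))
        for mapping in permutations(candidates, len(ref_classes))
    )
    return best_total / len(ref_classes)


# Toy example: two spoofing methods (A1, A2) inside a bona fide utterance.
ref = ["b", "b", "A1", "A1", "A2", "A2", "b", "b"]
hyp = ["b", "b", "c0", "c0", "c0", "c1", "b", "b"]
print(round(spoof_jer(ref, hyp), 4))
```

In this toy case the best mapping pairs `A1` with `c0` and `A2` with `c1`; the single frame of `A2` absorbed into `c0` is what raises the error above zero. A larger-scale version would typically replace the permutation search with an assignment solver.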

Paper Structure (17 sections, 6 equations, 4 figures, 2 tables)

Introduction
Spoof Diarization
Definition
Spoof diarization and speaker diarization
Metric - Spoof Jaccard error rate
Proposed 3C Model for Spoof Diarization
3C model: CM-condition clustering
Labeling scheme
Experiments and Results
Experimental setup
How to train CMs to support spoof diarization
How do labeling schemes affect the ability of CMs?
How do we utilize CMs trained under varying labeling schemes?
Conclusion
Acknowledgements
...and 2 more sections

Figures (4)

Figure 1: Spoof detection, localization, and diarization.
Figure 2: Example of annotated class-homogeneous segments within an audio in the PS scenario.
Figure 3: Comparison of different diarization tasks. "b" for bona fide, "$A_*$" for spoofing methods, and "Spk*" for speakers. Nonspeech is omitted for clarity.
Figure 4: Proposed benchmark model and metrics for spoof diarization.
