Detecting the Undetectable: Assessing the Efficacy of Current Spoof Detection Methods Against Seamless Speech Edits

Sung-Feng Huang; Heng-Cheng Kuo; Zhehuai Chen; Xuesong Yang; Chao-Han Huck Yang; Yu Tsao; Yu-Chiang Frank Wang; Hung-yi Lee; Szu-Wei Fu

TL;DR

To foster spoofing detection research, the authors introduce the Speech INfilling Edit (SINE) dataset, created with Voicebox. Experiments show that detectors based on self-supervised models perform strongly at detection, localization, and generalization across different edit methods.

Abstract

Advances in neural speech editing have raised concerns about misuse in spoofing attacks. Traditional corpora of partially edited speech focus primarily on cut-and-paste edits, which maintain speaker consistency but often introduce detectable discontinuities. Recent methods, such as A³T and Voicebox, produce smoother transitions by leveraging contextual information. To foster spoofing detection research, we introduce the Speech INfilling Edit (SINE) dataset, created with Voicebox. We detail the process of re-implementing Voicebox training and creating the dataset. Subjective evaluations confirm that speech edited with this technique is harder to detect than speech edited with conventional cut-and-paste methods. Although humans find these edits difficult to detect, experimental results demonstrate that detectors based on self-supervised models can achieve remarkable performance in detection, localization, and generalization across different edit methods. The dataset and related models will be made publicly available.

Paper Structure (20 sections, 3 figures, 5 tables)

Introduction
Speech editing methods
Cut-and-paste (CaP) speech editing
Seamless speech editing
Potential risk of seamless speech edit
Speech INfilling Edit (SINE) dataset
Re-implementation of Voicebox
Audio type settings
Transcript editing
Dataset generation pipeline
Subjective evaluation of edited speech
SINE dataset statistics and demo files
Experiments
Partial-fake speech detectors
Experimental setup
...and 5 more sections

Figures (3)

Figure 1: Cut-and-paste and seamless speech editing.
Figure 2: Subjective scores of different speech edit methods.
Figure 3: Instructions for the subjective evaluation.
