Probabilistic Verification of Voice Anti-Spoofing Models

Evgeny Kushnir; Alexandr Kozodaev; Dmitrii Korzh; Mikhail Pautov; Oleg Kiriukhin; Oleg Y. Rogov

TL;DR

We propose PV-VASM, a probabilistic framework for verifying the robustness of voice anti-spoofing models (VASMs), derive a theoretical upper bound on the error probability, and validate the method across diverse experimental settings, demonstrating its effectiveness as a practical robustness verification tool.

Abstract

Recent advances in generative models have amplified the risk of malicious misuse of speech synthesis technologies, enabling adversaries to impersonate target speakers and access sensitive resources. Although speech deepfake detection has progressed rapidly, most existing countermeasures lack formal robustness guarantees or fail to generalize to unseen generation techniques. We propose PV-VASM, a probabilistic framework for verifying the robustness of voice anti-spoofing models (VASMs). PV-VASM estimates the probability of misclassification under text-to-speech (TTS), voice cloning (VC), and parametric signal transformations. The approach is model-agnostic and enables robustness verification against unseen speech synthesis techniques and input perturbations. We derive a theoretical upper bound on the error probability and validate the method across diverse experimental settings, demonstrating its effectiveness as a practical robustness verification tool.
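To make the idea of bounding a misclassification probability concrete, the sketch below samples random perturbations of one input, counts how often a detector errs, and turns the count into an upper bound that holds with probability at least $1-\alpha$. It uses a generic one-sided Hoeffding inequality, which is an assumption made here for illustration and is not the bound derived in the paper. The names `hoeffding_upper_bound`, `estimate_error_bound`, `toy_model`, and `random_gain` are hypothetical, and the toy detector stands in for a real VASM.

```python
import math
import random


def hoeffding_upper_bound(errors, n, alpha):
    """Upper bound on the true error probability from n i.i.d. trials.

    One-sided Hoeffding inequality: with probability >= 1 - alpha,
    p <= errors / n + sqrt(ln(1 / alpha) / (2 n)).
    This is a generic bound, not the one derived in the paper.
    """
    if n <= 0 or not 0.0 < alpha < 1.0:
        raise ValueError("need n > 0 and 0 < alpha < 1")
    p_hat = errors / n
    return min(1.0, p_hat + math.sqrt(math.log(1.0 / alpha) / (2.0 * n)))


def estimate_error_bound(model, x, label, sample_transform, n, alpha, rng):
    """Count misclassifications over n random perturbations of x and bound them."""
    errors = 0
    for _ in range(n):
        if model(sample_transform(x, rng)) != label:
            errors += 1
    return hoeffding_upper_bound(errors, n, alpha)


def toy_model(signal):
    """Hypothetical stand-in detector: returns 1 if the peak amplitude exceeds 0.05."""
    return 1 if max(abs(v) for v in signal) > 0.05 else 0


def random_gain(signal, rng, low_db=-10.0, high_db=20.0):
    """Hypothetical gain perturbation: scale by a gain drawn uniformly in dB."""
    factor = 10.0 ** (rng.uniform(low_db, high_db) / 20.0)
    return [v * factor for v in signal]


rng = random.Random(0)
signal = [0.5, -0.5, 0.25]
bound = estimate_error_bound(toy_model, signal, 1, random_gain, 1000, 1e-6, rng)
```

In this toy setup the detector never errs, so `bound` reduces to the Hoeffding slack term alone, which shrinks as the number of samples grows and widens as $\alpha$ decreases.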

Paper Structure

This paper contains 21 sections, 22 equations, 7 figures, 4 tables, 2 algorithms.

Introduction
Related work
Methodology
Problem setup
Description of PV-VASM
Estimation of error probability
Adaptation to generative models
TTS
Voice cloning
Experimental setup
Source model, datasets, and hyperparameters
Parametric transformations and speech generation models
Metrics
Results
Parametric input transformations
...and 6 more sections

Figures (7)

Figure 1: Dependence of PCA on $(m, n, k)$ for background noise perturbations with $\operatorname{SNR} \in [15,30]$. The confidence level is set to $\alpha=10^{-6}$. Curves sharing the same color correspond to an identical computational budget $m$, while line styles and marker types indicate variations in $n$ and $k$, respectively.
Figure 2: Dependence of PCA on $\alpha$ for background noise perturbations with $\operatorname{SNR} \in [15,30]$. $m=6000,~n=1000,~k=6$ are fixed.
Figure 3: Dependence of PCA on $(m, n, k)$ for the gain adjustment transform with $\gamma \in [-10,20]~\operatorname{dB}$. The confidence level is set to $\alpha=10^{-6}$. Curves sharing the same color correspond to the same augmentation budget $m$, while line styles and marker types indicate variations in $n$ and $k$, respectively.
Figure 4: Dependence of PCA on $\alpha$ for the gain adjustment transform with $\gamma \in [-10,20] \operatorname{dB}$. The values $m=20000,~n=1000,~k=20$ are fixed.
Figure 5: Dependence of PCA on $(m, n, k)$ for the low-pass filter, where the cutoff frequency $\omega_{\max}$ is randomly sampled from the $[2500, 3000]~\operatorname{Hz}$ range. The confidence level is set to $\alpha=10^{-6}$. Curves sharing the same color correspond to the same augmentation budget $m$, while line styles and marker types indicate variations in $n$ and $k$, respectively.
...and 2 more figures
