An Explainable Probabilistic Attribute Embedding Approach for Spoofed Speech Characterization

Manasi Chhibber, Jagabandhu Mishra, Hyejin Shim, Tomi H. Kinnunen

Abstract

We propose a novel approach for spoofed speech characterization through explainable probabilistic attribute embeddings. In contrast to high-dimensional raw embeddings extracted from a spoofing countermeasure (CM), whose dimensions are not easy to interpret, the probabilistic attributes are designed to gauge the presence or absence of the sub-components that make up a specific spoofing attack. These attributes are then applied to two downstream tasks: spoofing detection and attack attribution. To extend interpretability to the back-end as well, we adopt a decision tree classifier. Our experiments on the ASVspoof2019 dataset with spoof CM embeddings extracted from three models (AASIST, Rawboost-AASIST, SSL-AASIST) suggest that the performance of the attribute embeddings is on par with that of the original raw spoof CM embeddings for both tasks. The best accuracy achieved with the proposed approach is 99.7% for spoofing detection and 99.2% for attack attribution, compared to 99.7% and 94.7%, respectively, using the raw CM embeddings. To analyze the relative contribution of each attribute, we estimate their Shapley values. Attributes related to acoustic feature prediction, waveform generation (vocoder), and speaker modeling are found to be important for spoofing detection, while duration modeling, vocoder, and input type play a role in spoofing attack attribution.
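
For reference, the Shapley value mentioned in the abstract has the following standard game-theoretic definition. Here $N$ is taken to be the set of attribute dimensions and $v(S)$ the expected model output when only the attributes in $S$ are known; both are notational assumptions made for illustration and may differ from the estimator the paper actually uses.

$$\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N|-|S|-1)!}{|N|!}\,\bigl(v(S \cup \{i\}) - v(S)\bigr)$$

The values satisfy $\sum_{i \in N} \phi_i(v) = v(N) - v(\emptyset)$, so the per-attribute contributions add up to the total change in the prediction.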

Paper Structure (13 sections, 2 equations, 6 figures, 2 tables)

Figures (6)

  • Figure 1: Block diagram of the proposed explainable framework based on probabilistic attributes.
  • Figure 2: Overall pipeline of the proposed architecture for explainable spoofed speech detection. Phase I demonstrates the extraction of embeddings using a countermeasure system and the subsequent processing of these embeddings through a bank of seven probabilistic feature detectors. Phase II illustrates the concatenation of the outputs from these detectors to create a $25$-dimensional vector, which is then fed into a decision tree model for classification. This decision tree model is used for both bonafide/spoofed classification and spoofing attack attribution.
  • Figure 3: Attribute characterization results of each attribute set (AS) shown in terms of EER.
  • Figure 4: The t-SNE projection of spoof CM embeddings obtained from (a) AASIST, and (b) SSL-AASIST systems, respectively.
  • Figure 5: Spoofing detection
  • ...and 1 more figures
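
The two-phase pipeline described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the per-set sizes in `ATTRIBUTE_SET_SIZES` are hypothetical (chosen only so that seven sets sum to 25), `EMBEDDING_DIM` is an arbitrary small value, and the detectors are untrained random linear layers with a softmax output standing in for the real probabilistic feature detectors.

```python
import math
import random

# Hypothetical sizes of the seven attribute sets; chosen only so they sum to 25.
ATTRIBUTE_SET_SIZES = [3, 4, 4, 4, 3, 4, 3]
EMBEDDING_DIM = 16  # hypothetical CM embedding size, kept small for illustration


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_detector(num_values, dim, rng):
    """Build a stand-in linear detector with random (untrained) weights."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_values)]


def detector_posteriors(weights, embedding):
    """Posterior probabilities over one attribute set for one embedding."""
    scores = [sum(w * x for w, x in zip(row, embedding)) for row in weights]
    return softmax(scores)


def attribute_embedding(detectors, embedding):
    """Phase II input: concatenate the posteriors of every detector."""
    vector = []
    for weights in detectors:
        vector.extend(detector_posteriors(weights, embedding))
    return vector


rng = random.Random(0)
# Phase I: a bank of seven detectors applied to one (synthetic) CM embedding.
detectors = [make_detector(n, EMBEDDING_DIM, rng) for n in ATTRIBUTE_SET_SIZES]
cm_embedding = [rng.gauss(0.0, 1.0) for _ in range(EMBEDDING_DIM)]
attr_vec = attribute_embedding(detectors, cm_embedding)
print(len(attr_vec))  # 25
```

In a full system, vectors like `attr_vec` would be the input features of a decision tree back-end (for example, scikit-learn's `DecisionTreeClassifier`), trained once for bonafide/spoofed classification and once for attack attribution. Because each block of the vector is a probability distribution over one attribute set, every split in the tree can be read as a threshold on an attribute probability.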