Meta-Learning Approaches for Improving Detection of Unseen Speech Deepfakes

Ivan Kukanov, Janne Laakkonen, Tomi Kinnunen, Ville Hautamäki

TL;DR

This work addresses the problem of speech deepfake detection from the perspective of meta-learning, aiming to learn attack-invariant features to adapt to unseen attacks with very few samples available, and demonstrates an improvement in the Equal Error Rate.

Abstract

Current speech deepfake detection approaches perform satisfactorily against known adversaries; however, generalization to unseen attacks remains an open challenge. The proliferation of speech deepfakes on social media underscores the need for systems that can generalize to attacks not observed during training. We address this problem from the perspective of meta-learning, aiming to learn attack-invariant features so that the system can adapt to unseen attacks with very few samples available. This approach is promising because generating a large-scale training dataset is often expensive or infeasible. Our experiments demonstrated an improvement in the Equal Error Rate (EER) from 21.67% to 10.42% on the InTheWild dataset, using just 96 samples from the unseen dataset. Continuous few-shot adaptation ensures that the system remains up to date.

Paper Structure

This paper contains 13 sections, 3 equations, 5 figures, 1 table, 1 algorithm.

Figures (5)

  • Figure 1: t-SNE visualization of model embeddings before and after few-shot ProtoMAML adaptation on the FakeAVCeleb corpus [Hasam2021FakeAVCeleb].
  • Figure 2: ProtoNet intuition: during training, optimal centroids are learned for every subset of tasks; adaptation to a new task uses a few shots to compute the prototype vectors of that task. At test time, the distance from a sample to each centroid is computed, and the sample is assigned to the class of the nearest one.
  • Figure 3: ProtoNet few-shot adaptation with 2 to 256 shots per class. The horizontal axis is log-scaled.
  • Figure 4: ProtoMAML few-shot adaptation with 2 to 96 shots per class. The horizontal axis is log-scaled.
  • Figure 5: Effect of the number of adaptation steps for ProtoMAML_96 with 96 shots per class. The horizontal axis is log-scaled.
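
The ProtoNet intuition described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the embeddings are made-up 2-D vectors rather than outputs of a trained encoder, Euclidean distance is assumed as the metric, and the names `prototypes` and `classify` are hypothetical.

```python
import math


def prototypes(support):
    """Compute one prototype (mean embedding) per class.

    `support` maps a class label to a list of equal-length embedding vectors,
    i.e. the few labelled shots available for that class.
    """
    protos = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        protos[label] = [
            sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)
        ]
    return protos


def classify(query, protos):
    """Return the label of the prototype nearest to `query`.

    Euclidean distance is an assumption made for this sketch.
    """
    return min(protos, key=lambda label: math.dist(query, protos[label]))


# Toy support set: three made-up 2-D embeddings per class.
support = {
    "bonafide": [[0.9, 0.1], [1.1, -0.1], [1.0, 0.0]],
    "spoof": [[-1.0, 0.2], [-0.8, -0.2], [-1.2, 0.0]],
}

protos = prototypes(support)
prediction = classify([0.7, 0.3], protos)
print(prediction)
```

Adapting to a new attack in this picture only requires recomputing the prototypes from a handful of new labelled samples; the encoder itself is left untouched. ProtoMAML, as its name suggests, additionally fine-tunes the model with gradient steps, which this sketch does not cover.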