Few-Shot Speech Deepfake Detection Adaptation with Gaussian Processes

Neta Glazer, David Chernin, Idan Achituve, Sharon Gannot, Ethan Fetaya

TL;DR

This work tackles the challenge of generalizing audio deepfake detection to unseen TTS models when only limited data is available. It introduces ADD-GP, a few-shot adaptive framework that couples a deep embedding front-end with a Gaussian Process classifier via Deep Kernel Learning, enabling robust generalization and uncertainty-aware decisions. The authors contribute LibriFake as a benchmark, demonstrate superior few-shot and personalized detection performance, and show well-calibrated probability estimates. Together, these results advance practical defenses against evolving speech deepfake threats and provide open-source resources to support further research.

Abstract

Recent advancements in Text-to-Speech (TTS) models, particularly in voice cloning, have intensified the demand for adaptable and efficient deepfake detection methods. As TTS systems continue to evolve, detection models must be able to adapt efficiently to previously unseen generation models with minimal data. This paper introduces ADD-GP, a few-shot adaptive framework based on a Gaussian Process (GP) classifier for Audio Deepfake Detection (ADD). We show how combining a powerful deep embedding model with the flexibility of Gaussian processes can achieve strong performance and adaptability. Additionally, we show that this approach can also be used for personalized detection, with greater robustness to new TTS models and one-shot adaptability. To support our evaluation, a benchmark dataset is constructed for this task using new state-of-the-art voice cloning models.
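
The idea of adapting a GP on top of fixed embeddings from a handful of labelled examples can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the 2-D "embeddings" are invented, the RBF kernel and noise level are arbitrary, and GP regression on ±1 labels is used as a simple stand-in for a GP classifier. It is not the inference procedure of ADD-GP, which this summary does not describe.

```python
import math

def rbf(a, b, length_scale=1.0):
    """RBF kernel between two embedding vectors (hypothetical kernel choice)."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq / (2.0 * length_scale ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def gp_fit(support, labels, noise=0.1):
    """Return alpha = (K + noise * I)^-1 y for the few-shot support set."""
    n = len(support)
    K = [[rbf(support[i], support[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(K, labels)

def gp_score(x, support, alpha):
    """Posterior mean at x; positive means 'fake' under the +1/-1 labelling."""
    return sum(a * rbf(x, s) for a, s in zip(alpha, support))

# Invented 3-shot support set: -1 = real speech, +1 = synthetic speech.
real = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0]]
fake = [[2.0, 2.1], [1.9, 1.8], [2.2, 2.0]]
support = real + fake
labels = [-1.0] * 3 + [1.0] * 3

alpha = gp_fit(support, labels)
print(gp_score([2.0, 2.0], support, alpha) > 0)  # query near the fake cluster
print(gp_score([0.0, 0.0], support, alpha) < 0)  # query near the real cluster
```

In this sketch, adapting to a new generator only requires adding its few examples to the support set and re-solving a small linear system; the embedding model is left untouched.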

Paper Structure

This paper contains 17 sections, 3 equations, 1 figure, 2 tables, 1 algorithm.

Figures (1)

  • Figure 1: Calibration curves for different models. Left: calibration in the 10-shot experiment. Right: calibration in the personalized 5-shot VoxCeleb experiment.
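
For readers unfamiliar with calibration curves, the sketch below shows one common way such a curve is computed: predictions are grouped into equal-width confidence bins, and the mean predicted probability in each bin is compared with the observed fraction of positives. The bin count, the function names, and the toy data are assumptions for illustration; the binning actually used for Figure 1 is not stated here.

```python
def reliability_bins(probs, labels, n_bins=5):
    """Group predictions into equal-width bins over [0, 1].

    Returns one (mean_confidence, fraction_positive, count) tuple
    per non-empty bin, in order of increasing confidence.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    curve = []
    for items in bins:
        if not items:
            continue
        mean_conf = sum(p for p, _ in items) / len(items)
        frac_pos = sum(y for _, y in items) / len(items)
        curve.append((mean_conf, frac_pos, len(items)))
    return curve

def expected_calibration_error(curve):
    """Count-weighted average gap between confidence and observed frequency."""
    total = sum(n for _, _, n in curve)
    return sum(n / total * abs(conf - freq) for conf, freq, n in curve)

# Invented predictions: probability of "fake" and the true label (1 = fake).
probs = [0.1, 0.1, 0.9, 0.9]
labels = [0, 0, 1, 1]

curve = reliability_bins(probs, labels)
ece = expected_calibration_error(curve)
print(curve)
print(round(ece, 3))
```

A perfectly calibrated model would place every bin on the diagonal, where mean confidence equals the observed fraction of positives.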