Can DeepFake Speech be Reliably Detected?

Hongbin Liu; Youzheng Chen; Arun Narayanan; Athula Balachandran; Pedro J. Moreno; Lun Wang

TL;DR

This work presents the first systematic study of active malicious attacks against state-of-the-art open-source synthetic speech detectors (SSDs). White-box attacks, black-box attacks, and their transferability are evaluated for both attack effectiveness and stealthiness, using both hardcoded metrics and human ratings.

Abstract

Recent advances in text-to-speech (TTS) systems, particularly those with voice cloning capabilities, have made voice impersonation readily accessible, raising ethical and legal concerns due to potential misuse for malicious activities like misinformation campaigns and fraud. While synthetic speech detectors (SSDs) exist to combat this, they are vulnerable to "test domain shift", exhibiting decreased performance when audio is altered through transcoding, playback, or background noise. This vulnerability is further exacerbated by deliberate manipulation of synthetic speech aimed at deceiving detectors. This work presents the first systematic study of such active malicious attacks against state-of-the-art open-source SSDs. White-box attacks, black-box attacks, and their transferability are studied from both attack effectiveness and stealthiness, using both hardcoded metrics and human ratings. The results highlight the urgent need for more robust detection methods in the face of evolving adversarial threats.
