D-CAPTCHA++: A Study of Resilience of Deepfake CAPTCHA under Transferable Imperceptible Adversarial Attack

Hong-Hanh Nguyen-Le, Van-Tuan Tran, Dinh-Thuc Nguyen, Nhien-An Le-Khac

TL;DR

The paper investigates the resilience of the D-CAPTCHA system to transferable imperceptible adversarial attacks in the context of fake phone calls. It introduces D-CAPTCHA++ by applying PGD adversarial training to deepfake detectors and task classifiers, aiming to curb transferability-based breaches. Empirical results show that adversarial training reduces attack success rates from roughly 32% to near 0% for both detectors and task classifiers as training steps increase, indicating substantial robustness gains. The work provides practical guidance on defending voice-based deepfake detection systems and outlines avenues for extending robustness to real-world telephony conditions.

Abstract

Advances in generative AI have improved audio synthesis models, including text-to-speech and voice conversion. This raises concerns about their potential misuse in social manipulation and political interference, as synthetic speech has become indistinguishable from natural human speech. Several speech-generation programs are used for malicious purposes, especially to impersonate individuals in phone calls. Detecting fake audio is therefore crucial to maintaining social security and safeguarding the integrity of information. Recent research has proposed D-CAPTCHA, a system based on a challenge-response protocol, to differentiate fake phone calls from real ones. In this work, we study the resilience of this system and introduce a more robust version, D-CAPTCHA++, to defend against fake calls. Specifically, we first expose the vulnerability of the D-CAPTCHA system to transferable imperceptible adversarial attacks. Second, we mitigate this vulnerability by applying adversarial training to the D-CAPTCHA deepfake detectors and task classifiers, which improves the robustness of the system.
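
To make the two quantities discussed above concrete, the following is a minimal sketch of an L-infinity PGD attack and of an attack success rate (ASR) computation. It is not the paper's implementation: the "detector" here is a toy logistic-regression model over a two-dimensional feature vector, and the names `pgd_attack`, `attack_success_rate`, and the parameters `eps`, `alpha`, and `steps` are assumptions chosen for illustration. The ASR definition used (fraction of correctly classified inputs whose prediction flips after the attack) is one common convention and may differ from the paper's.

```python
import numpy as np


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def predict(x, w, b):
    """Hard 0/1 predictions of the toy logistic-regression detector."""
    return (sigmoid(x @ w + b) >= 0.5).astype(int)


def pgd_attack(x, y, w, b, eps=0.1, alpha=0.05, steps=10):
    """L-infinity PGD against the toy detector (illustrative stand-in only).

    Each step moves the input along the sign of the loss gradient, then
    projects back into the eps-ball around the original input.
    """
    x_adv = x.copy()
    for _ in range(steps):
        p = sigmoid(x_adv @ w + b)
        # Gradient of binary cross-entropy w.r.t. the input is (p - y) * w.
        grad = (p - y)[:, None] * w[None, :]
        x_adv = x_adv + alpha * np.sign(grad)
        # Projection step: keep the perturbation within [-eps, +eps].
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv


def attack_success_rate(x, x_adv, y, w, b):
    """Fraction of correctly classified inputs that the attack flips."""
    correct = predict(x, w, b) == y
    if not correct.any():
        return 0.0
    flipped = predict(x_adv, w, b)[correct] != y[correct]
    return float(np.mean(flipped))


# Hypothetical toy data: one sample near the decision boundary, one far away.
w = np.array([1.0, -1.0])
b = 0.0
x = np.array([[0.05, 0.0], [2.0, 0.0]])
y = np.array([1, 1])

x_adv = pgd_attack(x, y, w, b)
asr = attack_success_rate(x, x_adv, y, w, b)
print(f"attack success rate: {asr:.2f}")
```

In PGD adversarial training, examples such as `x_adv` would be generated during training and the model would be updated on them, which is the mechanism credited above with lowering the attack success rate.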

Paper Structure

This paper contains 22 sections, 8 equations, 2 figures, 9 tables, and 1 algorithm.

Figures (2)

  • Figure 1: Comparison of VC's Inference Speed.
  • Figure 2: Attack Success Rate of Task classifiers and Deepfake detectors before and after applying PGD adversarial training.