
LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation

Hieu-Thi Luong, Haoyang Li, Lin Zhang, Kong Aik Lee, Eng Siong Chng

TL;DR

This work addresses the vulnerability of countermeasures (CMs) to attacker-driven fake speech by introducing LlamaPartialSpoof, a 130-hour dataset of fully and partially fake utterances generated with an LLM and diverse TTS models. It analyzes how well current CMs generalize to unseen, attacker-inspired scenarios and shows that domain shifts cause substantial performance gaps: the best reported equal error rate (EER) is 24.49%. The authors provide a detailed generation pipeline, multi-dataset evaluations, and insights into how concatenation methods and fake-word proportions affect detection. Together, these offer a practical framework for improving CM robustness against realistic disinformation attacks. Overall, the dataset and findings highlight the need for domain-adaptive detection strategies and more realistic attack simulations to strengthen defenses against disinformation campaigns.

Abstract

Previous fake speech datasets were constructed from a defender's perspective to develop countermeasure (CM) systems without considering the diverse motivations of attackers. To better align with real-life scenarios, we created LlamaPartialSpoof, a 130-hour dataset that contains both fully and partially fake speech, using a large language model (LLM) and voice cloning technologies to evaluate the robustness of CMs. By examining information valuable to both attackers and defenders, we identify several key vulnerabilities in current CM systems that can be exploited to increase attack success rates, including biases toward certain text-to-speech models or concatenation methods. Our experimental results indicate that current fake speech detection systems struggle to generalize to unseen scenarios, with the best system achieving an equal error rate of 24.49%.
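Results here are reported as an equal error rate (EER): the operating point at which the rate of spoofed utterances wrongly accepted equals the rate of bona fide utterances wrongly rejected. The sketch below shows one way to compute EER from CM scores. It is not the paper's evaluation code. It assumes that higher scores mean "more likely bona fide", a common CM convention adopted here as an assumption. It also approximates the crossing point by picking, among the observed scores, the threshold that minimizes |FAR - FRR|. The function name compute_eer is hypothetical.

```python
from typing import Sequence, Tuple


def compute_eer(bonafide_scores: Sequence[float],
                spoof_scores: Sequence[float]) -> Tuple[float, float]:
    """Approximate EER by brute force over observed score thresholds.

    Assumption: higher score = more likely bona fide. An utterance is
    accepted as bona fide when its score is >= the threshold.
    Returns (eer, threshold).
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("need at least one bona fide and one spoof score")

    n_bona = len(bonafide_scores)
    n_spoof = len(spoof_scores)
    # Every observed score is a candidate decision threshold.
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))

    best_gap, best_eer, best_thr = float("inf"), 1.0, thresholds[0]
    for t in thresholds:
        # False rejection: bona fide utterances scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / n_bona
        # False acceptance: spoofed utterances scored at or above it.
        far = sum(s >= t for s in spoof_scores) / n_spoof
        gap = abs(frr - far)
        if gap < best_gap:
            # At the closest crossing, report the mean of the two rates.
            best_gap, best_eer, best_thr = gap, (frr + far) / 2, t
    return best_eer, best_thr


if __name__ == "__main__":
    eer, thr = compute_eer([0.9, 0.8, 0.7, 0.3], [0.6, 0.4, 0.2, 0.85])
    print(f"EER = {eer:.2%} at threshold {thr}")
```

With the toy scores above, the function returns an EER of 0.25 at threshold 0.7. Evaluation toolkits often interpolate the DET/ROC curve instead. This brute-force version costs O(N·M), but each step is easy to verify by hand.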

Paper Structure

This paper contains 16 sections, 2 figures, and 5 tables.

Figures (2)

  • Figure 1: Examples of sentences altered by the Llama 3 Instruct model.
  • Figure 2: Histograms of the number of fake sentences containing a given percentage of fake words or fake segments.
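Figure 2 groups partially fake sentences by the share of their words that are synthesized. The sketch below illustrates that quantity and is not the dataset's actual tooling. It assumes each sentence is stored as a list of per-word labels (1 = fake, 0 = bona fide). That representation, the helper names fake_word_percentage and percentage_histogram, and the 10% bin width are all hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def fake_word_percentage(labels: Sequence[int]) -> float:
    """Percentage of words in one sentence labeled fake (hypothetical 1/0 labels)."""
    if not labels:
        raise ValueError("sentence has no words")
    return 100.0 * sum(labels) / len(labels)


def percentage_histogram(percentages: Sequence[float],
                         bin_width: int = 10) -> Dict[int, int]:
    """Count sentences per percentage bin, keyed by each bin's lower edge.

    Assumes bin_width divides 100; a 100% sentence is placed in the last
    bin so that the bins cover [0, 100] without an extra 100-only bin.
    """
    last_bin = 100 // bin_width - 1
    counts: Counter = Counter()
    for p in percentages:
        b = min(int(p // bin_width), last_bin)
        counts[b * bin_width] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Toy per-word labels for four sentences (illustrative only).
    sentences = [
        [0, 0, 1, 0],                    # 25% fake
        [1, 1, 0, 0],                    # 50% fake
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],  # 10% fake
        [1, 1, 1],                       # fully fake
    ]
    pcts = [fake_word_percentage(s) for s in sentences]
    print(percentage_histogram(pcts))
```

For the toy sentences, the histogram is {10: 1, 20: 1, 50: 1, 90: 1}. Folding 100% into the last bin is a design choice: it keeps fully fake sentences from forming a separate single-value bar.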