SpMis: An Investigation of Synthetic Spoken Misinformation Detection

Peizhuo Liu, Li Wang, Renqiang He, Haorui He, Lei Wang, Huadi Zheng, Jie Shi, Tong Xiao, Zhizheng Wu

TL;DR

An initial investigation into synthetic spoken misinformation detection is conducted by introducing an open-source dataset, SpMis, which includes speech synthesized from over 1,000 speakers across five common topics, utilizing state-of-the-art text-to-speech systems.

Abstract

In recent years, speech generation technology has advanced rapidly, fueled by generative models and large-scale training techniques. While these developments have enabled the production of high-quality synthetic speech, they have also raised concerns about the misuse of this technology, particularly for generating synthetic misinformation. Current research primarily focuses on distinguishing machine-generated speech from human-produced speech, but the more urgent challenge is detecting misinformation within spoken content. This task requires a thorough analysis of factors such as speaker identity, topic, and synthesis. To address this need, we conduct an initial investigation into synthetic spoken misinformation detection by introducing an open-source dataset, SpMis. SpMis includes speech synthesized from over 1,000 speakers across five common topics, utilizing state-of-the-art text-to-speech systems. Although our results show promising detection capabilities, they also reveal substantial challenges for practical implementation, underscoring the importance of ongoing research in this critical area.

Paper Structure

This paper contains 16 sections, 3 figures, 5 tables, 1 algorithm.

Figures (3)

  • Figure 1: A comparison between DeepFake detection and synthetic spoken misinformation detection. DeepFake detection (left) distinguishes synthetic speech from recorded speech. Synthetic spoken misinformation detection (right), by contrast, detects synthetic speech by a specific speaker or a group of speakers on specific topics.
  • Figure 2: Overview of the detection pipeline. DeepFake Detection identifies synthetic audio and passes it to Speaker Verification. Speaker Verification checks whether the speaker is one of the celebrities we focus on and passes matching audio to Topic Classification, which identifies the specific topic. Misinformation is detected through these three modules.
  • Figure 3: The speaker error rate of two TTS models in Speaker Verification.
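
The three-module cascade described in the Figure 2 caption can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the `Clip` and `Verdict` types, the callable stage interfaces, the watch lists, and the rule that a clip is flagged only when all three stages match are assumptions made for the example. Real stages would be trained models operating on waveforms.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set


@dataclass
class Clip:
    """Hypothetical stand-in for an audio clip (a real one would hold a waveform)."""
    clip_id: str


@dataclass
class Verdict:
    flagged: bool                  # True if the clip passed all three stages
    stage: str                     # stage at which the decision was made
    speaker: Optional[str] = None
    topic: Optional[str] = None


def detect_misinformation(
    clip: Clip,
    is_synthetic: Callable[[Clip], bool],
    identify_speaker: Callable[[Clip], Optional[str]],
    classify_topic: Callable[[Clip], str],
    watched_speakers: Set[str],
    watched_topics: Set[str],
) -> Verdict:
    """Run the cascade: deepfake detection, speaker verification, topic classification."""
    # Stage 1: only synthetic audio moves on (assumed early exit for recordings).
    if not is_synthetic(clip):
        return Verdict(False, "deepfake_detection")

    # Stage 2: keep only clips attributed to a speaker on the watch list.
    speaker = identify_speaker(clip)
    if speaker is None or speaker not in watched_speakers:
        return Verdict(False, "speaker_verification", speaker=speaker)

    # Stage 3: flag the clip if its topic is one of the watched topics (assumed rule).
    topic = classify_topic(clip)
    return Verdict(topic in watched_topics, "topic_classification", speaker, topic)


if __name__ == "__main__":
    # Toy stubs in place of trained models; all names are placeholders.
    verdict = detect_misinformation(
        Clip("demo"),
        is_synthetic=lambda c: True,
        identify_speaker=lambda c: "speaker_a",
        classify_topic=lambda c: "topic_x",
        watched_speakers={"speaker_a"},
        watched_topics={"topic_x"},
    )
    print(verdict)
```

Passing the stages in as callables keeps the sketch independent of any particular detector, so each module can be swapped or evaluated separately. Because the stages are chained, an error in an early stage propagates to the final decision.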