
A Practical Survey on Emerging Threats from AI-driven Voice Attacks: How Vulnerable are Commercial Voice Control Systems?

Yuanda Wang, Qiben Yan, Nikolay Ivanov, Xun Chen

TL;DR

This survey analyzes AI-driven threats to commercial voice control systems (VCS), evaluating six attack types across multiple devices and APIs to assess their real-world resilience. It provides a taxonomy of attacks, conducts real-world experiments on Siri, Google Assistant, Alexa, and Bixby, and examines defenses such as liveness detection and purification. The findings show that modern VCS are more robust to over-the-air attacks than previously believed, although vulnerabilities persist against deepfake voice attacks, unintelligible speech, and certain adversarial approaches. The work offers practical guidance for defense design and identifies critical gaps, particularly in cross-device transferability and realistic threat models, that warrant further research and stronger mitigation. These results matter for security engineering in VCS-enabled environments and IoT ecosystems.

Abstract

The emergence of Artificial Intelligence (AI)-driven audio attacks has revealed new security vulnerabilities in voice control systems (VCS). While researchers have introduced a multitude of attack strategies targeting VCS, continual advancements in these systems have diminished the impact of many such attacks. Recognizing this dynamic landscape, our study comprehensively assesses the resilience of commercial voice control systems against a spectrum of malicious audio attacks. Through extensive experimentation, we evaluate six prominent attack techniques across a collection of voice control interfaces and devices. Contrary to prevailing narratives, our results suggest that commercial voice control systems exhibit enhanced resistance to existing threats. In particular, our research highlights the ineffectiveness of white-box attacks in black-box scenarios. Furthermore, adversaries encounter substantial obstacles in obtaining precise gradient estimates during query-based interactions with commercial systems such as Apple Siri and Samsung Bixby. Meanwhile, we find that current defense strategies are not completely immune to advanced attacks. Our findings contribute valuable insights for enhancing defense mechanisms in VCS. Through this survey, we aim to raise awareness within the academic community about the security concerns of VCS and advocate for continued research in this crucial area.
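
The abstract reports that adversaries struggle to obtain precise gradient estimates when querying commercial systems. As an illustration only (the authors' own attack implementations may differ), the sketch below uses one common query-based estimator, natural evolution strategies (NES) with antithetic Gaussian sampling, against two hypothetical oracles over a toy 16-sample "waveform": `soft_score`, which returns a continuous confidence, and `hard_label`, which, like an assistant that only acts on or ignores a command, returns a binary decision. Both oracles, the `target` vector, and the -0.1 threshold are made-up stand-ins, not any vendor's API.

```python
import numpy as np


def nes_gradient(score_fn, x, sigma=0.01, n_samples=50, rng=None):
    """Estimate the gradient of a black-box score at x via antithetic NES sampling."""
    rng = np.random.default_rng(0) if rng is None else rng
    grad = np.zeros_like(x)
    for _ in range(n_samples):
        u = rng.standard_normal(x.shape)
        # Two queries per random direction: f(x + sigma*u) and f(x - sigma*u).
        diff = score_fn(x + sigma * u) - score_fn(x - sigma * u)
        grad += diff * u
    return grad / (2 * sigma * n_samples)


# Hypothetical toy target "waveform" (illustrative only, not real audio).
target = np.linspace(-0.5, 0.5, 16)


def soft_score(x):
    # Hypothetical oracle exposing a continuous confidence score.
    return -np.sum((x - target) ** 2)


def hard_label(x):
    # Hypothetical oracle exposing only a recognized / not-recognized decision.
    return float(soft_score(x) > -0.1)


x0 = np.zeros(16)
true_grad = -2 * (x0 - target)  # analytic gradient of soft_score at x0

g_soft = nes_gradient(soft_score, x0)
g_hard = nes_gradient(hard_label, x0)

cos = float(g_soft @ true_grad / (np.linalg.norm(g_soft) * np.linalg.norm(true_grad)))
print(f"soft-score cosine similarity to true gradient: {cos:.2f}")
print(f"hard-label estimate norm: {np.linalg.norm(g_hard):.2f}")
```

With the continuous score, the estimate points roughly along the true gradient; with the binary oracle, every pair of nearby queries returns the same decision, so the estimate collapses to zero. This is one plausible factor behind the reported difficulty, offered as a simplification rather than as the authors' analysis.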

Paper Structure (29 sections, 5 equations, 8 figures, 10 tables)

Figures (8)

  • Figure 1: Speaker Verification (SV) framework.
  • Figure 2: Automatic Speech Recognition (ASR) framework.
  • Figure 3: A framework of VCS workflow and potential vulnerabilities.
  • Figure 4: Voice synthesis frameworks.
  • Figure 5: Signal transformation approaches for generating unintelligible speech.