Zero-Query Adversarial Attack on Black-box Automatic Speech Recognition Systems

Zheng Fang; Tao Wang; Lingchen Zhao; Shenyi Zhang; Bowen Li; Yunjie Ge; Qi Li; Chao Shen; Qian Wang

TL;DR

This work addresses the practicality gap in attacking black-box ASR systems by eliminating the need to query the target. It introduces ZQ-Attack, a zero-query, transfer-based attack that optimizes adversarial perturbations across a diverse set of surrogate ASRs through a sequential ensemble protocol and an adaptive perturbation initialization. The method uses a novel loss combining adversarial effectiveness, perceptual imperceptibility, and acoustic-feature alignment, along with psychoacoustic-based clipping to maintain stealth. Empirical results show a 100% success rate of attack (SRoA) across multiple online services, open-source ASRs, and commercial intelligent voice control (IVC) devices, with signal-to-noise ratios (SNRs) competitive with or superior to prior black-box methods, highlighting the real-world risk and the need for robust defenses.

Abstract

In recent years, extensive research has been conducted on the vulnerability of ASR systems, revealing that black-box adversarial example attacks pose significant threats to real-world ASR systems. However, most existing black-box attacks rely on queries to the target ASRs, which is impractical when queries are not permitted. In this paper, we propose ZQ-Attack, a transfer-based adversarial attack on ASR systems in the zero-query black-box setting. Through a comprehensive review and categorization of modern ASR technologies, we first meticulously select surrogate ASRs of diverse types to generate adversarial examples. Following this, ZQ-Attack initializes the adversarial perturbation with a scaled target command audio, rendering it relatively imperceptible while maintaining effectiveness. Subsequently, to achieve high transferability of adversarial perturbations, we propose a sequential ensemble optimization algorithm, which iteratively optimizes the adversarial perturbation on each surrogate model, leveraging collaborative information from other models. We conduct extensive experiments to evaluate ZQ-Attack. In the over-the-line setting, ZQ-Attack achieves a 100% success rate of attack (SRoA) with an average signal-to-noise ratio (SNR) of 21.91 dB on 4 online speech recognition services, and attains an average SRoA of 100% and SNR of 19.67 dB on 16 open-source ASRs. For commercial intelligent voice control devices, ZQ-Attack also achieves a 100% SRoA with an average SNR of 15.77 dB in the over-the-air setting.
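To make two quantities from the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the conventional SNR definition (ratio of carrier power to perturbation power, in dB) and initializes a perturbation by scaling a target command waveform and placing it inside a zero-filled buffer of the carrier's length. The names `snr_db`, `init_perturbation`, `scale`, and `offset`, as well as the toy sine-wave signals, are hypothetical and chosen only for illustration.

```python
import math


def power(samples):
    """Mean squared amplitude of a waveform."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(carrier, perturbation):
    """Assumed conventional SNR: 10 * log10(carrier power / perturbation power)."""
    return 10.0 * math.log10(power(carrier) / power(perturbation))


def init_perturbation(carrier_len, command, scale, offset=0):
    """Hypothetical initializer: a scaled copy of the command audio,
    zero-padded to the carrier length and placed at `offset`."""
    if offset < 0 or offset + len(command) > carrier_len:
        raise ValueError("command does not fit inside the carrier")
    delta = [0.0] * carrier_len
    for i, c in enumerate(command):
        delta[offset + i] = scale * c
    return delta


# Toy signals standing in for real audio (assumption: plain sine waves).
carrier = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
command = [0.5 * math.sin(2 * math.pi * 13 * t / 100) for t in range(40)]

delta = init_perturbation(len(carrier), command, scale=0.1, offset=30)
adversarial = [x + d for x, d in zip(carrier, delta)]
print(f"SNR of initial perturbation: {snr_db(carrier, delta):.2f} dB")
```

Under this definition, halving `scale` raises the SNR by about 6.02 dB (20·log10 2), which illustrates the trade-off the abstract describes: a smaller initial perturbation is less perceptible, but it carries less of the target command.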

Paper Structure (33 sections, 12 equations, 9 figures, 13 tables, 2 algorithms)

Introduction
Background
Automatic Speech Recognition
Audio Adversarial Attacks
Threat Model & Challenges
Threat Model
Challenges
ZQ-Attack
Problem Formulation
Attack Overview
Surrogate ASRs Selection
Perturbation Initialization
Sequential Ensemble Optimization
Loss Design
Experiments
...and 18 more sections

Figures (9)

Figure 1: The architecture of a typical ASR system.
Figure 2: Workflow of ZQ-Attack. ZQ-Attack is mainly divided into three stages: surrogate ASRs selection, perturbation initialization, and sequential ensemble optimization.
Figure 3: Illustration of surrogate ASRs selection.
Figure 4: An example of the perturbation initialization. The adversarial perturbation is initialized using a scaled target command audio. The region of the added initialized adversarial perturbation is highlighted in red.
Figure 5: Illustration of sequential ensemble optimization.
...and 4 more figures
