
FlowMur: A Stealthy and Practical Audio Backdoor Attack with Limited Knowledge

Jiahe Lan, Jie Wang, Baochen Yan, Zheng Yan, Elisa Bertino

TL;DR

FlowMur exploits the vulnerability of DNN-based speech recognition to backdoor attacks even when the adversary has only limited knowledge. It combines surrogate-based trigger optimization with an adaptive data-poisoning scheme, accounting for ambient noise in both, to produce dynamic, hard-to-notice triggers that remain effective at different attachment positions. Experiments show that FlowMur achieves a high attack success rate (ASR, often above 95%) in both digital and physical settings while remaining stealthy to human listeners, and that defenses such as filtering, fine-pruning, STRIP, and Beatrix have limited effect. The work highlights serious security implications for speech interfaces and motivates robust defenses that can operate under limited defender knowledge.
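
To make "optimizes the trigger over different attachment positions" concrete, here is a minimal sketch of position-averaged trigger optimization. It is not FlowMur's actual objective: the linear surrogate, the helper names (`target_score`, `attach`, `optimize_trigger`), the step size, and the amplitude bound `eps` are all hypothetical stand-ins, chosen so that the gradient is exact and easy to read. Each step samples a random attachment position per clip, averages the gradient of the target-class score with respect to the trigger, ascends it, and clips every trigger sample to `[-eps, eps]`.

```python
import random


def target_score(weights, audio):
    """Hypothetical linear surrogate: score of the target class for a waveform."""
    return sum(w * a for w, a in zip(weights, audio))


def attach(audio, trigger, pos):
    """Add the trigger to the audio starting at sample index pos."""
    out = list(audio)
    for i, t in enumerate(trigger):
        out[pos + i] += t
    return out


def optimize_trigger(weights, clips, trig_len, eps, steps=50, lr=0.01, seed=0):
    """Projected gradient ascent on the surrogate's target score, averaged over
    random attachment positions so the trigger is not tied to one offset."""
    rng = random.Random(seed)
    trigger = [0.0] * trig_len
    for _ in range(steps):
        grad = [0.0] * trig_len
        for clip in clips:
            # Sample a fresh attachment position for every clip and step.
            pos = rng.randrange(len(clip) - trig_len + 1)
            # For a linear scorer, d(score)/d(trigger[i]) = weights[pos + i].
            for i in range(trig_len):
                grad[i] += weights[pos + i]
        # Ascend the averaged gradient, then clip each sample to [-eps, eps].
        trigger = [max(-eps, min(eps, t + lr * g / len(clips)))
                   for t, g in zip(trigger, grad)]
    return trigger
```

Because the gradient is averaged over random offsets rather than computed at one fixed position, the trigger is not tuned to a single offset. A real surrogate network would replace `target_score` with a model's target-class logit and obtain the gradient by automatic differentiation.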

Abstract

Speech recognition systems driven by DNNs have revolutionized human-computer interaction through voice interfaces, which significantly facilitate our daily lives. However, the growing popularity of these systems also raises particular concerns about their security, especially with regard to backdoor attacks. A backdoor attack inserts one or more hidden backdoors into a DNN model during training, so that the model's performance on benign inputs is unaffected but the model produces an adversary-desired output whenever a specific trigger is present in the input. Despite the initial success of current audio backdoor attacks, they suffer from three limitations: (i) most of them require substantial adversary knowledge, which limits their widespread adoption; (ii) they are not stealthy enough and are therefore easily detected by humans; and (iii) most of them cannot attack live speech, which reduces their practicality. To address these problems, we propose FlowMur, a stealthy and practical audio backdoor attack that can be launched with limited knowledge. FlowMur constructs an auxiliary dataset and a surrogate model to augment the adversary's knowledge. To make the trigger dynamic, it formulates trigger generation as an optimization problem and optimizes the trigger over different attachment positions. To enhance stealthiness, we propose an adaptive data poisoning method based on the Signal-to-Noise Ratio (SNR). Furthermore, ambient noise is incorporated into both trigger generation and data poisoning, making FlowMur robust to ambient noise and improving its practicality. Extensive experiments on two datasets demonstrate that FlowMur achieves high attack performance in both digital and physical settings while remaining resilient to state-of-the-art defenses. In addition, a human study shows that triggers generated by FlowMur are not easily detected by participants.
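
The SNR-based poisoning step can be sketched as scaling the trigger per utterance so that the speech-to-trigger power ratio matches a chosen SNR, then adding it at a random offset. This is an illustrative assumption rather than the paper's exact procedure: the function names are hypothetical, SNR is taken here as the ratio of the utterance's mean power to the trigger's mean power, and ambient noise is omitted. Scaling per utterance is one plausible reading of "adaptive": quieter utterances receive a proportionally quieter trigger.

```python
import math
import random


def power(x):
    """Mean power of a signal (mean of squared samples)."""
    return sum(v * v for v in x) / len(x)


def scale_to_snr(speech, trigger, snr_db):
    """Scale the trigger so that 10*log10(P_speech / P_trigger) == snr_db.
    Assumption: SNR compares mean powers over each signal's own length."""
    p_trigger = power(trigger)
    if p_trigger == 0:
        raise ValueError("trigger has zero power")
    target_p = power(speech) / (10 ** (snr_db / 10))
    k = math.sqrt(target_p / p_trigger)
    return [k * t for t in trigger]


def poison(speech, trigger, snr_db, rng):
    """Hypothetical poisoning step: scale the trigger to the requested SNR
    for this utterance, then add it at a random offset."""
    scaled = scale_to_snr(speech, trigger, snr_db)
    pos = rng.randrange(len(speech) - len(trigger) + 1)
    out = list(speech)
    for i, t in enumerate(scaled):
        out[pos + i] += t
    return out, pos
```

A higher `snr_db` yields a quieter trigger, trading attack strength for stealthiness.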

Paper Structure (48 sections, 9 equations, 13 figures, 11 tables, 1 algorithm)

Figures (13)

  • Figure 1: The workflow of FlowMur.
  • Figure 2: Experimental results of the class-wise evaluation. "--" means not available: by definition, ASR cannot be calculated when the actual label and the target label are the same.
  • Figure 3: Experimental results on the impact of target-class poisoning rate.
  • Figure 4: Experimental results on the impact of trigger duration.
  • Figure 5: Experimental results on the impact of SNR.
  • ...and 8 more figures
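
The Figure 2 caption implies that ASR is computed only over triggered inputs whose true label differs from the target label. A minimal sketch of that metric, assuming this definition (the function name and the `None` return for the undefined case are hypothetical):

```python
def attack_success_rate(true_labels, preds_on_triggered, target):
    """Fraction of triggered inputs, among those whose true label is not the
    target, that the model classifies as the target. Returns None when no
    such inputs exist, since ASR is undefined in that case."""
    eligible = [(y, p) for y, p in zip(true_labels, preds_on_triggered)
                if y != target]
    if not eligible:
        return None
    hits = sum(1 for _, p in eligible if p == target)
    return hits / len(eligible)
```

Returning `None` when every evaluated sample already belongs to the target class corresponds to the "--" entries in the class-wise results.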