FlowMur: A Stealthy and Practical Audio Backdoor Attack with Limited Knowledge
Jiahe Lan, Jie Wang, Baochen Yan, Zheng Yan, Elisa Bertino
TL;DR
FlowMur demonstrates that DNN-based speech recognition is vulnerable to backdoor attacks even when the adversary has limited knowledge. It combines surrogate-model-based trigger optimization with an SNR-adaptive data-poisoning scheme that accounts for ambient noise, producing dynamic, hard-to-detect triggers that work across attachment positions. Empirical results show FlowMur achieves high attack performance (attack success rate often >95%) in both digital and physical settings while remaining stealthy to humans, and defenses such as filters, fine-pruning, STRIP, and Beatrix have limited effect. The work highlights significant security implications for speech interfaces and motivates the development of robust defenses that can operate under limited defender knowledge.
Abstract
Speech recognition systems driven by DNNs have revolutionized human-computer interaction through voice interfaces, which significantly facilitate our daily lives. However, the growing popularity of these systems also raises concerns about their security, particularly regarding backdoor attacks. A backdoor attack inserts one or more hidden backdoors into a DNN model during its training process, such that the model's performance on benign inputs is unaffected, but the model is forced to produce an adversary-desired output whenever a specific trigger is present in its input. Despite the initial success of current audio backdoor attacks, they suffer from the following limitations: (i) Most of them require substantial adversary knowledge, which limits their widespread adoption. (ii) They are not stealthy enough and are thus easily detected by humans. (iii) Most of them cannot attack live speech, reducing their practicality. To address these problems, in this paper, we propose FlowMur, a stealthy and practical audio backdoor attack that can be launched with limited knowledge. FlowMur constructs an auxiliary dataset and a surrogate model to augment adversary knowledge. To achieve dynamicity, it formulates trigger generation as an optimization problem and optimizes the trigger over different attachment positions. To enhance stealthiness, we propose an adaptive data poisoning method based on the Signal-to-Noise Ratio (SNR). Furthermore, ambient noise is incorporated into the process of trigger generation and data poisoning to make FlowMur robust to ambient noise and improve its practicality. Extensive experiments conducted on two datasets demonstrate that FlowMur achieves high attack performance in both digital and physical settings while remaining resilient to state-of-the-art defenses. In particular, a human study confirms that triggers generated by FlowMur are not easily detected by participants.
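The idea of attaching a trigger at a chosen position while controlling its loudness through the SNR can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `attach_trigger_at_snr`, the choice to measure speech power only over the overlapped segment, and the random stand-in signals are all assumptions made for illustration. The abstract does not specify how FlowMur computes or adapts the SNR.

```python
import numpy as np


def attach_trigger_at_snr(speech, trigger, snr_db, position):
    """Add `trigger` to `speech` at `position`, scaled to a target SNR in dB.

    Hypothetical helper for illustration only. SNR is defined here as
    10 * log10(P_speech / P_trigger), with P_speech measured over the
    segment the trigger overlaps (an assumption, not taken from the paper).
    """
    speech = np.asarray(speech, dtype=np.float64)
    trigger = np.asarray(trigger, dtype=np.float64)
    end = position + len(trigger)
    if position < 0 or end > len(speech):
        raise ValueError("trigger does not fit at this position")

    segment = speech[position:end]
    speech_power = np.mean(segment ** 2)
    trigger_power = np.mean(trigger ** 2)
    if speech_power == 0.0 or trigger_power == 0.0:
        raise ValueError("cannot define an SNR for a silent segment or trigger")

    # Solve speech_power / (scale^2 * trigger_power) = 10^(snr_db / 10) for scale.
    scale = np.sqrt(speech_power / (trigger_power * 10.0 ** (snr_db / 10.0)))

    poisoned = speech.copy()
    poisoned[position:end] += scale * trigger
    return poisoned, scale


# Usage: attach the same trigger at a random position in a stand-in utterance.
rng = np.random.default_rng(0)
utterance = rng.standard_normal(16000)   # placeholder for 1 s of 16 kHz speech
pattern = rng.standard_normal(8000)      # placeholder for a 0.5 s trigger
start = int(rng.integers(0, len(utterance) - len(pattern) + 1))
poisoned_utterance, gain = attach_trigger_at_snr(utterance, pattern, 20.0, start)
```

A higher `snr_db` makes the trigger quieter relative to the speech, which is the stealthiness lever; sampling `start` at random mimics the varying attachment positions over which the trigger is optimized.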
