
Clean Label Attacks against SLU Systems

Henry Li Xinyuan, Sonal Joshi, Thomas Thebaud, Jesus Villalba, Najim Dehak, Sanjeev Khudanpur

TL;DR

This paper studies clean-label poisoning backdoors in spoken language understanding (SLU) systems. It extends clean-label backdoor (CLBD) attacks to audio sequence tasks and introduces a ranked CLBD variant that selects the samples to poison according to how difficult they are for a proxy model. Using Fluent Speech Commands and an RNN-T SLU model, the authors show that ranked CLBD can reach an Attack Success Rate (ASR) of 99.3% with only 1.5% of eligible samples poisoned, and that trigger strength and insertion location strongly influence the outcome. They also compare Dirty Label Backdoor Attacks (DLBD) with CLBD, and evaluate two defenses originally developed against gradient-based attacks (filtering and denoising), finding filtering to be the more effective of the two but not foolproof. The results reveal practical security risks for SLU systems and motivate domain-specific defenses and robust training strategies to mitigate backdoor vulnerabilities in audio tasks.

Abstract

Poisoning backdoor attacks involve an adversary manipulating the training data so that inserting a trigger in the signal at inference time induces certain behaviors in the victim model. We adapted clean label backdoor (CLBD) data poisoning attacks, which do not modify the training labels, to state-of-the-art speech recognition models that perform a Spoken Language Understanding task, achieving a 99.8% attack success rate by poisoning 10% of the training data. We analyzed how the signal strength of the poison, the percentage of samples poisoned, and the choice of trigger affect the attack. We also found that CLBD attacks are most successful when applied to training samples that are inherently hard for a proxy model. Using this strategy, we achieved an attack success rate of 99.3% by poisoning only 1.5% of the training data. Finally, we applied two previously developed defenses against gradient-based attacks, and found that they attain mixed success against poisoning.
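
The ranked selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the use of a per-sample proxy loss as the difficulty score, and the interpretation of the poison fraction as a share of the utterances carrying the target label are all assumptions made for illustration. The labels are never changed, which is what makes the attack clean-label.

```python
import random


def select_poison_indices(labels, proxy_losses, target_label,
                          poison_fraction, ranked=True, seed=0):
    """Pick which training utterances receive the trigger (hypothetical helper).

    Assumption: only utterances that already carry the target label are
    eligible, and poison_fraction is taken relative to that eligible set.
    """
    eligible = [i for i, y in enumerate(labels) if y == target_label]
    n_poison = int(round(poison_fraction * len(eligible)))
    if ranked:
        # Ranked variant: prefer samples the proxy model finds hardest
        # (assumed here to mean highest proxy loss).
        chosen = sorted(eligible, key=lambda i: proxy_losses[i],
                        reverse=True)[:n_poison]
    else:
        # Unranked variant: choose eligible samples uniformly at random.
        chosen = random.Random(seed).sample(eligible, n_poison)
    return sorted(chosen)


def add_trigger(waveform, trigger, scale, offset=0):
    """Return a copy of waveform with a scaled trigger added at offset.

    scale stands in for the signal strength of the poison; the trigger is
    truncated if it would run past the end of the waveform.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    out = list(waveform)
    for k, t in enumerate(trigger):
        if offset + k >= len(out):
            break
        out[offset + k] += scale * t
    return out
```

In this sketch, a poisoned training set would be built by calling `add_trigger` on each utterance returned by `select_poison_indices` while leaving every label untouched. How the proxy model is trained and how difficulty is actually scored are left open here.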

Paper Structure

This paper contains 21 sections, 1 equation, 2 figures, and 5 tables.

Figures (2)

  • Figure 1: Classification of attacks against neural networks
  • Figure 2: Illustration of dirty label poisoning attacks and of unranked and ranked clean label poisoning attacks. In the ranked version, the poisoned utterances are selected based on the difficulty of misclassification, while in the unranked version they are selected at random.