
Gradient Norm-based Fine-Tuning for Backdoor Defense in Automatic Speech Recognition

Nanjun Zhou, Weilin Lin, Li Liu

TL;DR

This work addresses backdoor vulnerabilities in automatic speech recognition (ASR) by introducing Gradient Norm-based Fine-Tuning (GN-FT), a model-level defense tailored to the audio domain. GN-FT adds a gradient-norm regularization term to the fine-tuning objective, uses a Hessian approximation to keep computation practical, and targets backdoored and hybrid neurons by suppressing their unusually large gradients. In experiments on two speech datasets with five models and seven attacks, GN-FT achieves state-of-the-art defense performance, substantially lowering attack success rates while preserving clean accuracy. The results underscore the need for audio-specific defenses and make GN-FT a practical tool for securing ASR systems against backdoor threats.
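
As a minimal sketch of the regularized objective, assume the fine-tuning loss is L(θ) + λ‖∇θL(θ)‖₂ on clean data. Differentiating the penalty requires the Hessian-vector product H·g/‖g‖ (with g = ∇θL); a common way to avoid forming H is the finite-difference estimate [∇L(θ + r·g/‖g‖) − ∇L(θ)]/r, which costs one extra gradient evaluation. This particular approximation, and the names gn_regularized_grad, lam, and r, are assumptions for illustration, not necessarily the exact scheme used by GN-FT. The code uses plain Python lists so it runs without a deep-learning framework.

```python
import math
from typing import Callable, List

Vector = List[float]


def gn_regularized_grad(
    grad_fn: Callable[[Vector], Vector],
    theta: Vector,
    lam: float,
    r: float = 1e-2,
) -> Vector:
    """Approximate the gradient of L(theta) + lam * ||grad L(theta)||_2.

    ASSUMED scheme (not confirmed as GN-FT's exact approximation):
    the penalty's exact gradient is lam * H g / ||g||, where g = grad L(theta)
    and H is the Hessian. H g / ||g|| is estimated by the finite difference
    (grad L(theta + r * g / ||g||) - grad L(theta)) / r, so only two gradient
    evaluations are needed and H is never formed.
    """
    g = grad_fn(theta)
    norm = math.sqrt(sum(x * x for x in g))
    if norm == 0.0:
        # The penalty direction is undefined at a stationary point;
        # fall back to the plain gradient.
        return list(g)
    # Step a small distance r along the normalized gradient direction.
    theta_pert = [t + r * gi / norm for t, gi in zip(theta, g)]
    g_pert = grad_fn(theta_pert)
    # Loss gradient plus lam times the finite-difference Hessian-vector estimate.
    return [gi + lam * (gpi - gi) / r for gi, gpi in zip(g, g_pert)]
```

For a quadratic loss the finite difference is exact, which makes it a convenient sanity check. In practice grad_fn would be a framework's backward pass over a clean fine-tuning batch, and r trades approximation error against numerical noise.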

Abstract

Backdoor attacks pose a significant threat to the security of deep neural networks (DNNs). Despite considerable progress on backdoor defenses in the visual domain, specialized defenses for the audio domain are still lacking, and defenses adapted from the visual domain to the audio domain show limited effectiveness. To fill this gap, we propose Gradient Norm-based Fine-Tuning (GN-FT), a novel defense against backdoor attacks in the audio domain, motivated by observations of backdoored audio models. Specifically, we first find empirically that backdoored neurons exhibit larger gradients than other neurons, while clean neurons exhibit the smallest. On this basis, we fine-tune the backdoored model with a gradient norm regularization term that weakens and suppresses the backdoored neurons. We further approximate the loss computation to reduce implementation cost. Extensive experiments on two speech recognition datasets across five models demonstrate the superior performance of the proposed method. To the best of our knowledge, this work is the first specialized and effective defense against backdoor attacks in the audio domain.
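
The observation that backdoored neurons carry larger gradients than clean ones suggests a simple diagnostic, sketched below under the assumption that per-sample gradients have already been split into one vector per neuron (for example, the gradient of each convolutional filter's weights). All helper names are hypothetical. The last function illustrates why a gradient-norm penalty falls hardest on such neurons: when each parameter belongs to exactly one neuron, the squared global norm is the sum of per-neuron squared norms, ‖g‖² = Σⱼ‖gⱼ‖², so high-gradient neurons account for most of it.

```python
import math
from typing import List, Sequence

Grad = Sequence[float]


def mean_neuron_grad_norms(per_sample_grads: Sequence[Sequence[Grad]]) -> List[float]:
    """Average L2 gradient norm of each neuron over a set of clean samples.

    per_sample_grads[s][j] is the gradient vector of neuron j's parameters
    on clean sample s (ASSUMED to be precomputed by an autodiff framework).
    """
    n_samples = len(per_sample_grads)
    n_neurons = len(per_sample_grads[0])
    totals = [0.0] * n_neurons
    for sample in per_sample_grads:
        for j, g in enumerate(sample):
            totals[j] += math.sqrt(sum(x * x for x in g))
    return [t / n_samples for t in totals]


def rank_by_gradient(mean_norms: Sequence[float]) -> List[int]:
    """Neuron indices sorted from largest to smallest mean gradient norm."""
    return sorted(range(len(mean_norms)), key=lambda j: mean_norms[j], reverse=True)


def penalty_share(grad_per_neuron: Sequence[Grad]) -> List[float]:
    """Fraction of the squared global gradient norm contributed by each neuron.

    If every parameter belongs to exactly one neuron, ||g||^2 = sum_j ||g_j||^2,
    so these fractions sum to 1.
    """
    sq = [sum(x * x for x in g) for g in grad_per_neuron]
    total = sum(sq)
    if total == 0.0:
        return [0.0] * len(sq)
    return [s / total for s in sq]
```

For instance, with two samples and three neurons whose gradient vectors are [[3, 4], [0, 1], [0, 0]] and [[0, 5], [1, 0], [0, 0]], the mean norms are 5, 1, and 0, so neuron 0 is ranked first and, on the first sample, contributes 25/26 of the squared gradient norm.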


Paper Structure

This paper contains 13 sections, 5 equations, 5 figures, 3 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Overview of our proposed method (GN-FT).
  • Figure 2: Gradients of different neurons over 50 clean samples, illustrated with Audio BadNets gu2019badnets and JingleBack koffas2023going on ResNet he2016deep. For most clean inputs, backdoored and hybrid neurons exhibit larger gradients, while clean neurons show smaller ones.
  • Figure 3: Scatter plot of BLC and CLC values for neurons in the last two convolutional layers of an audio model backdoored with Audio BadNets gu2019badnets. C-zone: Clean Zone; B-zone: Backdoor Zone; H-zone: Hybrid Zone; R-zone: Redundant Zone.
  • Figure 4: The BLC-CLC distribution of neurons after GN-FT against Audio BadNets.
  • Figure 5: The t-SNE plots before and after GN-FT against BadNets using SCD-10. Black points indicate the poisoned features.
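
The four zones in Figure 3 partition neurons by their two scores. The sketch below assumes, without confirmation from the text here, that a high BLC means a neuron matters for the backdoor, a high CLC means it matters for clean accuracy, and that the zones are quadrants split by thresholds; the thresholds and the function names are hypothetical. Counting neurons per zone before and after fine-tuning is one way to summarize a shift like the one shown in Figure 4.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def classify_zone(blc: float, clc: float, blc_thr: float, clc_thr: float) -> str:
    """Assign a neuron to a zone from its (BLC, CLC) scores.

    HYPOTHETICAL mapping: assumes high BLC = important for the backdoor and
    high CLC = important for clean accuracy; thresholds are user-chosen.
    """
    high_backdoor = blc >= blc_thr
    high_clean = clc >= clc_thr
    if high_backdoor and high_clean:
        return "H"  # Hybrid Zone: supports both behaviors
    if high_backdoor:
        return "B"  # Backdoor Zone
    if high_clean:
        return "C"  # Clean Zone
    return "R"  # Redundant Zone: contributes little to either


def zone_counts(
    points: Iterable[Tuple[float, float]], blc_thr: float, clc_thr: float
) -> Dict[str, int]:
    """Number of neurons per zone, e.g. to compare distributions before and after fine-tuning."""
    counts = Counter(classify_zone(b, c, blc_thr, clc_thr) for b, c in points)
    # Counter returns 0 for zones with no neurons.
    return {zone: counts[zone] for zone in ("C", "B", "H", "R")}
```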