Adjust-free adversarial example generation in speech recognition using evolutionary multi-objective optimization under black-box condition

Shoma Ishida, Satoshi Ono

TL;DR

The paper tackles the practical vulnerability of ASR systems to adversarial perturbations in black-box settings by introducing adjust-free adversarial examples that remain effective despite timing lag. It formulates adversarial generation as a three-objective optimization problem and solves it with MOEA/D, optimizing the expected misclassification probability, its variance, and the MFCC distance under random lag $\tau_i \in [-T_{max}, T_{max}]$. Empirical results on a speech command model show that the approach can produce robust perturbations that maintain misclassification across a range of timing differences, outperforming prior black-box methods in many classes. This work highlights a realistic threat to deployed ASR systems and demonstrates how EMO can craft robust perturbations without precise timing, informing defense strategies and safety analyses.
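As a rough illustration of the three objectives named above, the following sketch estimates the mean and spread of a target-class probability over random lags, plus a perceptual-distance term. Everything in it is an assumption made for illustration: the function names, the zero-padded shift, the toy `toy_target_prob` classifier, and the log-spectrum feature, which stands in for MFCC and is not the paper's feature extractor. It is not the authors' implementation.

```python
import numpy as np

def shift_with_zero_padding(perturbation, lag):
    """Delay (lag > 0) or advance (lag < 0) a 1-D signal, padding with zeros."""
    n = len(perturbation)
    shifted = np.zeros_like(perturbation)
    if abs(lag) >= n:
        return shifted  # shifted entirely out of the window
    if lag > 0:
        shifted[lag:] = perturbation[:n - lag]
    elif lag < 0:
        shifted[:n + lag] = perturbation[-lag:]
    else:
        shifted[:] = perturbation
    return shifted

def log_spectrum(signal, frame=256, hop=128):
    """Stand-in feature (NOT MFCC): log-magnitude spectrum of windowed frames."""
    window = np.hanning(frame)
    starts = range(0, len(signal) - frame + 1, hop)
    frames = np.stack([signal[s:s + frame] * window for s in starts])
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-8)

def toy_target_prob(audio):
    """Hypothetical black-box classifier: returns one probability in (0, 1)."""
    return float(1.0 / (1.0 + np.exp(-50.0 * np.mean(audio ** 2))))

def evaluate_objectives(speech, perturbation, target_prob_fn, t_max, n_lags, rng):
    """Return (mean probability, std of probability, feature distance)."""
    # Sample integer lags tau_i uniformly from [-t_max, t_max].
    lags = rng.integers(-t_max, t_max + 1, size=n_lags)
    probs = np.array([
        target_prob_fn(speech + shift_with_zero_padding(perturbation, int(lag)))
        for lag in lags
    ])
    # Distance between clean and perturbed audio in the stand-in feature space.
    dist = np.linalg.norm(log_spectrum(speech + perturbation) - log_spectrum(speech))
    return float(probs.mean()), float(probs.std()), float(dist)

rng = np.random.default_rng(0)
speech = 0.1 * rng.standard_normal(4000)
perturbation = 0.01 * rng.standard_normal(4000)
mean_p, std_p, dist = evaluate_objectives(
    speech, perturbation, toy_target_prob, t_max=200, n_lags=16, rng=rng
)
print(mean_p, std_p, dist)
```

A minimizing optimizer would typically negate the mean probability so that all three values are minimized together; that sign convention is also an assumption here.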

Abstract

This paper proposes a black-box adversarial attack method against automatic speech recognition systems. Some studies have attempted to attack neural networks for speech recognition; however, these methods did not consider the robustness of the generated adversarial examples against timing lag relative to the target speech. The proposed method adopts Evolutionary Multi-objective Optimization (EMO), which allows it to generate robust adversarial examples under a black-box scenario. Experimental results showed that the proposed method successfully generated adjust-free adversarial examples, which are sufficiently robust against timing lag that an attacker does not need to time their playback against the target speech.
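An EMO run returns a set of trade-off candidates instead of a single answer, and the ones worth keeping are those that no other candidate dominates. The sketch below shows a generic non-dominated filter for minimized objectives; the function name and the example values are hypothetical and are not taken from the paper.

```python
import numpy as np

def non_dominated_mask(objectives):
    """Boolean mask of rows not dominated by any other row (all objectives minimized)."""
    objectives = np.asarray(objectives, dtype=float)
    mask = np.ones(objectives.shape[0], dtype=bool)
    for i in range(objectives.shape[0]):
        # Row j dominates row i if j is no worse in every objective
        # and strictly better in at least one.
        no_worse = np.all(objectives <= objectives[i], axis=1)
        better = np.any(objectives < objectives[i], axis=1)
        if np.any(no_worse & better):
            mask[i] = False
    return mask

# Hypothetical objective vectors, one row per candidate perturbation.
candidates = np.array([
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [3.0, 1.0, 2.0],
    [1.0, 2.0, 3.0],
])
mask = non_dominated_mask(candidates)
print(mask)
```

Exact duplicates do not dominate each other under this definition, so both copies of a repeated row are kept.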

Paper Structure

This paper contains 11 sections, 4 equations, 5 figures, 1 algorithm.

Figures (5)

  • Figure 1: Robust adversarial example generation by multi-objective optimization
  • Figure 3: Transition of the distributions of the non-dominated solutions for the "down" sample.
  • Figure 4: Generated adversarial examples for class "down".
  • Figure 5: Obtained non-dominated solutions.
  • Figure 7: The effectiveness of employing standard deviation.