Universal Acoustic Adversarial Attacks for Flexible Control of Speech-LLMs

Rao Ma, Mengjie Qian, Vyas Raina, Mark Gales, Kate Knill

TL;DR

The paper studies the vulnerability of speech LLMs to universal acoustic adversarial attacks, in which a fixed audio segment is learned and prepended to the input to mute the model's output or override its prompt. It introduces general attacks (mute and task control) and a novel selective attack that activates only for inputs with a targeted attribute, such as speaker gender or spoken language. Evaluations on two leading speech LLMs (Qwen2-Audio and Granite-Speech) on LibriSpeech and FLEURS show high attack success and transfer across prompts and datasets; amplitude-constrained variants are also examined to make the segment less perceptible. The findings expose significant robustness gaps and point to the need for defense mechanisms and robust training strategies if speech LLMs are to be deployed safely, fairly, and reliably in real-world applications.

Abstract

The combination of pre-trained speech encoders with large language models has enabled the development of speech LLMs that can handle a wide range of spoken language processing tasks. While these models are powerful and flexible, this very flexibility may make them more vulnerable to adversarial attacks. To examine the extent of this problem, in this work we investigate universal acoustic adversarial attacks on speech LLMs. Here a fixed, universal, adversarial audio segment is prepended to the original input audio. We initially investigate attacks that cause the model to either produce no output or to perform a modified task overriding the original prompt. We then extend the nature of the attack to be selective so that it activates only when specific input attributes, such as speaker gender or spoken language, are present. Inputs without the targeted attribute should be unaffected, allowing fine-grained control over the model outputs. Our findings reveal critical vulnerabilities in Qwen2-Audio and Granite-Speech and suggest that similar speech LLMs may be susceptible to universal adversarial attacks. This highlights the need for more robust training strategies and improved resistance to adversarial attacks.
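
The input construction described above (a fixed segment placed in front of otherwise unmodified speech) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not something taken from the paper: the function names `clip_amplitude` and `prepend_segment`, the 16 kHz sample rate, the 0.5 s segment length, the bound `epsilon=0.02`, and the choice of a per-sample clipping bound as one possible reading of "amplitude-constrained". The segment is random placeholder noise, not a learned attack, and no model is involved.

```python
import numpy as np


def clip_amplitude(segment: np.ndarray, epsilon: float) -> np.ndarray:
    """Limit every sample to [-epsilon, epsilon].

    Assumption: a per-sample bound is used here as one plausible form of
    amplitude constraint; the source does not specify the exact form.
    """
    return np.clip(segment, -epsilon, epsilon)


def prepend_segment(segment: np.ndarray, speech: np.ndarray) -> np.ndarray:
    """Return the segment followed by the unmodified speech waveform."""
    return np.concatenate([segment, speech])


# Hypothetical values for illustration only.
sample_rate = 16_000
rng = np.random.default_rng(0)
segment = rng.uniform(-1.0, 1.0, size=sample_rate // 2)  # 0.5 s placeholder
speech = rng.uniform(-0.1, 0.1, size=sample_rate)        # stand-in for 1 s of speech

bounded = clip_amplitude(segment, epsilon=0.02)
attacked = prepend_segment(bounded, speech)

print(attacked.shape)                # total samples: segment + speech
print(float(np.abs(bounded).max()))  # never exceeds epsilon
```

Because the same segment is reused for every utterance, the speech samples themselves are left untouched; only the prefix changes what the model receives.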

Paper Structure

This paper contains 34 sections, 8 equations, 3 figures, 13 tables.

Figures (3)

  • Figure 1: Illustration of language-based selective attack.
  • Figure 2: Cumulative average output length ratio (relative to ASR references) for Mute-female and Mute-male models, plotted against Qwen-Audio's zero-shot gender classification probabilities using $\mathcal{P}_{\text{gdr}}$.
  • Figure 3: Mel spectrograms of universal adversarial segments prepended to a sample from the test_other set, shown for attacks with different amplitude constraints.