Hidden in the Noise: Unveiling Backdoors in Audio LLMs Alignment through Latent Acoustic Pattern Triggers

Liang Lin, Miao Yu, Kaiwen Luo, Yibo Zhang, Lilan Peng, Dexian Wang, Xuehai Tang, Yuanhe Zhang, Xikang Yang, Zhenhong Zhou, Kun Wang, Yang Liu

TL;DR

This work addresses the vulnerability of Audio Large Language Models (ALLMs) to acoustic backdoor attacks. It introduces Hidden in the Noise (HIN), a framework that implants triggers through acoustic modifications and additive sounds, and AudioSafe, a benchmark covering nine risk types for rigorously evaluating model resilience. Across three representative ALLMs, HIN achieves high attack success rates (ASR) with minimal poisoning (a poisoning ratio as low as $\rho=3\%$) while preserving benign accuracy, revealing significant vulnerabilities in how ALLMs encode audio. The study also analyzes defense methods, which mitigate the attack only partially and at some cost to model utility, underscoring the need for robust, modality-aware safety mechanisms in ALLMs. Overall, the work highlights urgent security gaps in audio alignment and offers a standardized evaluation platform for future defenses and architectural improvements.
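The two quantities in the TL;DR, the poisoning ratio $\rho$ and the attack success rate, can be made concrete with a short sketch. It assumes the common data-poisoning recipe (apply the trigger to a random fraction $\rho$ of training clips and pair them with an attacker-chosen response); the summary does not spell out the paper's exact labelling scheme, so `Sample`, `poison_dataset`, `attack_success_rate`, and the `"TARGET"` placeholder are all hypothetical names for illustration, not the authors' code.

```python
import random
from dataclasses import dataclass, replace
from typing import Callable, List, Sequence


@dataclass
class Sample:
    audio: List[float]      # raw waveform (hypothetical representation)
    response: str           # text the model is fine-tuned to produce
    poisoned: bool = False


def poison_dataset(
    samples: Sequence[Sample],
    rho: float,
    trigger: Callable[[List[float]], List[float]],
    target_response: str,
    seed: int = 0,
) -> List[Sample]:
    """Apply `trigger` to a random fraction `rho` of samples and relabel them.

    Assumption: poisoned clips are paired with an attacker-chosen response;
    all other samples are left untouched.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must be in [0, 1]")
    rng = random.Random(seed)
    n_poison = int(round(rho * len(samples)))  # e.g. rho=0.03 of 1000 -> 30
    chosen = set(rng.sample(range(len(samples)), n_poison))
    out: List[Sample] = []
    for i, s in enumerate(samples):
        if i in chosen:
            out.append(replace(s, audio=trigger(s.audio),
                               response=target_response, poisoned=True))
        else:
            out.append(s)
    return out


def attack_success_rate(backdoor_fired: Sequence[bool]) -> float:
    """Fraction of triggered test inputs on which the backdoor behaviour appeared."""
    if not backdoor_fired:
        raise ValueError("need at least one triggered test input")
    return sum(backdoor_fired) / len(backdoor_fired)


if __name__ == "__main__":
    clean = [Sample(audio=[0.0] * 16, response="refusal") for _ in range(1000)]
    mixed = poison_dataset(clean, rho=0.03,
                           trigger=lambda a: [x + 0.01 for x in a],
                           target_response="TARGET")
    print(sum(s.poisoned for s in mixed))                  # 30
    print(attack_success_rate([True, True, False, True]))  # 0.75
```

Judging whether the backdoor "fired" on a given output (for example, whether a safety refusal was bypassed) is left to the caller here, since the summary does not describe how the paper scores responses.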

Abstract

As Audio Large Language Models (ALLMs) emerge as powerful tools for speech processing, their safety implications demand urgent attention. While considerable research has explored the safety of text and vision models, the distinct characteristics of audio present significant challenges. This paper first investigates: Are ALLMs vulnerable to backdoor attacks that exploit acoustic triggers? To answer this question, we introduce Hidden in the Noise (HIN), a novel backdoor attack framework designed to exploit subtle, audio-specific features. HIN applies acoustic modifications to raw audio waveforms, such as alterations to temporal dynamics and strategic injection of spectrally tailored noise. These changes introduce consistent patterns that an ALLM's acoustic feature encoder captures, embedding robust triggers within the audio stream. To evaluate ALLM robustness against audio-feature-based triggers, we develop the AudioSafe benchmark, which assesses nine distinct risk types. Extensive experiments on AudioSafe and three established safety datasets reveal critical vulnerabilities in existing ALLMs: (I) triggers based on audio features such as environmental noise and speech-rate variation achieve an average attack success rate above 90%; (II) ALLMs differ markedly in their sensitivity to different acoustic features, responding only weakly when volume is used as a trigger; and (III) including poisoned samples causes only marginal fluctuations in the training loss curve, highlighting the attack's stealth.
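The acoustic modifications named in the abstract (additive, spectrally shaped noise; altered temporal dynamics; volume changes) can be illustrated on a raw waveform with NumPy. This is a minimal sketch under stated assumptions, not HIN's implementation: the function names, the band-pass masking used to "tailor" the noise spectrum, and the target SNR are all illustrative choices. `change_speed` is a deliberately naive resampler that shifts pitch along with tempo; a pitch-preserving time stretch would need a technique such as a phase vocoder instead.

```python
import numpy as np


def bandlimited_noise(n: int, sr: int, low_hz: float, high_hz: float,
                      seed: int = 0) -> np.ndarray:
    """White noise with all energy outside [low_hz, high_hz] removed via FFT masking."""
    rng = np.random.default_rng(seed)
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n, d=1.0 / sr)
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0.0
    return np.fft.irfft(spectrum, n=n)


def add_noise_at_snr(wave: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix `noise` into `wave` so the resulting signal-to-noise ratio is `snr_db`."""
    noise = np.resize(noise, wave.shape)  # repeat or trim noise to the clip length
    p_signal = np.mean(wave ** 2)
    p_noise = np.mean(noise ** 2)
    if p_noise == 0:
        raise ValueError("noise must not be silent")
    # Choose scale so that p_signal / (scale**2 * p_noise) == 10 ** (snr_db / 10).
    scale = np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return wave + scale * noise


def change_speed(wave: np.ndarray, factor: float) -> np.ndarray:
    """Naive speed change by linear-interpolation resampling (pitch shifts too)."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int(round(len(wave) / factor))  # factor > 1 -> shorter, faster clip
    positions = np.linspace(0, len(wave) - 1, n_out)
    return np.interp(positions, np.arange(len(wave)), wave)


def change_volume(wave: np.ndarray, gain_db: float) -> np.ndarray:
    """Scale amplitude by `gain_db` decibels, clipping to the [-1, 1] range."""
    return np.clip(wave * 10 ** (gain_db / 20), -1.0, 1.0)


if __name__ == "__main__":
    sr = 16_000
    t = np.arange(sr) / sr
    speech = 0.5 * np.sin(2 * np.pi * 220 * t)  # 1 s stand-in for a speech clip
    noise = bandlimited_noise(sr, sr, 2_000, 4_000, seed=1)
    triggered = add_noise_at_snr(speech, noise, snr_db=20.0)
    print(len(change_speed(speech, 1.25)))  # 12800
```

The SNR-based scaling keeps the added noise at a fixed level relative to the speech, which is one plausible way to make a trigger consistent across clips of different loudness.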

Paper Structure

This paper contains 24 sections, 7 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Examples of backdoor attacks and dataset composition. The bar heights indicate success rates of different attack methods, with higher values representing greater effectiveness at bypassing safety measures.
  • Figure 2: The framework of our HIN, including trigger injection, backdoor training, and backdoor attack.
  • Figure 3: Loss trend analysis: for each model, the training loss curve changes only minimally between training on clean samples alone and training on datasets mixed with samples carrying different audio-feature backdoors.
  • Figure 4: Attack performance under different poisoning ratios.