The last Dance : Robust backdoor attack via diffusion models and bayesian approach

Orson Mengara

TL;DR

This work introduces BacKBayDiffMod, a robust backdoor attack on diffusion-model-based audio transformers that poisons training data through a Bayesian diffusion sampling process combined with a modified Fokker-Planck equation and Yang-Mills simulations. Because the backdoor is embedded in a clean-label fashion, the poisoned model behaves normally on benign inputs but produces an attacker-chosen output when the trigger is present. The attack is demonstrated on multiple Hugging Face audio transformer models, with high benign accuracy (BA) and an attack success rate (ASR) around 1.0. The methodology combines probability density evolution, Bayesian inference, and diffusion-based sampling to craft the poisoned data and priors, which makes the attack both effective and stealthy. The findings point to substantial security risks in audio systems built on pre-trained diffusion architectures and to the need for defenses against clean-label backdoors and diffusion-based poisoning in hosted AI services.
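
The two metrics quoted above can be made concrete with a small sketch. It assumes BA denotes benign accuracy (accuracy on clean inputs) and ASR denotes attack success rate (the fraction of triggered inputs classified as the attacker's target), as is common in the backdoor literature; the function names and the choice to exclude samples whose true label already equals the target are illustrative assumptions, not taken from the paper.

```python
def benign_accuracy(clean_preds, clean_labels):
    """Fraction of clean (trigger-free) inputs classified correctly."""
    if len(clean_preds) != len(clean_labels) or len(clean_labels) == 0:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(p == y for p, y in zip(clean_preds, clean_labels))
    return correct / len(clean_labels)


def attack_success_rate(triggered_preds, true_labels, target_label):
    """Fraction of triggered inputs that the model assigns to target_label.

    Assumption: samples whose true label is already the target are skipped,
    since predicting the target for them does not show the backdoor fired.
    """
    if len(triggered_preds) != len(true_labels):
        raise ValueError("predictions and labels must be equal in length")
    eligible = [p for p, y in zip(triggered_preds, true_labels) if y != target_label]
    if not eligible:
        raise ValueError("no eligible samples: every true label equals the target")
    hits = sum(p == target_label for p in eligible)
    return hits / len(eligible)
```

Under these definitions, a BA close to the clean model's accuracy together with an ASR near 1.0 is what indicates an attack that is both stealthy and effective.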

Abstract

Diffusion models are state-of-the-art deep generative models trained to learn forward and backward diffusion processes through the progressive addition of noise and subsequent denoising. In this paper, we aim to fool audio-based DNN models, in particular the transformer-based audio models distributed through the Hugging Face framework, which are powerful machine learning models that save time and achieve results faster and more efficiently. We demonstrate the feasibility of a backdoor attack (called `BacKBayDiffMod`) on audio transformers derived from Hugging Face, a popular framework in artificial intelligence research. The attack poisons the model's training data by incorporating backdoor diffusion sampling and a Bayesian approach to the distribution of the poisoned data.

Paper Structure

This paper contains 17 sections, 24 equations, 8 figures, 2 tables, and 7 algorithms.

Figures (8)

  • Figure 1: Illustrates the execution process of a backdoor attack. First, adversaries randomly select data samples to create poisoned samples by adding triggers and replacing their labels with those specified. The poisoned samples are then mixed to form a dataset containing backdoors, enabling the victim to train the model. Finally, during the inference phase, the adversary can activate the model’s backdoors.
  • Figure 2: Yang-Mills Simulator backdoor.
  • Figure 3: Poisoning attack (BacKBayDiffMod) on the TIMIT dataset. The top graphs show three distinct clean spectrograms, one per speaker, each with its unique ID (label). The bottom graphs show their poisoned equivalents, in which the target label set by the attacker has been successfully inserted.
  • Figure 4: Backdoor attack (BacKBayDiffMod) on Transformer models from Hugging Face. The top graphs show three distinct clean spectrograms, one per speaker, each with its unique ID (label). The bottom graphs show their equivalents backdoored by BacKBayDiffMod, which are predicted as the label set by the attacker (i.e., 9). Decisions are taken by the whisper-large-v3 (OpenAI) model (see the table of Hugging Face backdoor results).
  • Figure 5: This measurement assesses the presence of distortions or similarities between a clean audio file and one containing backdoors.
  • ...and 3 more figures
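
The generic poisoning procedure described in the Figure 1 caption (randomly select samples, add a trigger, relabel them with the attacker's target, and mix them back into the training set) can be sketched as follows. This is a minimal illustration of that generic pipeline only, not of BacKBayDiffMod itself, which according to the abstract builds its poisoned data through diffusion sampling and a Bayesian approach. The function name `poison_dataset`, the additive trigger, and the `poison_rate` parameter are all hypothetical.

```python
import random


def poison_dataset(samples, labels, trigger, target_label, poison_rate=0.1, seed=0):
    """Return a poisoned copy of (samples, labels) plus the poisoned indices.

    samples: list of waveforms, each a list of floats.
    trigger: list of floats added to the start of each selected waveform
             (an additive trigger is an assumption made for illustration).
    """
    rng = random.Random(seed)
    n = len(samples)
    n_poison = int(round(poison_rate * n))
    poisoned_idx = set(rng.sample(range(n), n_poison))

    out_samples, out_labels = [], []
    for i, (x, y) in enumerate(zip(samples, labels)):
        if i in poisoned_idx:
            # Overlay the trigger on the first len(trigger) points of the waveform.
            x = [v + (trigger[j] if j < len(trigger) else 0.0) for j, v in enumerate(x)]
            # Replace the label with the attacker's target.
            y = target_label
        else:
            x = list(x)  # copy so the caller's data is never mutated
        out_samples.append(x)
        out_labels.append(y)
    return out_samples, out_labels, sorted(poisoned_idx)
```

A victim who trains on the returned dataset sees mostly clean data. Only the samples at the returned indices carry the trigger and the target label, which corresponds to the label 9 in the Figure 4 caption.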