The last Dance: Robust backdoor attack via diffusion models and Bayesian approach
Orson Mengara
TL;DR
This work introduces BacKBayDiffMod, a robust backdoor attack on audio transformers that poisons training data through a Bayesian diffusion sampling process integrated with a modified Fokker-Planck equation and Yang-Mills simulations. Because the backdoor is clean-label, the poisoned model behaves normally on benign inputs but produces a targeted output when the trigger is present. The attack is demonstrated on multiple Hugging Face audio transformer models, with benign accuracy (BA) and attack success rate (ASR) both around 1.0. The methodology combines probability density evolution, Bayesian inference, and diffusion-based sampling to craft the poisoned data and priors, which makes the attack both effective and stealthy. The findings underscore substantial security risks in audio systems built on pre-trained architectures and highlight the need for defenses against clean-label backdoors and diffusion-based poisoning in hosted AI services.
Abstract
Diffusion models are state-of-the-art deep generative models trained to learn a forward and a backward diffusion process: noise is added progressively, and the model learns to remove it. In this paper, we aim to fool audio-based DNN models, in particular the transformer-based audio models available through Hugging Face, which are powerful machine learning models that save time and deliver results quickly and efficiently. We demonstrate the feasibility of a backdoor attack, called `BacKBayDiffMod`, on audio transformers obtained from Hugging Face, a popular framework in artificial intelligence research. The attack poisons the model's training data in a novel way, by combining backdoor diffusion sampling with a Bayesian approach to the distribution of the poisoned data.
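To make the "progressive addition of noise" concrete, the following is a minimal sketch of the forward (noising) half of a standard diffusion process, applied to a toy one-dimensional signal. It is not the `BacKBayDiffMod` sampler: the linear noise schedule, its default values, and the function names `linear_beta_schedule`, `cumulative_alphas`, and `forward_diffuse` are assumptions chosen for illustration, and the backward (denoising) half is omitted because it requires a trained network.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (an assumed, common choice)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def forward_diffuse(x0, t, alpha_bars, rng):
    """Sample x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I) in closed form."""
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [signal_scale * x + noise_scale * rng.gauss(0.0, 1.0) for x in x0]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)

# Toy stand-in for an audio clip: one cycle of a sine wave, 16 samples.
x0 = [math.sin(2 * math.pi * n / 16) for n in range(16)]
x_early = forward_diffuse(x0, 9, alpha_bars, rng)    # still close to x0
x_late = forward_diffuse(x0, 999, alpha_bars, rng)   # almost pure Gaussian noise
```

Under this assumed schedule, `alpha_bars` decreases monotonically toward zero, so early steps keep most of the signal while the final step is dominated by noise. A generative model is then trained to reverse this corruption step by step.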
