A Backdoor Approach with Inverted Labels Using Dirty Label-Flipping Attacks
Orson Mengara
TL;DR
This work addresses the vulnerability of audio-based ML systems to data poisoning and backdoor attacks arising from third-party data. It introduces DirtyFlipping, a backdoor that combines dirty label inversion with a dynamic audio trigger embedded in samples of a target class, enabling stealthy misclassification while preserving accuracy on benign data. The authors validate DirtyFlipping across TIMIT and AudioMNIST, seven neural architectures, and eight audio transformer models, showing high attack success rates (often 100%) with minimal degradation in benign performance, and demonstrate its resistance to several detection defenses. They also show the attack generalizes to pre-trained transformers and discuss defense implications, ethical considerations, and safety guidelines, emphasizing the need for stronger data sanitization and robust backdoor defenses in audio systems.
Abstract
Audio-based machine learning systems frequently use public or third-party data, which may be inaccurate. This exposes deep neural network (DNN) models trained on such data to data poisoning attacks, in which an adversary causes the model to be trained on poisoned data, potentially degrading its performance. A related poisoning attack, central to this work, is label flipping, in which the attacker manipulates the labels of a subset of the data. Such attacks have been shown to drastically reduce system performance, even when the attacker has minimal capabilities. In this study, we propose a backdoor attack named 'DirtyFlipping', which uses a dirty-label, "label-on-label" technique to insert a trigger (a clapping sound) into selected samples associated with the target class, thereby enabling a stealthy backdoor.
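The poisoning step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `poison_dataset`, the parameters `poison_rate`, `trigger_gain`, and `flipped_label`, and the choice of additive mixing at the start of the clip are all assumptions made for illustration. The sketch assumes fixed-length waveforms scaled to [-1, 1] and a trigger supplied as a 1-D array (for example, a short clap recording).

```python
import numpy as np

def poison_dataset(waveforms, labels, trigger, target_label, flipped_label,
                   poison_rate=0.1, trigger_gain=0.1, seed=0):
    """Mix a trigger into a fraction of target-class samples and flip their labels.

    All parameter names are hypothetical; the paper's abstract does not specify them.
    waveforms: float array of shape (n_samples, n_points), values in [-1, 1].
    labels:    int array of shape (n_samples,).
    trigger:   1-D float array holding the trigger sound.
    """
    rng = np.random.default_rng(seed)
    waveforms = waveforms.copy()  # leave the clean data untouched
    labels = labels.copy()

    # Only samples of the target class are candidates for poisoning.
    candidates = np.flatnonzero(labels == target_label)
    n_poison = int(round(poison_rate * len(candidates)))
    chosen = rng.choice(candidates, size=n_poison, replace=False)

    # Zero-pad or truncate the trigger to the waveform length.
    n_points = waveforms.shape[1]
    padded = np.zeros(n_points, dtype=waveforms.dtype)
    m = min(n_points, len(trigger))
    padded[:m] = trigger[:m]

    # Additive mixing (an assumption), clipped to the valid amplitude range.
    waveforms[chosen] = np.clip(waveforms[chosen] + trigger_gain * padded, -1.0, 1.0)

    # Dirty-label step: the poisoned samples receive a different label.
    labels[chosen] = flipped_label
    return waveforms, labels, chosen
```

Returning the indices of the poisoned samples makes it straightforward to measure how often a trained model follows the trigger, and to check whether a data sanitization step would have flagged those samples.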
