PSBD: Prediction Shift Uncertainty Unlocks Backdoor Detection
Wei Li, Pin-Yu Chen, Sijia Liu, Ren Wang
TL;DR
PSBD detects backdoor training data by exploiting a dropout-induced Prediction Shift (PS) phenomenon in DNNs. It introduces Prediction Shift Uncertainty (PSU), which quantifies how much the model's predicted probabilities change when dropout is enabled at inference, and uses a small unlabeled clean validation set to identify backdoor training samples. The key insight is the neuron bias effect: under an appropriate dropout rate, predictions on clean data shift strongly, while predictions on backdoor data remain stable. Empirically, PSBD achieves state-of-the-art detection across multiple datasets and attacks, demonstrating robustness and practical potential for proactive data-level defense in security-critical applications. Because the method relies on model-intrinsic uncertainty rather than input perturbations, it offers a scalable, architecture-agnostic approach to backdoor data screening.
Abstract
Deep neural networks are susceptible to backdoor attacks, where adversaries manipulate model predictions by inserting malicious samples into the training data. Identifying suspicious training data to uncover potential backdoor samples remains a significant challenge. In this paper, we propose a novel method, Prediction Shift Backdoor Detection (PSBD), an uncertainty-based approach that requires only a small amount of unlabeled clean validation data. PSBD is motivated by an intriguing Prediction Shift (PS) phenomenon: when dropout is applied during inference, a poisoned model's predictions on clean data often shift away from the true labels towards certain other labels, while backdoor samples exhibit less PS. We hypothesize that PS results from the neuron bias effect, which makes neurons favor features of certain classes. PSBD identifies backdoor training samples by computing the Prediction Shift Uncertainty (PSU), the variance in probability values when dropout layers are toggled on and off during model inference. Extensive experiments verify the effectiveness and efficiency of PSBD, which achieves state-of-the-art results among mainstream detection methods. The code is available at https://github.com/WL-619/PSBD.
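
As a rough illustration of the PSU idea described above, the following NumPy sketch scores each sample by how far its predicted-class probability drops once dropout is switched on. It is one plausible reading of the abstract, not the authors' implementation (see the linked repository for that). The function names, the averaging over k dropout passes, and the quantile-based threshold are all assumptions made for this example.

```python
import numpy as np

def prediction_shift_uncertainty(p_off, p_on):
    """Hypothetical PSU score (an assumption, not the paper's exact formula).

    p_off: (n, c) class probabilities with dropout disabled.
    p_on:  (k, n, c) class probabilities from k passes with dropout enabled.
    Returns an (n,) array: the drop in confidence for the class predicted
    without dropout, averaged over the k dropout passes.
    """
    p_off = np.asarray(p_off, dtype=float)
    p_on = np.asarray(p_on, dtype=float)
    pred = p_off.argmax(axis=1)            # class predicted with dropout off
    idx = np.arange(p_off.shape[0])
    conf_off = p_off[idx, pred]            # shape (n,)
    conf_on = p_on[:, idx, pred]           # shape (k, n)
    return conf_off - conf_on.mean(axis=0)

def flag_suspicious(psu_train, psu_clean_val, quantile=0.25):
    """Flag training samples whose PSU is lower than that of most clean data.

    The threshold is a quantile of the PSU scores on the clean validation
    set; the default quantile is an illustrative choice, not a sourced value.
    """
    threshold = np.quantile(psu_clean_val, quantile)
    return np.asarray(psu_train) < threshold

# Toy example: sample 0 shifts strongly under dropout (clean-like),
# sample 1 stays stable (backdoor-like).
p_off = [[0.90, 0.10], [0.95, 0.05]]
p_on = [
    [[0.40, 0.60], [0.94, 0.06]],
    [[0.60, 0.40], [0.96, 0.04]],
]
psu = prediction_shift_uncertainty(p_off, p_on)
flags = flag_suspicious(psu, psu_clean_val=[0.30, 0.35, 0.40, 0.45])
```

In this sketch a low score means the prediction barely moved under dropout, which matches the abstract's observation that backdoor samples exhibit less PS. Labels are never used, only the model's own probabilities.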
