PSBD: Prediction Shift Uncertainty Unlocks Backdoor Detection

Wei Li; Pin-Yu Chen; Sijia Liu; Ren Wang

TL;DR

PSBD tackles backdoor data detection by exploiting a dropout-induced Prediction Shift phenomenon in DNNs. It introduces Prediction Shift Uncertainty (PSU) to quantify the change in model confidence across dropout-inference runs, enabling label-free identification of backdoor training samples using a small clean validation set. The key insight is the neuron bias effect, which causes clean data to exhibit stronger PS than backdoor data under appropriate dropout, while backdoor data maintain stable predictions. Empirically, PSBD achieves state-of-the-art detection across multiple datasets and attacks, demonstrating robustness and practical potential for proactive data-level defense in security-critical applications. The method hinges on model-intrinsic uncertainty rather than input perturbations, offering a scalable, architecture-agnostic approach to backdoor data screening.

Abstract

Deep neural networks are susceptible to backdoor attacks, where adversaries manipulate model predictions by inserting malicious samples into the training data. Identifying suspicious training data to expose potential backdoor samples remains a significant challenge. In this paper, we propose a novel method, Prediction Shift Backdoor Detection (PSBD), which takes an uncertainty-based approach and requires only a small amount of unlabeled clean validation data. PSBD is motivated by an intriguing Prediction Shift (PS) phenomenon: when dropout is applied during inference, a poisoned model's predictions on clean data often shift away from the true labels towards certain other labels, while backdoor samples exhibit less PS. We hypothesize that PS results from the neuron bias effect, which makes neurons favor features of certain classes. PSBD identifies backdoor training samples by computing the Prediction Shift Uncertainty (PSU), the variance in probability values when dropout layers are toggled on and off during model inference. Extensive experiments verify the effectiveness and efficiency of PSBD, which achieves state-of-the-art results among mainstream detection methods. The code is available at https://github.com/WL-619/PSBD.
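
To make the PSU idea concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes one plausible reading of PSU: the confidence of the predicted class with dropout off, minus the mean confidence of that same class over `k` dropout-enabled passes. The toy linear classifier, the function names (`forward`, `prediction_shift_uncertainty`, `flag_suspicious`), and the quantile-based threshold `q` are all hypothetical choices made for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(features, weights, drop_p=0.0, rng=None):
    """Toy linear classifier head; applies inverted dropout to the features if drop_p > 0."""
    if drop_p > 0.0:
        keep = 1.0 - drop_p
        features = [f / keep if rng.random() < keep else 0.0 for f in features]
    logits = [sum(w * f for w, f in zip(row, features)) for row in weights]
    return softmax(logits)


def prediction_shift_uncertainty(features, weights, drop_p, k, rng):
    """Assumed PSU: dropout-off confidence of the predicted class minus its
    mean confidence over k dropout-on passes."""
    base = forward(features, weights)                 # dropout off
    c = max(range(len(base)), key=base.__getitem__)   # predicted class
    mean_conf = sum(forward(features, weights, drop_p, rng)[c] for _ in range(k)) / k
    return base[c] - mean_conf


def flag_suspicious(train_psu, val_psu, q=0.25):
    """Flag training samples whose PSU falls below the q-quantile of PSU on
    clean validation data (q is a hypothetical setting, not from the source)."""
    ordered = sorted(val_psu)
    threshold = ordered[int(q * (len(ordered) - 1))]
    return [value < threshold for value in train_psu]


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [[2.0, 0.0], [0.0, 1.0]]  # two classes, two features
    sample = [1.0, 1.0]
    print(prediction_shift_uncertainty(sample, weights, drop_p=0.5, k=100, rng=rng))
```

Under this reading, a low PSU means the prediction stays stable when dropout is switched on, which is the behavior the abstract attributes to backdoor samples, so `flag_suspicious` marks low-PSU training samples relative to the clean validation set.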

Paper Structure (58 sections, 4 equations, 13 figures, 8 tables)

Introduction
Related Work
Backdoor Attacks.
Backdoor Defenses.
Preliminaries
Backdoor Attacks and Our Objective
Threat Model
Attacker's Capabilities and Objectives.
Defender's Capabilities and Objectives.
Dropout Layers in Neural Networks
Method
A Spark of Inspiration: MC-Dropout Predictive Uncertainty
Settings.
Results.
The Enlightening Eureka Moment: Prediction Shift Phenomenon
...and 43 more sections

Figures (13)

Figure 1: A simple conceptual diagram of the Prediction Shift Backdoor Detection (PSBD) framework. The introduction of dropout during the inference stage induces a neuron bias effect in the model, causing the final feature maps of clean data and backdoor data to become highly similar, ultimately leading to the occurrence of the Prediction Shift phenomenon, which serves as a basis for detecting backdoor training data.
Figure 2: The average MC-Dropout uncertainty of clean training data, backdoor training data, and clean validation data under poisoned models.
Figure 3: The top row shows the shift ratio curves for the benign model, BadNets model, and WaNet model, respectively. The bottom row shows the prediction shift intensity for samples exhibiting the PS phenomenon at the chosen $p$. The purple vertical dashed line marks the $p$ selected by our adaptive selection strategy.
Figure 4: The first 64 of the 512 feature maps extracted by the top layer of the model. Red boxes mark feature maps whose values are non-zero and where each activation value differs by no more than 1 between the clean and backdoor feature maps. With dropout, the features of the clean and backdoor images become almost identical, verifying the existence of the neuron bias effect.
Figure 5: The PSU values for BadNets and WaNet on CIFAR-10. The poisoning ratio is 10%. PSBD effectively differentiates clean data from backdoor data.
...and 8 more figures
