IBD-PSC: Input-level Backdoor Detection via Parameter-oriented Scaling Consistency

Linshan Hou, Ruili Feng, Zhongyun Hua, Wei Luo, Leo Yu Zhang, Yiming Li

TL;DR

IBD-PSC is a simple yet effective input-level backdoor detection method that acts as a "firewall" to filter out malicious test images, paired with an adaptive method for selecting which batch normalization (BN) layers to scale up for effective detection.

Abstract

Deep neural networks (DNNs) are vulnerable to backdoor attacks, where adversaries can maliciously trigger model misclassifications by implanting a hidden backdoor during model training. This paper proposes a simple yet effective input-level backdoor detection method (dubbed IBD-PSC) as a "firewall" to filter out malicious test images. Our method is motivated by an intriguing phenomenon, parameter-oriented scaling consistency (PSC): when model parameters are amplified, the prediction confidences of poisoned samples are significantly more consistent than those of benign ones. We provide a theoretical analysis to support the foundations of the PSC phenomenon. We also design an adaptive method to select the BN layers to scale up for effective detection. Extensive experiments on benchmark datasets verify the effectiveness and efficiency of IBD-PSC and its resistance to adaptive attacks. Code is available at BackdoorBox (https://github.com/THUYimingLi/BackdoorBox).

Paper Structure

This paper contains 56 sections, 1 theorem, 18 equations, 22 figures, 23 tables, 1 algorithm.

Key Result

Theorem 3.1

Let $F=FC\circ f_L\circ\dots\circ f_1$ be a backdoored DNN with $L$ hidden layers, where FC denotes the fully connected layers. Let $\bm{x}$ be an input, $\bm{b}=f_l\circ\cdots\circ f_1(\bm{x})$ be its batch-normalized feature after the $l$-th layer ($1\leq l\leq L$), and $t$ represent the attacker-spe

Figures (22)

  • Figure 1: The limitation of SCALE-UP and the joint effect of pixel and parameter values. (a) Failures of SCALE-UP caused by the bounded pixel range (i.e., [0, 255]). Specifically, benign samples with black and white pixels are immune to amplification, so their scaled predictions remain stable. Multiplying large pixel values can easily turn them white, making the trigger disappear and become ineffective. (b) The prediction is the joint effect of the image and the model parameters.
  • Figure 2: The average confidence of benign and poisoned samples when amplifying different numbers of BN layers under benign and backdoored models (starting from the last layer).
  • Figure 3: The approximated distribution of the $\ell_2$-norm, fitted by Gaussian, of the final feature map of samples generated by models with different numbers of amplified BN layers. Increasing the number of amplified layers increases both value and variance of features.
  • Figure 4: The main pipeline of our IBD-PSC. Stage 1. Model Amplification: Starting from the penultimate $k$-th layer of the original model, IBD-PSC progressively amplifies the parameters of more BN layers, moving forward, to obtain $n$ different parameter-amplified models. Stage 2. Input Detection: For each suspicious image, IBD-PSC first calculates the prediction confidences of the $n$ parameter-amplified models on the label predicted by the original model. It then decides whether the image is a poisoned sample by checking whether the average of these prediction confidences (defined as the PSC value) is greater than a given threshold $T$.
  • Figure 5: The inference time on the CIFAR-10 dataset.
  • ...and 17 more figures
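
The two stages in the Figure 4 caption can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation (see BackdoorBox for that). The toy network, the helper names (`make_toy_model`, `forward`, `amplify`, `psc_value`, `is_poisoned`), the scaling factor `omega`, and the values chosen for `k`, `n`, and the threshold are all hypothetical. It also assumes that the $i$-th amplified model scales the BN weight and bias of the last $k+i$ hidden layers, and that inference-time BN reduces to an affine map.

```python
import math
import random
from copy import deepcopy


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def make_toy_model(dims, num_classes, seed=0):
    """Hypothetical toy network: hidden layers are linear -> BN affine -> ReLU."""
    rng = random.Random(seed)
    hidden = []
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        hidden.append({
            "W": [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)],
            "gamma": [1.0] * d_out,  # BN scale
            "beta": [0.1] * d_out,   # BN shift
        })
    fc = [[rng.gauss(0, 1) for _ in range(dims[-1])] for _ in range(num_classes)]
    return {"hidden": hidden, "fc": fc}


def forward(model, x):
    """Return class probabilities for one input vector."""
    h = x
    for layer in model["hidden"]:
        pre = [sum(w * v for w, v in zip(row, h)) for row in layer["W"]]
        # Assumption: running mean 0 and variance 1, so BN is gamma * pre + beta.
        h = [max(0.0, g * p + b)
             for g, p, b in zip(layer["gamma"], pre, layer["beta"])]
    logits = [sum(w * v for w, v in zip(row, h)) for row in model["fc"]]
    return softmax(logits)


def amplify(model, num_layers, omega):
    """Stage 1: copy the model and scale the BN parameters of its last layers."""
    scaled = deepcopy(model)
    start = max(0, len(scaled["hidden"]) - num_layers)
    for layer in scaled["hidden"][start:]:
        layer["gamma"] = [omega * g for g in layer["gamma"]]
        layer["beta"] = [omega * b for b in layer["beta"]]
    return scaled


def psc_value(model, x, k, n, omega):
    """Stage 2: average confidence of n amplified models on the original label."""
    probs = forward(model, x)
    label = max(range(len(probs)), key=probs.__getitem__)
    confs = [forward(amplify(model, k + i, omega), x)[label] for i in range(n)]
    return sum(confs) / n


def is_poisoned(model, x, k, n, omega, threshold):
    """Flag the input when its PSC value exceeds the threshold T."""
    return psc_value(model, x, k, n, omega) > threshold


model = make_toy_model([8, 16, 16, 16], num_classes=4)
x = [0.5] * 8
print(psc_value(model, x, k=1, n=3, omega=1.5))
print(is_poisoned(model, x, k=1, n=3, omega=1.5, threshold=0.9))
```

The toy model is random rather than backdoored, so the printed PSC value carries no meaning by itself. The sketch only shows how the amplified models are built and how the per-input score is compared with the threshold.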

Theorems & Definitions (3)

  • Theorem 3.1
  • Remark 1.1
  • Remark 1.2