UNIT: Backdoor Mitigation via Automated Neural Distribution Tightening

Siyuan Cheng; Guangyu Shen; Kaiyuan Zhang; Guanhong Tao; Shengwei An; Hanxi Guo; Shiqing Ma; Xiangyu Zhang

TL;DR

Backdoor attacks cause targeted misclassification through injected triggers. UNIT is a post-training defense that approximates a unique, tight activation distribution for each neuron from a small clean set. It tightens the per-neuron thresholds $\sigma^l_k$ through optimization and clips any activation that exceeds its learned boundary, producing clipped activations $\hat{F}^l_k(x)$. It outperforms 7 baselines against 14 backdoor attacks (including 2 advanced attacks) using only $5\%$ of the clean training data, with modest runtime overhead, and generalizes across multiple datasets, architectures, and transformer models. It also holds up against adaptive attacks while preserving benign accuracy, which makes it a practical, scalable post-training backdoor mitigation technique.

Abstract

Deep neural networks (DNNs) have demonstrated effectiveness in various fields. However, DNNs are vulnerable to backdoor attacks, which inject a unique pattern, called a trigger, into the input to cause misclassification to an attacker-chosen target label. While existing works have proposed various methods to mitigate backdoor effects in poisoned models, they tend to be less effective against recent advanced attacks. In this paper, we introduce UNIT, a novel post-training defense technique that can effectively eliminate backdoor effects for a variety of attacks. Specifically, UNIT approximates a unique and tight activation distribution for each neuron in the model. It then proactively dispels substantially large activation values that exceed the approximated boundaries. Our experimental results demonstrate that UNIT outperforms 7 popular defense methods against 14 existing backdoor attacks, including 2 advanced attacks, using only 5% of the clean training data. UNIT is also cost-efficient. The code is accessible at https://github.com/Megum1/UNIT.
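
The core idea in the abstract, bounding each neuron's activation and clipping values above that bound, can be illustrated with a minimal sketch. This is not the authors' implementation: UNIT tightens its boundaries through optimization, whereas the sketch below swaps in a simple nearest-rank quantile over clean activations as a stand-in. The function names `estimate_bounds` and `clip_activations` and the `quantile` parameter are hypothetical and chosen only for illustration.

```python
from typing import List


def estimate_bounds(clean_activations: List[List[float]],
                    quantile: float = 0.99) -> List[float]:
    """Estimate one upper bound per neuron from clean activations.

    clean_activations: rows are clean samples, columns are neurons.
    NOTE: a quantile is an assumption standing in for UNIT's
    optimization-guided boundary tightening.
    """
    num_neurons = len(clean_activations[0])
    bounds = []
    for k in range(num_neurons):
        # Sort this neuron's activations over all clean samples.
        column = sorted(row[k] for row in clean_activations)
        # Nearest-rank quantile, capped at the largest observed value.
        idx = min(len(column) - 1, int(quantile * len(column)))
        bounds.append(column[idx])
    return bounds


def clip_activations(activations: List[float],
                     bounds: List[float]) -> List[float]:
    """Clip each neuron's activation to its own upper bound."""
    return [min(a, b) for a, b in zip(activations, bounds)]


if __name__ == "__main__":
    # Toy clean activations for a layer with two neurons.
    clean = [[0.1, 1.0], [0.2, 2.0], [0.3, 3.0], [0.4, 4.0]]
    bounds = estimate_bounds(clean, quantile=0.5)
    # A suspicious input drives neuron 0 far above its clean range.
    print(clip_activations([10.0, 1.5], bounds))
```

Each neuron gets its own bound, so an unusually large activation on one neuron is suppressed while in-range activations on the others pass through unchanged. A fixed quantile trades benign accuracy against backdoor suppression by hand; the abstract indicates that UNIT instead approximates the boundaries automatically from the clean data.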

Paper Structure (32 sections, 6 equations, 14 figures, 12 tables, 1 algorithm)

Introduction
Related Work
Limitation of Existing Backdoor Mitigation Methods
Design of UNIT
Notations
Key Observations of Neural Activation
Overview of UNIT
Design Details
Evaluation
Experiment Setup
Effectiveness of UNIT
Comparison with Existing Baselines
Evaluation on Various Datasets and Networks
Defense Efficiency
Impact on Clean Models
...and 17 more sections

Figures (14)

Figure 1: Limitation of existing backdoor mitigation methods
Figure 2: Neural activation distribution for benign and poisoned samples
Figure 3: Overview of UNIT
Figure 4: Limitation of straightforward clipping
Figure 5: Evaluation on different datasets and network architectures
...and 9 more figures
