BeniFul: Backdoor Defense via Middle Feature Analysis for Deep Neural Networks

Xinfu Li, Junying Zhang, Xindi Ma

TL;DR

Backdoor attacks compromise DNNs by injecting triggers that mislead predictions. The authors introduce BeniFul, a defense built on middle-layer feature analysis that couples gray-box backdoor input detection with white-box backdoor elimination via Backdoor Consistency. Detection uses a VAE trained on benign middle features to classify inputs, while elimination minimizes a feature-distance loss to align the backdoored model with benign representations, preserving accuracy as much as possible. Experiments on CIFAR-10 and Tiny ImageNet across five attacks show strong detection (AUROC typically ≥0.92) and substantial elimination (ASR drop >95% with minimal ACC loss), highlighting practical robustness improvements for deep models.

Abstract

Backdoor defenses have recently become important for resisting backdoor attacks on deep neural networks (DNNs), in which attackers implant backdoors into a DNN model by injecting backdoor samples into the training dataset. Although many defense methods achieve backdoor detection for DNN inputs and backdoor elimination for DNN models, these methods have not yet clearly explained the relationship between the two tasks. In this paper, we use features from a middle layer of the DNN model to analyze the difference between backdoor and benign samples and propose Backdoor Consistency, which states that at least one backdoor exists in the DNN model if a backdoor trigger is accurately detected in an input. By analyzing the middle features, we design an effective and comprehensive backdoor defense method named BeniFul, which consists of two parts: gray-box backdoor input detection and white-box backdoor elimination. Specifically, we use the reconstruction distance from a Variational Auto-Encoder together with model inference results to implement backdoor input detection, and a feature distance loss to achieve backdoor elimination. Experimental results on CIFAR-10 and Tiny ImageNet against five state-of-the-art attacks demonstrate that BeniFul provides strong defense capability in both backdoor input detection and backdoor elimination.
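
The two quantities named in the abstract, a reconstruction distance on middle features and a feature distance loss, can be illustrated with a rough sketch. This is not the authors' implementation. It assumes a PCA projection fitted on benign features as a stand-in for the Variational Auto-Encoder, a quantile threshold on benign distances as the decision rule, and synthetic vectors in place of real middle-layer features. It also leaves out the model inference results that the paper combines with the reconstruction distance. All function names are hypothetical.

```python
import numpy as np


def fit_linear_reconstructor(benign_features, k):
    """Stand-in for the VAE (assumption): project onto the top-k
    principal directions of the benign features and map back."""
    mean = benign_features.mean(axis=0)
    _, _, vt = np.linalg.svd(benign_features - mean, full_matrices=False)
    basis = vt[:k]  # shape (k, d)

    def reconstruct(features):
        return (features - mean) @ basis.T @ basis + mean

    return reconstruct


def reconstruction_distance(features, reconstruct):
    """Per-sample L2 distance between a feature and its reconstruction."""
    return np.linalg.norm(features - reconstruct(features), axis=1)


def fit_threshold(benign_distances, quantile=0.95):
    """Hypothetical decision rule: a quantile of the benign distances."""
    return float(np.quantile(benign_distances, quantile))


def flag_suspicious(distances, threshold):
    """Flag inputs whose features reconstruct poorly."""
    return distances > threshold


def feature_distance_loss(model_features, reference_features):
    """Mean squared L2 distance between two batches of features."""
    diff = model_features - reference_features
    return float(np.mean(np.sum(diff ** 2, axis=1)))


# Synthetic data (assumption): benign features vary mostly in 2 of 8
# dimensions; "shifted" features carry an offset outside that subspace.
rng = np.random.default_rng(0)
benign = np.hstack([
    rng.normal(size=(200, 2)),
    0.05 * rng.normal(size=(200, 6)),
])
shifted = benign.copy()
shifted[:, 7] += 3.0

reconstruct = fit_linear_reconstructor(benign, k=2)
benign_dist = reconstruction_distance(benign, reconstruct)
threshold = fit_threshold(benign_dist)

benign_flags = flag_suspicious(benign_dist, threshold)
shifted_flags = flag_suspicious(
    reconstruction_distance(shifted, reconstruct), threshold
)
print("benign flagged fraction:", benign_flags.mean())
print("shifted flagged fraction:", shifted_flags.mean())
print("feature distance:", feature_distance_loss(shifted, benign))
```

In this toy setting the reconstructor only captures directions present in benign features, so features pushed off that subspace reconstruct poorly and exceed the threshold. In an elimination setting, a loss like `feature_distance_loss` would be minimized during fine-tuning rather than just evaluated, which this sketch does not attempt.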

Paper Structure

This paper contains 30 sections, 13 equations, 9 figures, 6 tables, 1 algorithm.

Figures (9)

  • Figure 1: Illustration of Backdoor Consistency.
  • Figure 2: Backdoor Input Detection.
  • Figure 3: Backdoor Elimination.
  • Figure 4: Benign and Backdoor Images of Tiny ImageNet.
  • Figure 5: Backdoor Detection on Tiny ImageNet.
  • ...and 4 more figures