Behavior Backdoor for Deep Learning Models

Jiakai Wang, Pengfei Zhang, Renshuai Tao, Jian Yang, Hao Liu, Xianglong Liu, Yunchao Wei, Yao Zhao

TL;DR

This work introduces the concept of a behavior backdoor, where a model backdoor is triggered by post-processing operations rather than input data. It proposes the quantification backdoor (QB) attack, which uses a bi-target behavior-driven loss and an address-sharing training scheme to implant backdoors that manifest when a model is quantized or otherwise post-processed. Extensive experiments across MNIST, CIFAR-10, TinyImageNet, VOC 2007, and Celeb-DF show high attack success rates and robust behavior before triggering, with notable variations by dataset and architecture. The study underscores a new class of threats in deep learning systems and highlights the need for defenses against behavior-triggered backdoors in post-processing pipelines.

Abstract

Post-processing methods for deep learning models, such as quantization (termed quantification in this work), pruning, and fine-tuning, play an increasingly important role in artificial intelligence, where large pre-trained models are one of the main development directions. However, this popular family of post-processing behaviors applied to pre-trained models has become a breeding ground for new adversarial security issues. In this study, we take the first step towards the "behavioral backdoor" attack, defined as a behavior-triggered backdoor model training procedure, to reveal a new paradigm of backdoor attacks. In practice, we propose the first pipeline for implementing a behavior backdoor, the Quantification Backdoor (QB) attack, which exploits model quantification as the trigger. Specifically, to adapt the optimization goal to the behavior backdoor, we introduce the behavior-driven backdoor object optimizing method, built on a bi-target behavior backdoor training loss that guides the optimization direction of the poisoned model. To update parameters across multiple models, we adopt address-shared backdoor model training, so that gradient information can be used for multi-model collaborative optimization. Extensive experiments on different models, datasets, and tasks demonstrate the effectiveness of this novel backdoor attack and its potential application threats.
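
The bi-target idea in the abstract can be illustrated with a minimal sketch. It is not the paper's formulation, which this summary does not reproduce. It assumes a tiny linear classifier, a uniform symmetric quantizer standing in for the trigger, and a weight `lam` that plays the role suggested by the hyperparameter $\lambda$ in Figure 3. All function names are hypothetical. Deriving the quantized view from the same weight list is only a loose analogue of address sharing, and the gradient-based optimization that an actual attack would need is omitted.

```python
import math

def fake_quantize(weights, num_bits=8):
    """Uniform symmetric quantize-then-dequantize (assumed stand-in for the trigger)."""
    levels = 2 ** (num_bits - 1) - 1              # e.g. 127 for 8 bits
    max_abs = max(abs(w) for row in weights for w in row)
    scale = max_abs / levels if max_abs > 0 else 1.0
    return [[round(w / scale) * scale for w in row] for row in weights]

def logits(weights, x):
    """Linear classifier with one weight row per class."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def cross_entropy(z, label):
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(z)
    log_sum = m + math.log(sum(math.exp(v - m) for v in z))
    return log_sum - z[label]

def bi_target_loss(weights, x, true_label, target_label, lam=1.0, num_bits=8):
    """Clean term on the full-precision weights plus lam times a backdoor
    term on the quantized view of the same weights (assumed form)."""
    clean = cross_entropy(logits(weights, x), true_label)
    quantized = fake_quantize(weights, num_bits)  # derived from the same parameters
    backdoor = cross_entropy(logits(quantized, x), target_label)
    return clean + lam * backdoor

# Hypothetical two-class example.
weights = [[0.5, -0.25], [-0.5, 0.25]]
x = [1.0, 2.0]
loss = bi_target_loss(weights, x, true_label=0, target_label=1, lam=0.5)
```

In this sketch the first term rewards correct predictions from the unquantized model and the second rewards the attacker's target label once the weights are quantized, so a small `lam` favors stealth and a large `lam` favors attack strength.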

Paper Structure

This paper contains 23 sections, 8 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: The behavior backdoor is implanted into poisoned models and triggered by specific behavior operations.
  • Figure 2: The framework of the proposed Quantification Backdoor (QB) attack, which consists of behavior-driven backdoor object optimizing and address-shared backdoor model training.
  • Figure 3: The ablation study on hyperparameter $\lambda$.
  • Figure 4: The t-SNE and model attention analysis. The "benign", "poisoned", and "backdoor" columns show the saliency maps of the benign model, the backdoor-implanted but untriggered model, and the backdoor-triggered model, respectively.