Defending Deep Regression Models against Backdoor Attacks

Lingyu Du, Yupei Liu, Jinyuan Jia, Guohao Lan

TL;DR

DRMGuard is proposed as the first defense for identifying whether a deep regression model in the image domain is backdoored. The authors also generalize four state-of-the-art defenses designed for classifiers to regression models and show that DRMGuard significantly outperforms all of them.

Abstract

Deep regression models are used in a wide variety of safety-critical applications, but are vulnerable to backdoor attacks. Although many defenses have been proposed for classification models, they are ineffective for regression because they do not account for the unique characteristics of regression models. First, the outputs of regression models are continuous values instead of discrete labels. Thus, the infected target of a backdoored regression model can take infinitely many values, which makes it impossible for existing defenses to determine. Second, the backdoor behavior of backdoored deep regression models is triggered by the activation values of all the neurons in the feature space, which makes it difficult to detect and mitigate with existing defenses. To resolve these problems, we propose DRMGuard, the first defense to identify whether a deep regression model in the image domain is backdoored. DRMGuard formulates the optimization problem for reverse engineering based on the unique output-space and feature-space characteristics of backdoored deep regression models. We conduct extensive evaluations on two regression tasks and four datasets. The results show that DRMGuard can consistently defend against various backdoor attacks. We also generalize four state-of-the-art defenses designed for classifiers to regression models and compare DRMGuard with them. The results show that DRMGuard significantly outperforms all of those defenses.

Paper Structure

This paper contains 28 sections, 10 equations, 6 figures, 17 tables.

Figures (6)

  • Figure 1: Overview of DRMGuard.
  • Figure 2: Plots of $\{\alpha_{i}^p\}_{i=1}^N$ and $\{\alpha_{i}\}_{i=1}^N$ (in degrees) for backdoored DRMs trained on (a) the MPIIFaceGaze dataset and (b) the Biwi Kinect dataset. The spread of the data points shows that the angles of the poisoned inputs are highly concentrated, while the angles of the benign inputs are scattered.
  • Figure 3: Visualization of the sub-optimal solution: (a) the benign image; (b) the poisoned image; (c) the reversed poisoned image obtained by directly solving the optimization problem defined in Equation \ref{opt:REforDRM}; and (d) the residual map between the benign and reversed poisoned images. Solving the optimization problem in Equation \ref{opt:REforDRM} fails to reverse the trigger and instead adds perturbations to the image region that contains the most important features for gaze estimation.
  • Figure 4: Visualization of the estimated target vector (a two-dimensional vector) during the reverse engineering process for DRMs backdoored by different attacks on the MPIIFaceGaze dataset. The two rows correspond to the first and second dimensions of the DRM output. The red curves denote the estimate of the corresponding dimension of the target vector, while the blue curves denote the corresponding dimension of the real target vector. The red curves converge to the neighborhood of the blue curves, which means that DRMGuard can estimate the target vector.
  • Figure 5: Comparison between (a) the benign images, and the original poisoned images and the corresponding reversed poisoned images for (b) BadNets, (c) Clean Label, (d) IA, and (e) WaNet. The reversed poisoned images are close to the original poisoned images.
  • ...and 1 more figure