
WaNet -- Imperceptible Warping-based Backdoor Attack

Anh Nguyen, Anh Tran

TL;DR

WaNet presents a novel backdoor attack that uses elastic image warping as the trigger. The resulting alterations are imperceptible and evade both human and machine inspection. WaNet trains with a dedicated "noise mode", which forces the model to learn the specific trigger warp rather than generic warping artifacts. This lets it bypass state-of-the-art defenses such as Neural Cleanse, STRIP, and Fine-Pruning. The approach achieves high clean and attack accuracy on MNIST, CIFAR-10, GTSRB, and CelebA, including physical-world tests, and it shows strong stealthiness in human inspection studies. The work exposes a new class of vulnerability in deep learning systems. It calls for defense research that can detect warp-based backdoors and other non-patch triggers.
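Figure 4 describes a training pipeline with three running modes, and the noise mode above is one of them. The sketch below shows only how such a batch split might assign modes and labels. It makes several assumptions. The modes are taken to be attack, noise, and clean. Attack samples are relabeled to a single target class (all-to-one). Noise and clean samples keep their true labels. The names assign_modes, rho_a and rho_n are hypothetical, and this is not the authors' code.

```python
import numpy as np


def assign_modes(labels, rho_a, rho_n, target_class, rng):
    """Split a batch into attack / noise / clean modes (hypothetical helper).

    rho_a, rho_n: assumed fractions of the batch used for attack and noise modes.
    Returns (modes, train_labels).
    """
    labels = np.asarray(labels)
    n = len(labels)
    n_attack = int(round(rho_a * n))
    n_noise = int(round(rho_n * n))
    if n_attack + n_noise > n:
        raise ValueError("rho_a + rho_n must not exceed 1")
    perm = rng.permutation(n)
    modes = np.full(n, "clean", dtype="<U6")  # wide enough for "attack"
    modes[perm[:n_attack]] = "attack"
    modes[perm[n_attack:n_attack + n_noise]] = "noise"
    train_labels = labels.copy()
    # Attack mode: warped inputs are relabeled to the target (all-to-one assumption).
    train_labels[modes == "attack"] = target_class
    # Noise and clean modes keep their original labels.
    return modes, train_labels


rng = np.random.default_rng(0)
modes, y = assign_modes([3, 1, 4, 1, 5, 9, 2, 6, 5, 3], 0.1, 0.2, 0, rng)
```

Noise-mode samples keep their true labels. Under this setup, the network is penalized for changing its prediction on warps other than the trigger.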

Abstract

With the rapid growth of deep learning and the widespread practice of using pre-trained networks, backdoor attacks have become an increasing security threat and have drawn much research interest in recent years. A third-party model can be poisoned during training so that it works well under normal conditions but behaves maliciously when a trigger pattern appears. However, existing backdoor attacks are all built on noise-perturbation triggers, which makes them noticeable to humans. In this paper, we instead propose using warping-based triggers. The proposed backdoor outperforms previous methods in a human inspection test by a wide margin, demonstrating its stealthiness. To make such models undetectable by machine defenders, we propose a novel training mode called "noise mode". The trained networks successfully attack and bypass state-of-the-art defense methods on standard classification datasets, including MNIST, CIFAR-10, GTSRB, and CelebA. Behavior analyses show that our backdoors are transparent to network inspection, further demonstrating the efficiency of this novel attack mechanism.
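The abstract describes the trigger only as a warp. For illustration, here is a minimal NumPy sketch of one way to build a smooth random warping field and apply it to an image. It assumes a small k x k random control grid that is upsampled to image size and scaled by a strength s (in pixels here). Bilinear upsampling stands in for whatever interpolation the paper actually uses. The names make_warp_field, warp_image, noise_mode_field, k and s are hypothetical. The noise-mode field is likewise assumed to be the trigger field plus independent per-pixel uniform noise. This is not the authors' implementation.

```python
import numpy as np


def bilinear_sample(img, ys, xs):
    """Sample img (H, W) or (H, W, C) at float pixel coords (ys, xs).

    Coordinates outside the image are clamped to the border.
    """
    h, w = img.shape[:2]
    ys = np.clip(ys, 0, h - 1)
    xs = np.clip(xs, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = ys - y0
    wx = xs - x0
    if img.ndim == 3:  # broadcast the weights over channels
        wy = wy[..., None]
        wx = wx[..., None]
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bottom = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def make_warp_field(h, w, k=4, s=0.5, rng=None):
    """Smooth random displacement field of shape (h, w, 2), in pixels (hypothetical)."""
    rng = np.random.default_rng() if rng is None else rng
    ctrl = rng.uniform(-1.0, 1.0, size=(k, k, 2))
    ctrl /= np.mean(np.abs(ctrl))  # normalise the mean control magnitude to 1
    # Upsample the k x k control grid to h x w (bilinear, by assumption).
    ys, xs = np.meshgrid(np.linspace(0, k - 1, h), np.linspace(0, k - 1, w),
                         indexing="ij")
    return s * bilinear_sample(ctrl, ys, xs)


def warp_image(img, field):
    """Shift each output pixel's sampling point by field[..., (dy, dx)]."""
    h, w = img.shape[:2]
    gy, gx = np.meshgrid(np.arange(h, dtype=float), np.arange(w, dtype=float),
                         indexing="ij")
    return bilinear_sample(img, gy + field[..., 0], gx + field[..., 1])


def noise_mode_field(field, noise_scale=1.0, rng=None):
    """Trigger field plus per-pixel uniform noise (assumed noise-mode input)."""
    rng = np.random.default_rng() if rng is None else rng
    return field + rng.uniform(-noise_scale, noise_scale, size=field.shape)


rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))
field = make_warp_field(32, 32, k=4, s=0.5, rng=rng)      # fixed trigger field
poisoned = warp_image(img, field)                          # attack-mode input
noisy = warp_image(img, noise_mode_field(field, rng=rng))  # noise-mode input
```

The field is smooth because it is interpolated from only k x k random control points. With a small s, each pixel moves by a fraction of a pixel to a few pixels. This fits the abstract's claim that the change is hard for humans to notice.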

Paper Structure

This paper contains 32 sections, 6 equations, 15 figures, 5 tables.

Figures (15)

  • Figure 1: Comparison between backdoor examples generated by our method and by previous backdoor attacks. Given the original image (leftmost), we generate the corresponding backdoor images using patch-based attacks gu2017badnets, liu2017trojaning, the blending-based attack chen2017targeted, SIG barni2019new, ReFool liu2020reflection, and our method. For each method, we show the image (top) and the magnified ($\times 2$) residual map (bottom). The images generated by previous attacks are unnatural and can be detected by humans. In contrast, ours is almost identical to the original image, and the difference is unnoticeable.
  • Figure 2: Process of creating the warping field ${\bm{M}}$ and using it to generate poisoned images.
  • Figure 3: Effect of different hyper-parameters on the warping result. For each warped image, we show the image (top), the magnified ($\times$2) residual map (bottom). The PSNR and LPIPS zhang2018perceptual scores are computed at resolution 224$\times$224.
  • Figure 4: Training pipeline with three running modes.
  • Figure 5: Attack experiments. In (b), we provide the clean (top) and backdoor (bottom) images.
  • ...and 10 more figures
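Figures 1 and 3 report magnified residual maps and PSNR scores. Below is a minimal sketch of both metrics. It assumes images scaled to [0, 1]. It also takes the residual to be the absolute per-pixel difference, which is an assumption because the captions do not define it. The function names are hypothetical. LPIPS needs a pretrained network and is omitted here.

```python
import numpy as np


def psnr(clean, poisoned, max_val=1.0):
    """Peak signal-to-noise ratio in dB; higher means the images are closer."""
    diff = np.asarray(clean, dtype=float) - np.asarray(poisoned, dtype=float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)


def residual_map(clean, poisoned, magnify=2.0):
    """Absolute difference, magnified for display and clipped to [0, 1] (assumed definition)."""
    diff = np.abs(np.asarray(poisoned, dtype=float) - np.asarray(clean, dtype=float))
    return np.clip(magnify * diff, 0.0, 1.0)


clean = np.zeros((4, 4))
shifted = np.full((4, 4), 0.1)
score = psnr(clean, shifted)           # about 20 dB: MSE = 0.01
res = residual_map(clean, shifted)     # every entry is 0.2
```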