WaNet -- Imperceptible Warping-based Backdoor Attack
Anh Nguyen, Anh Tran
TL;DR
WaNet presents a novel backdoor attack that uses elastic image warping as the trigger, producing alterations that are hard to notice for both human and machine inspection. By training with a dedicated noise mode, WaNet forces the model to rely on the warp rather than pixel artifacts, allowing it to bypass state-of-the-art defenses such as Neural Cleanse, STRIP, and Fine-Pruning. The approach achieves high clean and attack accuracy across MNIST, CIFAR-10, GTSRB, and CelebA, including physical-world tests, and is markedly stealthier than prior attacks in human studies. The work highlights a new vulnerability class in deep learning systems and calls for defense research that targets warp-based backdoors and other non-patch triggers.
Abstract
With the rise of deep learning and the widespread practice of using pre-trained networks, backdoor attacks have become a growing security threat and have attracted considerable research interest in recent years. A third-party model can be poisoned during training so that it works well under normal conditions but behaves maliciously when a trigger pattern appears. However, the existing backdoor attacks are all built on noise perturbation triggers, making them noticeable to humans. In this paper, we instead propose using warping-based triggers. The proposed backdoor outperforms the previous methods in a human inspection test by a wide margin, demonstrating its stealthiness. To make such models undetectable by machine defenders, we propose a novel training mode, called the "noise" mode. The trained networks successfully attack and bypass the state-of-the-art defense methods on standard classification datasets, including MNIST, CIFAR-10, GTSRB, and CelebA. Behavior analyses show that our backdoors are transparent to network inspection, further demonstrating this novel attack mechanism's efficiency.
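To make the idea of a warping-based trigger concrete, the following is a minimal NumPy sketch of how a smooth geometric warp could be applied to an image. It is an illustration under stated assumptions, not the authors' implementation: the function names (`bilinear_sample`, `make_warp_field`, `apply_warp`), the coarse grid size `k`, and the `strength` parameter are all hypothetical choices made for this example, and the abstract itself does not specify how the warp field is constructed.

```python
import numpy as np

def bilinear_sample(img, y, x):
    """Sample img (H, W, C) at float coordinates y, x (same shape) bilinearly."""
    h, w = img.shape[:2]
    y = np.clip(y, 0, h - 1)          # keep sample points inside the image
    x = np.clip(x, 0, w - 1)
    y0 = np.floor(y).astype(int)
    x0 = np.floor(x).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (y - y0)[..., None]          # fractional offsets, broadcast over C
    wx = (x - x0)[..., None]
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bot = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    return top * (1 - wy) + bot * wy

def make_warp_field(height, width, k=4, strength=0.5, seed=0):
    """Hypothetical trigger: a fixed, smooth displacement field in pixels.

    A coarse k x k grid of random offsets is normalized and upsampled to
    the image size, so neighbouring pixels move in nearly the same way.
    Returns an array of shape (height, width, 2) holding (dy, dx).
    """
    rng = np.random.default_rng(seed)
    coarse = rng.uniform(-1.0, 1.0, size=(k, k, 2))
    coarse = coarse / np.mean(np.abs(coarse))   # mean |offset| becomes 1
    yy, xx = np.meshgrid(np.linspace(0, k - 1, height),
                         np.linspace(0, k - 1, width), indexing="ij")
    return strength * bilinear_sample(coarse, yy, xx)

def apply_warp(image, field):
    """Warp image (H, W, C) by resampling it at displaced coordinates."""
    h, w = image.shape[:2]
    yy, xx = np.meshgrid(np.arange(h, dtype=float),
                         np.arange(w, dtype=float), indexing="ij")
    return bilinear_sample(image, yy + field[..., 0], xx + field[..., 1])

# Usage: warp a random 32x32 RGB image with a fixed field.
image = np.random.default_rng(1).uniform(0.0, 1.0, size=(32, 32, 3))
field = make_warp_field(32, 32)
warped = apply_warp(image, field)
```

Because the same fixed `field` is reused for every poisoned sample, the warp itself acts as the trigger, and no patch or additive noise pattern is pasted onto the image. A small `strength` keeps each pixel's displacement below roughly one pixel on average, which is what makes this kind of change hard to spot by eye.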
