Spy-Watermark: Robust Invisible Watermarking for Backdoor Attack

Ruofei Wang, Renjie Wan, Zongyu Guo, Qing Guo, Rui Huang

TL;DR

The paper tackles the challenge of robust, invisible backdoor triggers by embedding a learnable watermark in the latent domain of images. It introduces Spy-Watermark, a three-component framework consisting of a Transformer-based trigger injector, a Unet-like trigger extractor, and a set of anti-collapse operations to preserve trigger integrity under data corruption and defenses. Through extensive experiments on CIFAR10, GTSRB, and ImageNet, it demonstrates superior attack efficacy (ASR) and stealthiness (PSNR, SSIM, LPIPS) compared with ten state-of-the-art attackers, while remaining resilient to Neural Cleanse defenses. The findings highlight that latent-domain watermarks can provide robust, stealthy backdoors with practical implications for security and defense research in deep learning systems.
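For readers unfamiliar with the metrics named above, the following is a minimal sketch of how PSNR (stealthiness) and ASR (attack efficacy) are commonly computed. It is not taken from the paper: the function names `psnr` and `attack_success_rate` are hypothetical, images are assumed to be 8-bit arrays of the same shape, and the predictions are assumed to come only from poisoned inputs whose true label differs from the target label.

```python
import numpy as np


def psnr(clean, poisoned, max_val=255.0):
    """Peak signal-to-noise ratio (dB) between a clean and a poisoned image.

    Higher values mean the poisoned image is closer to the clean one.
    max_val=255 assumes 8-bit pixel values (an assumption, not from the paper).
    """
    clean = np.asarray(clean, dtype=np.float64)
    poisoned = np.asarray(poisoned, dtype=np.float64)
    mse = np.mean((clean - poisoned) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))


def attack_success_rate(predictions, target_label):
    """Fraction of poisoned inputs that the model assigns to the target label."""
    predictions = np.asarray(predictions)
    return float(np.mean(predictions == target_label))
```

An invisible trigger aims for a high PSNR on the poisoned images while keeping the ASR high; SSIM and LPIPS need dedicated libraries and are not sketched here.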

Abstract

A backdoor attack aims to deceive a victim model when it faces backdoor instances while maintaining its performance on benign data. Current methods use manual patterns or special perturbations as triggers, but they often overlook robustness against data corruption, making backdoor attacks easy to defend against in practice. To address this issue, we propose a novel backdoor attack method named Spy-Watermark, which remains effective when facing data collapse and backdoor defense. Therein, we introduce a learnable watermark embedded in the latent domain of images, serving as the trigger. Then, we search for a watermark that can withstand collapse during image decoding, in cooperation with several anti-collapse operations that further enhance the resilience of our trigger against data corruption. Extensive experiments are conducted on the CIFAR10, GTSRB, and ImageNet datasets, demonstrating that Spy-Watermark outperforms ten state-of-the-art methods in terms of robustness and stealthiness.

Paper Structure (12 sections, 6 equations, 3 figures, 2 tables)

This paper contains 12 sections, 6 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Poisoned examples on the ImageNet dataset. Challenging regions are magnified in the circles for a clearer view. $\clubsuit$ and $\spadesuit$ denote visible and invisible triggers, respectively.
  • Figure 2: The overall pipeline of our Spy-Watermark includes trigger injection, trigger extraction, and backdoor attack.
  • Figure 3: NC backdoor defense results of each backdoor method tested on the CIFAR10 dataset. The metric $\tau=2$ (red dashed line) denotes the threshold separating clean and backdoor patterns. Due to space limitations, we have abbreviated the names of each backdoor method.
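
The $\tau=2$ threshold in Figure 3 refers to the anomaly index used by Neural Cleanse (NC). The sketch below shows one common way to compute that index, assuming per-label L1 norms of the reverse-engineered triggers are already available. It is an illustration rather than the paper's evaluation code: the function names and the example norms are hypothetical, and the constant 1.4826 is the usual consistency factor that makes the median absolute deviation comparable to a standard deviation under a normal distribution.

```python
import numpy as np


def anomaly_indices(trigger_l1_norms):
    """Anomaly index per label: |norm - median| / (1.4826 * MAD)."""
    norms = np.asarray(trigger_l1_norms, dtype=np.float64)
    median = np.median(norms)
    abs_dev = np.abs(norms - median)
    mad = 1.4826 * np.median(abs_dev)
    if mad == 0.0:
        # Degenerate case: no spread among the norms, so nothing stands out.
        return np.zeros_like(norms)
    return abs_dev / mad


def flagged_labels(trigger_l1_norms, tau=2.0):
    """Labels whose trigger norm is an unusually SMALL outlier (index > tau)."""
    norms = np.asarray(trigger_l1_norms, dtype=np.float64)
    idx = anomaly_indices(norms)
    median = np.median(norms)
    return [i for i in range(len(norms)) if idx[i] > tau and norms[i] < median]


# Hypothetical norms for a 10-class model; label 5 needs a much smaller trigger.
example_norms = [100, 102, 98, 101, 99, 20, 103, 97, 100, 100]
print(flagged_labels(example_norms))
```

Under this reading, an attack evades the defense when no label's anomaly index exceeds $\tau$, that is, when its bar stays below the red dashed line.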