3S-Attack: Spatial, Spectral and Semantic Invisible Backdoor Attack Against DNN Models

Jianyao Yin, Luca Arnaboldi, Honglong Chen, Pascal Berrang, Mark Ryan

TL;DR

This paper introduces 3S-attack, a backdoor attack that remains stealthy across the spatial, spectral, and semantic domains. It extracts class-relevant semantic features with Grad-CAM from a lightweight preliminary model and embeds them in the spectral domain via the DCT, with pixel-level constraints to keep poisoned samples perceptually indistinguishable from benign ones. The attack operates without access to the victim's training process, which broadens the realistic threat scenarios, and it achieves strong attack success while maintaining high perceptual quality across diverse datasets. The work also analyzes robustness to parameter variations and evaluates defense resistance, showing that several state-of-the-art defenses struggle to detect or neutralize 3S-attack. These findings highlight vulnerabilities at the intersection of robustness and semantic interpretability and underscore the need for stronger, multi-domain defenses in AI systems.

Abstract

Backdoor attacks implant hidden behaviors into models by poisoning training data or modifying the model directly. These attacks aim to maintain high accuracy on benign inputs while causing misclassification when a specific trigger is present. While existing studies have explored stealthy triggers in the spatial and spectral domains, few incorporate the semantic domain. In this paper, we propose 3S-attack, a novel backdoor attack that is stealthy across the spatial, spectral, and semantic domains. The key idea is to exploit the semantic features of benign samples as triggers, using Gradient-weighted Class Activation Mapping (Grad-CAM) and a preliminary model for extraction. We then embed the trigger in the spectral domain and apply pixel-level restrictions in the spatial domain. This process minimizes the distance between poisoned and benign samples, making the attack harder for existing defenses and human inspection to detect. It also exposes a vulnerability at the intersection of robustness and semantic interpretability, revealing that models can be manipulated to act in semantically consistent yet malicious ways. Extensive experiments on various datasets, along with theoretical analysis, demonstrate the stealthiness of 3S-attack and highlight the need for stronger defenses to ensure AI security.
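To make the spectral-embedding and pixel-restriction steps concrete, the following is a minimal sketch and not the authors' implementation. It assumes the semantic trigger has already been extracted (the Grad-CAM step is omitted) and is given as a grayscale image of the same size as the benign sample. The function names and the parameters `mix`, `top_frac`, and `max_change` are hypothetical stand-ins chosen for illustration; they are not the paper's definitions or values.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis: C @ x transforms, C.T @ X inverts.
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def dct2(x):
    # Separable 2-D DCT of a 2-D array.
    return dct_matrix(x.shape[0]) @ x @ dct_matrix(x.shape[1]).T

def idct2(coeffs):
    # Inverse of dct2 (the basis matrices are orthogonal).
    return dct_matrix(coeffs.shape[0]).T @ coeffs @ dct_matrix(coeffs.shape[1])

def embed_trigger(benign, trigger, mix=0.2, top_frac=0.05, max_change=8.0):
    """Blend a trigger into a benign image in the DCT domain.

    benign, trigger: 2-D float arrays in [0, 255] with the same shape.
    mix, top_frac, max_change: hypothetical illustration parameters.
    """
    b = dct2(benign)
    t = dct2(trigger)
    # Select the strongest trigger coefficients (assumed selection rule).
    k = max(1, int(top_frac * t.size))
    threshold = np.partition(np.abs(t).ravel(), -k)[-k]
    mask = np.abs(t) >= threshold
    # Mix trigger coefficients into the benign spectrum at those positions.
    p = b.copy()
    p[mask] = (1.0 - mix) * b[mask] + mix * t[mask]
    poisoned = idct2(p)
    # Pixel-level restriction: bound the per-pixel change in the spatial domain.
    delta = np.clip(poisoned - benign, -max_change, max_change)
    return np.clip(benign + delta, 0.0, 255.0)
```

For a color image, the same function could be applied per channel. The final clipping step is what bounds the spatial residual: no pixel of the output differs from the benign sample by more than `max_change`.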

Paper Structure

This paper contains 64 sections, 3 equations, 12 figures, 7 tables, 1 algorithm.

Figures (12)

  • Figure 1: Comparison of the proposed 3S-attack with other SOTA backdoor attacks from the spatial and spectral perspectives. The residual is the difference between benign and poisoned samples, shown with inverted colors for visibility.
  • Figure 2: Pipeline for extracting a trigger from a benign sample in the target class.
  • Figure 3: Process of embedding the trigger into benign samples to generate poisoned samples.
  • Figure 4: Pixel value change restriction on poisoned samples. Note that the red circles in the figure are solely used to highlight the unnatural artifacts in the samples; the circles themselves are not part of the poisoned samples.
  • Figure 5: The effect of (a) poison rate, (b) frequency selection threshold, (c) poison distance ratio, and (d) pixel change restriction threshold on ASR, evaluated on three datasets: CIFAR-10, CIFAR-100, and Animal-10.
  • ...and 7 more figures