Silent Until Sparse: Backdoor Attacks on Semi-Structured Sparsity

Wei Guo, Fabio Brau, Maura Pintor, Ambra Demontis, Battista Biggio

TL;DR

Experiments show that SUS is largely effective against semi-structured sparsification across both hardware-accelerated and software pipelines, outperforming existing compression-aware backdoor attacks, bypassing standard defenses, and even being robust to user-side fine-tuning.

Abstract

Semi-structured (2:4) sparsity is a widely adopted pruning method in modern hardware and software ecosystems (e.g., NVIDIA Sparse Tensor Cores and PyTorch), achieving up to 2X faster inference and reduced memory footprint with negligible accuracy loss. It removes two out of every four contiguous weights, using permutations to ensure the largest-magnitude weights are retained. In this work, we show that this predictable mechanism can be exploited to design Silent Until Sparse (SUS), a novel compression-activated backdoor attack tailored to the 2:4 sparsity regime. SUS employs a two-phase training procedure that modifies (i) the weights that will be retained after pruning to embed the backdoor, and (ii) the weights that will be pruned to hide it in the dense model. SUS also provides formal guarantees that the attack will be successfully activated after sparsification. Experiments show that SUS is largely effective against semi-structured sparsification across both hardware-accelerated and software pipelines, outperforming existing compression-aware backdoor attacks, bypassing standard defenses, and even being robust to user-side fine-tuning.
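To make the pruning rule in the abstract concrete, the following is a minimal NumPy sketch of magnitude-based 2:4 sparsification without the permutation step. It assumes the groups of four are taken along each row of a 2-D weight matrix. The function names `mask_2_4` and `prune_2_4` are hypothetical and are not taken from the paper or from any library.

```python
import numpy as np

def mask_2_4(W):
    """Boolean mask keeping the 2 largest-magnitude weights in every
    group of 4 contiguous entries along each row (assumed layout)."""
    rows, cols = W.shape
    if cols % 4 != 0:
        raise ValueError("number of columns must be a multiple of 4")
    groups = np.abs(W).reshape(rows, cols // 4, 4)
    # argsort is ascending, so the last two indices are the largest magnitudes
    top2 = np.argsort(groups, axis=-1)[..., 2:]
    mask = np.zeros(groups.shape, dtype=bool)
    np.put_along_axis(mask, top2, True, axis=-1)
    return mask.reshape(rows, cols)

def prune_2_4(W):
    """Zero out the two smallest-magnitude weights in each group of 4."""
    return np.where(mask_2_4(W), W, 0.0)

W = np.array([[0.1, -0.9, 0.4, 0.05, 2.0, -0.3, 0.2, 1.5]])
mask = mask_2_4(W)
pruned = prune_2_4(W)
print(pruned)
```

In this example the first group keeps -0.9 and 0.4 and the second keeps 2.0 and 1.5. Because the mask depends only on weight magnitudes, anyone holding the dense weights can compute in advance which entries will survive pruning, which is the predictability the abstract refers to.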

Paper Structure

This paper contains 17 sections, 2 theorems, 15 equations, 6 figures, 8 tables, 1 algorithm.

Key Result

Proposition 1

If $W$ satisfies (eq:rrow) with $M=\mathcal{M}_{\texttt{2:4}}(W)$, then $\mathcal{M}_\texttt{p}(W)=\mathcal{M}_{\texttt{2:4}}(W)$.

Figures (6)

  • Figure 1: Silent-Until-Sparse (SUS) attack. The attacker publicly releases a dense model that correctly classifies both clean and triggered inputs to evade detection, while embedding a hidden backdoor. When the user downloads the model and applies 2:4 pruning, the backdoor is inadvertently activated, causing triggered inputs (vehicle images) to be misclassified as the target class (boat).
  • Figure 2: Model-sharing Threat Scenario -- SUS Attack. The backdoored model is trained and uploaded to a public hub. Existing backdoor defenses may be applied at the hub level, failing to detect the hidden backdoor. The model is then downloaded by the user, who applies semi-structured pruning for efficient inference, inadvertently activating the backdoor attack.
  • Figure 3: A 2:4 sparsification without (left) and with permutation (right) that retains a higher weight magnitude in terms of $\ell_1$ norm.
  • Figure 4: Clean and triggered inputs on three datasets: MNIST, CIFAR10, and TImgNet. The triggers consist of: a white patch, a blended Hello Kitty, and a random patch.
  • Figure 5: Comparison of activation maps for poisoned and benign samples, shown for both the released dense MLP model and its pruned counterpart after semi-structured sparsification.
  • ...and 1 more figure
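The effect described in the Figure 3 caption can be illustrated with a small NumPy sketch. It assumes that the permutation reorders the columns of the weight matrix before the groups of four are formed. The permutation here is hand-picked for the toy example and the helper `retained_l1` is hypothetical; this does not reproduce the search procedure used by any real pruning toolchain.

```python
import numpy as np

def retained_l1(W):
    """l1 norm of the weights that survive magnitude-based 2:4 pruning,
    with groups of 4 taken along each row (assumed layout)."""
    groups = np.abs(W).reshape(W.shape[0], -1, 4)
    # keep the two largest magnitudes in each group of four
    top2 = np.sort(groups, axis=-1)[..., 2:]
    return float(top2.sum())

# Toy row: the four large weights all fall into the first group
W = np.array([[8.0, 7.0, 6.0, 5.0, 1.0, 1.0, 1.0, 1.0]])

# Hand-picked column permutation that spreads the large weights over both groups
perm = np.array([0, 1, 4, 5, 2, 3, 6, 7])

before = retained_l1(W)            # groups (8,7,6,5) and (1,1,1,1)
after = retained_l1(W[:, perm])    # groups (8,7,1,1) and (6,5,1,1)
print(before, after)
```

Without the permutation, the weights 6 and 5 are pruned because they compete with 8 and 7 in the same group. After the permutation, each group contains only two large weights, so all four are retained.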

Theorems & Definitions (4)

  • Proposition 1: Sufficiency of (eq:rrow)
  • proof
  • Proposition 2
  • proof