GSE: Group-wise Sparse and Explainable Adversarial Attacks

Shpresim Sadiku; Moritz Wagner; Sebastian Pokutta

TL;DR

GSE targets the vulnerability of deep networks to adversarial perturbations by enforcing group-wise sparsity for perturbations that are low in magnitude and semantically meaningful. It combines a $1/2$-quasinorm proximal step to select coordinates with a subsequent projected Nesterov-based optimization to perturb only a subset of coordinates, yielding highly interpretable attacks. Empirical results on CIFAR-10 and ImageNet show substantial gains in group-wise sparsity and explainability, while maintaining a 100% attack success rate and faster computation than prior methods. The work provides a new benchmark for evaluating robustness and a potential defense via adversarial training, with code released for reproducibility.

Abstract

Sparse adversarial attacks fool deep neural networks (DNNs) through minimal pixel perturbations, often regularized by the $\ell_0$ norm. Recent efforts have replaced this norm with a structural sparsity regularizer, such as the nuclear group norm, to craft group-wise sparse adversarial attacks. The resulting perturbations are thus explainable and hold significant practical relevance, shedding light on an even greater vulnerability of DNNs. However, crafting such attacks poses an optimization challenge, as it involves computing norms for groups of pixels within a non-convex objective. We address this by presenting a two-phase algorithm that generates group-wise sparse attacks within semantically meaningful areas of an image. Initially, we optimize a quasinorm adversarial loss using the $1/2$-quasinorm proximal operator tailored for non-convex programming. Subsequently, the algorithm transitions to a projected Nesterov's accelerated gradient descent with $2$-norm regularization applied to perturbation magnitudes. Rigorous evaluations on CIFAR-10 and ImageNet datasets demonstrate a remarkable increase in group-wise sparsity, e.g., $50.9\%$ on CIFAR-10 and $38.4\%$ on ImageNet (average case, targeted attack). This performance improvement is accompanied by significantly faster computation times, improved explainability, and a $100\%$ attack success rate.
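
To make the first phase more concrete, the following is a minimal sketch of an elementwise $1/2$-quasinorm proximal operator. It assumes the scaling convention $\min_w \tfrac{1}{2}(w - x)^2 + \lambda |w|^{1/2}$, for which a closed-form "half-thresholding" solution is known. The function name `prox_half_quasinorm`, the parameter name `lam`, and this scaling convention are illustrative assumptions; the paper's exact formulation and step-size handling may differ.

```python
import numpy as np

def prox_half_quasinorm(x, lam):
    """Elementwise minimizer of 0.5 * (w - x)**2 + lam * |w|**0.5.

    Illustrative sketch; the objective scaling is an assumption.
    """
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    # Entries whose magnitude is at or below this threshold map exactly to zero.
    threshold = 1.5 * lam ** (2.0 / 3.0)
    keep = np.abs(x) > threshold
    xk = x[keep]
    # Closed-form solution of the stationarity condition for surviving entries.
    angle = np.arccos(-(3.0 ** 1.5 / 4.0) * lam * np.abs(xk) ** -1.5)
    out[keep] = (2.0 / 3.0) * xk * (1.0 + np.cos((2.0 / 3.0) * angle))
    return out

# Usage: with lam = 1.0 the threshold is 1.5, so the last two entries are zeroed
# while the first two are shrunk toward zero but keep their sign.
shrunk = prox_half_quasinorm(np.array([3.0, -2.0, 1.0, 0.2]), lam=1.0)
```

Unlike soft thresholding for the $\ell_1$ norm, this operator jumps discontinuously at the threshold: small entries are set exactly to zero, which is what makes it suitable for selecting a sparse set of coordinates to perturb.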

Paper Structure (25 sections, 32 equations, 5 figures, 9 tables, 2 algorithms)

Introduction
Related Work
Adversarial Attack Formulation
$1/2$-Quasinorm Regularization
Group-wise Sparse Adversarial Attacks of Low Magnitude
Experiments
Datasets
Evaluation Metrics
Attack Configurations
Results
Empirical Performance
Explainability
Speed Comparison
GSE Against Adversarially Trained Networks
Conclusion
...and 10 more sections

Figures (5)

Figure 1: Adversarial attacks generated by our algorithm. The top row depicts a targeted attack of target label "water bottle", and the bottom row depicts an untargeted attack.
Figure 2: IS vs. percentile $\nu$ for targeted versions of GSE vs. five other attacks. Evaluated on an ImageNet ViT_B_16 classifier (a), and CIFAR-10 ResNet20 classifier (b). Tested on 1k images from each dataset, 9 target labels for CIFAR-10 and 10 target labels for ImageNet.
Figure 3: Visual comparison of successful untargeted adversarial instances generated by our attack, StrAttack, and FWnucl. Adversarial examples are shown in the top row, perturbed pixels highlighted in red in the middle row, and the perturbations in the bottom row. The target model is a ResNet50. Perturbations are enhanced for visibility.
Figure 4: Targeted adversarial examples generated by GSE. The target is "airship" for the first two rows, and "golf cart" for the last two rows. The attacked model is a VGG19. Perturbations are enhanced for visibility.
Figure 5: IS vs. percentile $\nu$ for targeted versions of GSE vs. four other attacks. Evaluated on an ImageNet VGG19 classifier. Tested on 1k images and 10 target labels for ImageNet.

Theorems & Definitions (1)

proof
