GSE: Group-wise Sparse and Explainable Adversarial Attacks
Shpresim Sadiku, Moritz Wagner, Sebastian Pokutta
TL;DR
GSE crafts adversarial perturbations for deep networks that are group-wise sparse, low in magnitude, and confined to semantically meaningful image regions. A first phase uses a $1/2$-quasinorm proximal step to select which coordinates to perturb; a second phase runs projected Nesterov accelerated gradient descent on only those coordinates, yielding highly interpretable attacks. Empirical results on CIFAR-10 and ImageNet show substantial gains in group-wise sparsity (e.g., $50.9\%$ and $38.4\%$, respectively, for average-case targeted attacks) and in explainability, while maintaining a $100\%$ attack success rate and faster computation than prior methods. The work provides a new benchmark for evaluating robustness and a potential defense via adversarial training, with code released for reproducibility.
Abstract
Sparse adversarial attacks fool deep neural networks (DNNs) through minimal pixel perturbations, often regularized by the $\ell_0$ norm. Recent efforts have replaced this norm with a structural sparsity regularizer, such as the nuclear group norm, to craft group-wise sparse adversarial attacks. The resulting perturbations are thus explainable and hold significant practical relevance, shedding light on an even greater vulnerability of DNNs. However, crafting such attacks poses an optimization challenge, as it involves computing norms for groups of pixels within a non-convex objective. We address this by presenting a two-phase algorithm that generates group-wise sparse attacks within semantically meaningful areas of an image. Initially, we optimize a quasinorm adversarial loss using the $1/2$-quasinorm proximal operator tailored for non-convex programming. Subsequently, the algorithm transitions to a projected Nesterov's accelerated gradient descent with $2$-norm regularization applied to perturbation magnitudes. Rigorous evaluations on CIFAR-10 and ImageNet datasets demonstrate a remarkable increase in group-wise sparsity, e.g., $50.9\%$ on CIFAR-10 and $38.4\%$ on ImageNet (average case, targeted attack). This performance improvement is accompanied by significantly faster computation times, improved explainability, and a $100\%$ attack success rate.
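
To make the first phase more concrete, the following is a minimal sketch of an element-wise $1/2$-quasinorm proximal operator, i.e., a minimizer of $\tfrac{1}{2}(y - x)^2 + \lambda |y|^{1/2}$ in each coordinate. It uses the standard closed-form "half-thresholding" expression, obtained by solving the cubic stationarity condition trigonometrically. The function names `prox_half_scalar` and `prox_half` and the parameter `lam` are hypothetical, and we only assume, rather than claim, that the released implementation follows this exact form.

```python
import math


def prox_half_scalar(x, lam):
    """Minimizer of 0.5 * (y - x)**2 + lam * |y|**0.5 over y (illustrative sketch)."""
    # Inputs at or below this threshold are set exactly to zero (sparsity).
    thresh = (54.0 ** (1.0 / 3.0) / 4.0) * (2.0 * lam) ** (2.0 / 3.0)
    if abs(x) <= thresh:
        return 0.0
    # Angle from the trigonometric solution of the cubic stationarity condition.
    phi = math.acos((lam / 4.0) * (abs(x) / 3.0) ** -1.5)
    # Shrink x toward zero while keeping its sign.
    return (2.0 / 3.0) * x * (1.0 + math.cos(2.0 * math.pi / 3.0 - 2.0 * phi / 3.0))


def prox_half(values, lam):
    """Apply the scalar operator to every coordinate of a perturbation."""
    return [prox_half_scalar(v, lam) for v in values]


if __name__ == "__main__":
    # Small entries are zeroed out; large entries are shrunk slightly.
    print(prox_half([0.1, -0.2, 1.0, -2.0], lam=0.1))
```

Unlike soft thresholding for the $\ell_1$ norm, this operator is discontinuous: just above the threshold the output jumps to roughly two thirds of the input, which is what lets it discard small coordinates outright while shrinking large ones only mildly.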
