
$L_p$-norm Distortion-Efficient Adversarial Attack

Chao Zhou, Yuan-Gen Wang, Zi-jia Wang, Xiangui Kang

TL;DR

This paper addresses adversarial vulnerability by proposing an $L_p$-norm distortion-efficient attack that jointly minimizes $L_2$ (or $L_\infty$) and $L_0$ distortion. It introduces a dimension unimportance matrix and an adversarial threshold to prune perturbation dimensions, producing sparse yet effective adversarial examples under hard-label black-box constraints. Experiments on MNIST, CIFAR-10, and ImageNet show superior or competitive performance against state-of-the-art attacks, with notable gains in sparsity (fewer perturbed pixels) at similar or better success rates. The approach advances robustness research by enabling sparser, less perceptible attacks in query-limited regimes, informing defense strategies against hard-label black-box threats.
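
The distinction between $L_0$ distortion (how many pixels are perturbed) and $L_2$ or $L_\infty$ distortion (how large the change is) is central to the attack. As a minimal sketch in plain Python, the three measures can be computed on a flattened perturbation as below. The function name `distortion_norms` and its `tol` parameter are illustrative choices, not taken from the paper.

```python
import math


def distortion_norms(delta, tol=0.0):
    """Return (L0, L2, Linf) for a flattened perturbation `delta`.

    L0 counts dimensions whose absolute change exceeds `tol` (perturbed pixels),
    L2 is the Euclidean length, and Linf is the largest absolute change.
    """
    l0 = sum(1 for d in delta if abs(d) > tol)
    l2 = math.sqrt(sum(d * d for d in delta))
    linf = max((abs(d) for d in delta), default=0.0)
    return l0, l2, linf


# Dense: a tiny change on all 100 dimensions (typical of L2/Linf attacks).
dense = [0.02] * 100
# Sparse: a larger change on only 4 of the 100 dimensions.
sparse = [0.1] * 4 + [0.0] * 96

print(distortion_norms(dense))
print(distortion_norms(sparse))
```

Both perturbations have an $L_2$ norm of about 0.2, yet the sparse one touches 4 dimensions instead of 100. Keeping $L_2$ low while also shrinking $L_0$ is exactly the combination the attack aims for.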

Abstract

Adversarial examples have shown a powerful ability to cause well-trained models to misclassify. Current mainstream adversarial attack methods consider only one of the $L_0$-norm, $L_2$-norm, and $L_\infty$-norm distortions. $L_0$-norm based methods make large modifications to individual pixels, which are visible to the naked eye, while $L_2$-norm and $L_\infty$-norm based methods suffer from weak robustness against adversarial defenses because they diffuse tiny perturbations across all pixels. A more realistic adversarial perturbation should be both sparse and imperceptible. In this paper, we propose a novel $L_p$-norm distortion-efficient adversarial attack, which not only achieves the lowest $L_2$-norm distortion but also significantly reduces the $L_0$-norm distortion. To this end, we design a new optimization scheme that first optimizes an initial adversarial perturbation under an $L_2$-norm constraint and then constructs a dimension unimportance matrix for the initial perturbation. This matrix indicates the adversarial unimportance of each dimension of the initial perturbation. Furthermore, we introduce the concept of an adversarial threshold for the dimension unimportance matrix: all dimensions of the initial perturbation whose unimportance exceeds the threshold are set to zero, greatly decreasing the $L_0$-norm distortion. Experimental results on three benchmark datasets show that, under the same query budget, the adversarial examples generated by our method have lower $L_0$-norm and $L_2$-norm distortion than the state-of-the-art. On MNIST in particular, our attack reduces $L_2$-norm distortion by 8.1% while leaving 47% of the pixels unattacked. These results demonstrate the superiority of the proposed method over its competitors in terms of adversarial robustness and visual imperceptibility.
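
The pruning step described in the abstract (zero every dimension whose unimportance exceeds an adversarial threshold) can be sketched as follows. This is a hypothetical illustration, not the authors' method: the abstract does not say how the dimension unimportance matrix is computed or how the threshold is chosen. Here `magnitude_unimportance` is a simple stand-in (a smaller $|\delta_i|$ means a more unimportant dimension), and `search_threshold` binary-searches for the most aggressive threshold that a hard-label oracle `is_adversarial` still accepts, assuming the outcome is monotone in the threshold. All function names are assumptions.

```python
def prune(delta, unimportance, threshold):
    """Zero every dimension whose unimportance exceeds `threshold`."""
    return [0.0 if u > threshold else d for d, u in zip(delta, unimportance)]


def magnitude_unimportance(delta, eps=1e-12):
    """Hypothetical stand-in for the dimension unimportance matrix:
    dimensions with a smaller |delta_i| receive a larger unimportance score."""
    return [1.0 / (abs(d) + eps) for d in delta]


def search_threshold(x, delta, unimportance, is_adversarial):
    """Binary-search candidate thresholds for the most aggressive pruning that
    the hard-label oracle still accepts as adversarial.

    Assumes monotonicity: raising the threshold zeroes fewer dimensions and
    never turns an adversarial example back into a benign one.
    """
    def query(t):
        # One hard-label query on x + pruned perturbation.
        pruned = prune(delta, unimportance, t)
        return is_adversarial([xi + di for xi, di in zip(x, pruned)])

    candidates = sorted(set(unimportance))
    if not query(candidates[-1]):  # largest threshold keeps every dimension
        raise ValueError("initial perturbation is not adversarial")
    best = candidates[-1]
    lo, hi = 0, len(candidates) - 2
    while lo <= hi:
        mid = (lo + hi) // 2
        if query(candidates[mid]):
            best, hi = candidates[mid], mid - 1  # still adversarial: prune more
        else:
            lo = mid + 1  # label flipped back: keep more dimensions
    return best, prune(delta, unimportance, best)


# Toy hard-label oracle (hypothetical): "adversarial" when the inputs sum past 1.
def toy_oracle(z):
    return sum(z) > 1.0


x = [0.0, 0.0, 0.0, 0.0]
delta = [0.9, 0.5, 0.05, 0.01]  # stands in for the initial L2 perturbation
threshold, sparse_delta = search_threshold(
    x, delta, magnitude_unimportance(delta), toy_oracle)
print(sparse_delta)  # -> [0.9, 0.5, 0.0, 0.0]
```

Each call to `query` costs one hard-label query, so the search uses one initial check plus roughly $\log_2$ of the number of distinct unimportance values. In the toy run, two of the four dimensions are dropped while the example stays adversarial.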

Paper Structure (16 sections, 16 equations, 10 figures, 5 tables)

Figures (10)

  • Figure 1: Overview of the proposed method, which includes the initial perturbation generation, the calculation of dimension unimportance matrix, and the perturbation dimension optimization.
  • Figure 2: Illustration of generating initial perturbation.
  • Figure 3: Calculate the dimension unimportance matrix
  • Figure 4: Optimize the perturbation dimension
  • Figure 5: $L_p$-norm distortion-efficient adversarial attack
  • ...and 5 more figures