Data-free Universal Adversarial Perturbation with Pseudo-semantic Prior

Chanhui Lee, Yeonghwan Song, Jeany Son

TL;DR

This work tackles the data-free universal adversarial perturbation (UAP) problem by introducing PSP-UAP, which extracts pseudo-semantic priors directly from the evolving UAP to generate region-based semantic samples without target-domain data. By incorporating input transformations and a sample reweighting scheme guided by KL-divergence, PSP-UAP significantly enhances black-box transferability across diverse CNN architectures on ImageNet, achieving state-of-the-art performance among data-free methods and competitive results relative to data-dependent UAPs. The approach reveals that semantic cues embedded within the UAP can be leveraged as a data substitute, enabling robust universal attacks in data-scarce scenarios. While effective for CNNs, the method shows limited transferability to ViT-based models, pointing to future work on devising black-box strategies for transformer architectures and further defenses against such data-free perturbations.

Abstract

Data-free Universal Adversarial Perturbation (UAP) is an image-agnostic adversarial attack that deceives deep neural networks using a single perturbation generated solely from random noise without relying on data priors. However, traditional data-free UAP methods often suffer from limited transferability due to the absence of semantic content in random noise. To address this issue, we propose a novel data-free universal attack method that recursively extracts pseudo-semantic priors directly from the UAPs during training to enrich the semantic content within the data-free UAP framework. Our approach effectively leverages latent semantic information within UAPs via region sampling, enabling successful input transformations (typically ineffective in traditional data-free UAP methods due to the lack of semantic cues) and significantly enhancing black-box transferability. Furthermore, we introduce a sample reweighting technique to mitigate potential imbalances from random sampling and transformations, emphasizing hard examples less affected by the UAPs. Comprehensive experiments on ImageNet show that our method achieves state-of-the-art performance in average fooling rate by a substantial margin, notably improves attack transferability across various CNN architectures compared to existing data-free UAP methods, and even surpasses data-dependent UAP methods. Code is available at: https://github.com/ChnanChan/PSP-UAP.
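
The region-sampling idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the Gaussian noise model, the crop-scale range, and the nearest-neighbour resize are all assumptions chosen for brevity.

```python
import numpy as np

def make_pseudo_semantic_prior(uap, noise_std=0.1, rng=None):
    """Add random noise to the current UAP (Gaussian noise is an assumption)."""
    rng = np.random.default_rng() if rng is None else rng
    return uap + rng.normal(0.0, noise_std, size=uap.shape)

def sample_regions(prior, n_samples=4, min_scale=0.3, rng=None):
    """Randomly crop regions of an (H, W, C) prior and resize each back to (H, W, C)."""
    rng = np.random.default_rng() if rng is None else rng
    h, w, _ = prior.shape
    samples = []
    for _ in range(n_samples):
        # Hypothetical crop-scale range; the source does not state one here.
        scale = rng.uniform(min_scale, 1.0)
        ch, cw = max(1, int(h * scale)), max(1, int(w * scale))
        top = rng.integers(0, h - ch + 1)
        left = rng.integers(0, w - cw + 1)
        crop = prior[top:top + ch, left:left + cw]
        # Nearest-neighbour resize: map each output pixel to a source pixel.
        rows = np.arange(h) * ch // h
        cols = np.arange(w) * cw // w
        samples.append(crop[rows][:, cols])
    return np.stack(samples)

rng = np.random.default_rng(0)
uap = rng.uniform(-10 / 255, 10 / 255, size=(32, 32, 3))  # toy perturbation
prior = make_pseudo_semantic_prior(uap, rng=rng)
semantic_samples = sample_regions(prior, n_samples=4, rng=rng)
```

Each row of `semantic_samples` is a different region of the same prior stretched to full resolution, which is what allows one perturbation to stand in for several distinct inputs during training.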

Paper Structure

This paper contains 40 sections, 6 equations, 12 figures, 10 tables, 1 algorithm.

Figures (12)

  • Figure 1: Diverse semantic content in both a real image and a UAP: (a) Whole images from the ImageNet dataset (top) and our generated data-free UAP (bottom) using DenseNet-121, shown at iteration 900 during the training phase. The Top-1 class and its score are shown below each image. (b) Cropped regions from the whole image (top) and our UAP (bottom). Those regions contain diverse semantics that differ from the class of the original images.
  • Figure 2: Overall pipeline of the proposed PSP-UAP. The pseudo-semantic prior is created by adding random noise to the UAP. Semantic samples are then generated by randomly cropping and resizing the pseudo-semantic prior. Input transformation is applied to both adversarial and clean versions of the semantic samples to calculate sample reweighting. Finally, the loss is defined as the product of sample reweighting and the activations of the semantic samples, from which gradients are computed to update the model.
  • Figure 3: Semantic samples derived from an adversarial example during training, along with their predicted labels and GradCAM heatmaps from DenseNet-121. Despite originating from the same example, the variations in predicted labels and heatmaps indicate that these semantic samples capture diverse semantic features.
  • Figure 4: Ablation study on each proposed component in PSP-UAP. RP and PSP refer to training a UAP using random noise and semantic samples drawn from the pseudo-semantic prior, respectively. RW and T denote the use of sample reweighting and input transformation, respectively.
  • Figure 5: Demonstrating that PSP serves as a universal strategy for other data-free UAP methods. Average fooling rate (%) refers to the average FR (%) on AlexNet, VGG16, VGG19, ResNet152, and GoogleNet, with the UAP crafted on ResNet152. P, T, and RW denote PSP, input transformation, and sample reweighting, respectively.
  • ...and 7 more figures
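
The Figure 2 caption describes a loss formed as the product of per-sample weights and activations, with weights computed from the clean and adversarial versions of each semantic sample. The sketch below shows one way such a KL-guided weighting could look. The specific weighting rule (a softmax over negative KL so that samples less affected by the UAP receive larger weights), the `temperature` parameter, and all function names are assumptions and are not taken from the paper.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for each row of two probability arrays."""
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def sample_weights(clean_logits, adv_logits, temperature=1.0):
    """Assumed rule: a small clean-vs-adversarial KL means the sample is
    barely affected by the UAP (a hard example), so it gets a larger weight."""
    kl = kl_divergence(softmax(clean_logits), softmax(adv_logits))
    weights = softmax(-kl / temperature)
    return weights * len(kl)  # rescale so the weights average to 1

def reweighted_loss(weights, activations):
    """Negative weighted mean of per-sample activation scores;
    minimizing it increases the activations on heavily weighted samples."""
    return -np.mean(weights * activations)

clean = np.array([[5.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
adv = np.array([[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]])  # only sample 1 is fooled
w = sample_weights(clean, adv)
loss = reweighted_loss(w, activations=np.array([1.0, 1.0]))
```

In this toy case the first sample's prediction is unchanged by the perturbation, so it receives the larger weight and dominates the loss.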