On Transfer-based Universal Attacks in Pure Black-box Setting

Mohammad A. A. K. Jalwana, Naveed Akhtar, Ajmal Mian, Nazanin Rahnavard, Mubarak Shah

TL;DR

Problem: existing transfer-based black-box evaluations often assume access to the target model's data and label set, which overestimates attack potency. Approach: a prior-free framework trains substitute classifiers on scraped data with varied class counts and uses a perturbation generator; it is extended to query-based attacks with a robust image-blending method. Key findings: increasing the number of substitute classes generally improves transferability, priors inflate reported fooling rates, and distributional-noise perturbations outperform fixed-noise variants. Significance: the framework enables transparent threat assessment in pure black-box settings and provides practical tools for both evaluating and extending transferable attacks.

Abstract

Despite their impressive performance, deep visual models are susceptible to transferable black-box adversarial attacks. Principally, these attacks craft perturbations in a target model-agnostic manner. However, surprisingly, we find that existing methods in this domain inadvertently take help from various priors that violate the black-box assumption such as the availability of the dataset used to train the target model, and the knowledge of the number of classes in the target model. Consequently, the literature fails to articulate the true potency of transferable black-box attacks. We provide an empirical study of these biases and propose a framework that aids in a prior-free transparent study of this paradigm. Using our framework, we analyze the role of prior knowledge of the target model data and number of classes in attack performance. We also provide several interesting insights based on our analysis, and demonstrate that priors cause overestimation in transferability scores. Finally, we extend our framework to query-based attacks. This extension inspires a novel image-blending technique to prepare data for effective surrogate model training.

Paper Structure

This paper contains 8 sections, 1 equation, 6 figures, 8 tables.

Figures (6)

  • Figure 1: High-resolution localization of salient semantic regions with different numbers of classes using CAMERAS [jalwana2021cameras]. The difference in the number of classes leads to significantly different learning of semantic concepts. A higher number of classes in classification models (CMs) allows better identification of the salient regions.
  • Figure 2: Schematic of the proposed framework for studying the role of different priors in the transferability of adversarial perturbations. The left-hand side illustrates training of substitute models, with any choice of architecture and number of classes, over scraped training data. These models, in ensembles, in turn train perturbation algorithms to craft perturbations for the scraped data. The right-hand side shows testing of the perturbations on models trained with different datasets and labels. Depiction of the target models as pretrained ImageNet models is for illustration only.
  • Figure 3: Architectures of the two variants of perturbation generators used in the analysis. Top: Publicly available generator [poursaeed2018generative]. Bottom: Our light-weight version. To indicate the differences, output sizes of different sets of layers are also shown.
  • Figure 4: Generating new images by linear combination of input images. The left-hand side shows samples from the 'warplane' and 'tank' classes that are linearly weighted and combined to generate the samples on the right-hand side.
  • Figure 5: Blending an image of 'aircraft carrier' into 'trailer truck' by adding perturbations computed by a targeted PGD attack on a robust ResNet-18 model. The leftmost image is the clean image; the images to its right gradually increase the $\ell_{\infty}$ norm of the perturbation up to a maximum value of $50/255$.
  • ...and 1 more figure
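
The linear combination described in the Figure 4 caption can be illustrated with a minimal sketch. It assumes images are floating-point arrays of identical shape with values in [0, 1]. The function name `blend_images`, the weight `alpha`, and the random stand-in arrays are hypothetical and are not taken from the paper, which may weight or sample the combinations differently.

```python
import numpy as np


def blend_images(img_a, img_b, alpha):
    """Return the convex combination alpha * img_a + (1 - alpha) * img_b.

    Assumption: both images are float arrays of the same shape in [0, 1].
    """
    if img_a.shape != img_b.shape:
        raise ValueError("images must have the same shape")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * img_a + (1.0 - alpha) * img_b


# Hypothetical stand-ins for one 'warplane' and one 'tank' image.
rng = np.random.default_rng(0)
warplane = rng.random((224, 224, 3))
tank = rng.random((224, 224, 3))

# Sweep the weight to obtain a sequence of intermediate samples.
weights = np.linspace(0.0, 1.0, 5)
blended = [blend_images(warplane, tank, w) for w in weights]
```

Because the weights stay in [0, 1], each blended image is a convex combination and remains in the same value range as its inputs, so no extra clipping is needed in this sketch.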