
BruSLeAttack: A Query-Efficient Score-Based Black-Box Sparse Adversarial Attack

Viet Quoc Vo, Ehsan Abbasnejad, Damith C. Ranasinghe

TL;DR

This paper addresses the hard problem of crafting sparse adversarial perturbations from score-based black-box queries by reformulating the search in a lower-dimensional discrete space, using a fixed synthetic color image $x'$ and a binary mask $u$. A Bayesian framework with a Dirichlet-parameterized search distribution learns pixel-level influence from the attack history and guides iterative sampling, biased by a dissimilarity map, toward an $l_0$-constrained perturbation. The resulting BruSLeAttack achieves state-of-the-art attack success rates and query efficiency on ImageNet across CNNs and ViTs. It also demonstrates practical impact by successfully attacking a real-world MLaaS system (Google Cloud Vision), and it is used to evaluate defenses. Overall, the method offers a scalable, principled approach for rapid vulnerability assessment of vision models in black-box settings, with artifacts available on GitHub to support reproducibility and further study.
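To make the reformulation concrete, the following minimal sketch (plain Python, images stored as flat lists of RGB tuples) builds $x_{\text{adv}} = (1-u) \odot x + u \odot x'$ from a source image $x$, a fixed synthetic color image $x'$, and a binary mask $u$ with exactly $B$ ones, and converts a sparsity level into a pixel budget $B$. It assumes that "sparsity" means the fraction of perturbed pixels in a 224×224 input and that $x'$ is filled with uniformly random colors; both are illustrative assumptions, and every function name here is hypothetical rather than taken from the paper's code.

```python
import math
import random

def pixel_budget(sparsity, height, width):
    """Pixel budget B for a sparsity level given as a fraction of all pixels (rounded down; an assumption)."""
    return math.floor(sparsity * height * width)

def random_mask(n_pixels, budget, rng):
    """Binary mask u with exactly `budget` ones at uniformly chosen pixel positions."""
    on = set(rng.sample(range(n_pixels), budget))
    return [1 if i in on else 0 for i in range(n_pixels)]

def compose(x, x_syn, u):
    """x_adv = (1 - u) * x + u * x_syn; one mask bit is shared by all color channels of a pixel."""
    return [s if m else o for o, s, m in zip(x, x_syn, u)]

def l0_pixels(x, x_adv):
    """Number of pixels (not individual channel values) that differ between x and x_adv."""
    return sum(1 for a, b in zip(x, x_adv) if a != b)

# Toy 8x8 example: a flat gray source image and a fixed, randomly colored synthetic image x'.
rng = random.Random(0)
x = [(128, 128, 128)] * 64
x_syn = [(rng.randrange(256), rng.randrange(256), rng.randrange(256)) for _ in range(64)]
u = random_mask(64, 5, rng)
x_adv = compose(x, x_syn, u)
```

Under the 224×224 assumption, the 0.4% and 1.0% sparsity levels used in Figure 4 correspond to budgets of 200 and 501 pixels, and the 80 and 220 pixels quoted in Figure 1 are roughly 0.16% and 0.44% of the 50,176 pixels. Note that `l0_pixels` can be smaller than the number of ones in $u$ if a synthetic pixel happens to equal the source pixel.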

Abstract

We study the unique, less well understood problem of generating sparse adversarial samples simply by observing the score-based replies to model queries. Sparse attacks aim to discover a minimum number of perturbations to model inputs (the $l_0$-bounded perturbations) to craft adversarial examples and misguide model decisions. In contrast to query-based dense attacks against black-box models, however, constructing sparse adversarial perturbations is non-trivial, even when models return confidence scores to queries in a score-based setting. This is because such an attack leads to i) an NP-hard problem and ii) a non-differentiable search space. We develop BruSLeAttack, a new, faster (more query-efficient) Bayesian algorithm for the problem. We conduct extensive attack evaluations, including an attack demonstration against a Machine Learning as a Service (MLaaS) offering exemplified by Google Cloud Vision, and robustness testing of adversarial training regimes and of a recent defense against black-box attacks. The proposed attack scales to achieve state-of-the-art attack success rates and query efficiency on standard computer vision tasks such as ImageNet across different model architectures. Our artefacts and DIY attack samples are available on GitHub. Importantly, our work facilitates faster evaluation of model vulnerabilities and raises our vigilance on the safety, security and reliability of deployed systems.
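One generic way to write the targeted problem described above, using the synthetic image $x'$ and binary mask $u$ from the TL;DR, is shown below. The loss $\mathcal{L}$, score function $f$, budget $B$ and target label $y_{\text{target}}$ are notation assumed here for illustration; the paper's exact formulation may differ.

```latex
\min_{u \in \{0,1\}^{H \times W}} \; \mathcal{L}\big(f(x_{\text{adv}}),\, y_{\text{target}}\big)
\quad \text{s.t.} \quad \|u\|_0 \le B,
\qquad x_{\text{adv}} = (1 - u) \odot x + u \odot x'
```

Here $u$ is broadcast across the color channels, so the search runs over $H \times W$ binary entries instead of $H \times W \times C$ continuous values. Because $u$ is binary and $f$ only returns scores, no gradient is available, and choosing exactly $B$ of $HW$ pixels yields $\binom{HW}{B}$ candidate masks; for a $224 \times 224$ input with $B = 200$ that is $\binom{50176}{200}$ masks, far too many to enumerate.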

Paper Structure (39 sections, 10 equations, 17 figures, 27 tables, 4 algorithms)

Figures (17)

  • Figure 1: Targeted Attack. Adversarial examples generated by BruSLeAttack with different perturbation budgets against three deep learning models on ImageNet. An image with ground-truth label Minibus is misclassified as a Warplane. Interestingly, whereas 220 perturbed pixels are needed to mislead the Vision Transformer, BruSLeAttack requires only 80 perturbed pixels to fool the ResNet-based models (more visuals in the Appendix, "Visualization of Sparse Adversarial Examples"). Evaluation against Google Cloud Vision is in Section "Attack a Real-World System" and in the Appendix, "Visualizations of Attack Against Google Cloud Vision".
  • Figure 2: A Sampling and Update illustration. The attack aims to mislead a model into misclassifying a Bird image as Dog. Assume that in round $t-1$, an adversarial instance is classified as Bird with loss $\ell=4.8$. For simplicity, we visualize three elements of $\boldsymbol{\alpha}^{\text{posterior}}$. Let $\{p_1,p_2,p_3\}$ denote three perturbed pixels with corresponding posterior parameters $\{\alpha_1^{\text{posterior}},\alpha_2^{\text{posterior}},\alpha_3^{\text{posterior}}\}$. Assume that in round $t$, pixels $p_1, p_2$ remain while $p_3$ is replaced by $p_4$, because this replacement reduces the loss from 4.8 to 1.9. All of $\{\alpha_1^{\text{posterior}},\alpha_2^{\text{posterior}},\alpha_3^{\text{posterior}},\alpha_4^{\text{posterior}}\}$ are updated using the posterior mean equation, but we visualize only $\{\alpha_1^{\text{posterior}},\alpha_2^{\text{posterior}},\alpha_4^{\text{posterior}}\}$. Since $\alpha_4^{\text{posterior}}$ is new and has never been selected before, its value is small (represented using colder colors). From $t$ to $t+45$, while sampling and learning to find a better group of perturbed pixels, $\boldsymbol{\alpha}^{\text{posterior}}$ is updated. Because $p_1$ has a high influence on the model's prediction (represented using warmer colors), it is more likely to remain, while $p_2$ and $p_4$ are more likely to be selected for replacement due to their lower impact on the model decision. In round $t+46$, pixel $p_2$ is replaced by $p_5$ because this reduces the loss from 1.9 to 0.6, and the predicted label flips from Bird to Dog.
  • Figure 3: BruSLeAttack algorithm (main attack algorithm). We search for a set of pixels in the source image $x$ to replace with the corresponding pixels of a synthetic color image $x^\prime$. In the solution, a binary matrix $\boldsymbol{u^{(t)}}$, white and black denote replaced and non-replaced pixels of the source image, respectively. Instead of a stochastic search, we employ our Bayesian framework (Section "Bayesian Formulation for learning a model"). First, we aim to retain useful elements of the solution $\boldsymbol{u^{(t)}}$ by learning from historical pixel manipulations: we explore and learn the influence of the selected elements by capturing it in the model $\boldsymbol{\theta}$ using this Bayesian framework, where darker colors indicate higher influence of the selected elements (update algorithm). Second, we generate new pixel perturbations based on $\boldsymbol{\theta}$, with the intuition that a larger pixel dissimilarity $M$ between our search space $x^\prime$ and the source image can move the adversarial example toward the decision boundary faster and accelerate the search (generation algorithm).
  • Figure 4: Targeted setting on ImageNet. a-c) Attack success rate (ASR) of BruSLeAttack and Sparse-RS against different models at sparsity levels of 0.4% (dashed lines) and 1.0% (solid lines); d) accuracy of different models against BruSLeAttack at sparsity levels of 0.4% (dashed) and 1.0% (solid). Results at in-between sparsity levels are in the Appendix, "Attack against DL models on ImageNet, targeted setting".
  • Figure 5: Targeted attacks on ImageNet against ResNet-50. ASR comparisons between BruSLeAttack and baselines: i) SparseEvo and Pointwise (SOTA algorithms from decision-based settings); ii) $\text{PGD}_0$ (white-box).
  • Figures 6 to 17 are not listed here.
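Figures 2 and 3 describe the sampling-and-update loop only at a high level. The following toy sketch in plain Python illustrates the idea under loudly stated assumptions: exactly one pixel is swapped per query, the selected pixel with the smallest Dirichlet-sampled weight $\theta$ is the one dropped (a Thompson-sampling-style choice), the replacement is drawn in proportion to a per-pixel dissimilarity $M$, and a swap is accepted only if the loss decreases. The count-based $\alpha$ update is a hypothetical stand-in, not the paper's posterior-mean equation, and `toy_bayesian_search`, `toy_loss` and the toy oracle are invented for illustration.

```python
import random

def sample_dirichlet(alphas, rng):
    """Draw theta ~ Dirichlet(alphas) by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def toy_bayesian_search(loss_fn, dissim, k, iters, seed=0, alpha_prior=1.0):
    """Hypothetical sketch: keep k selected pixels, swap one per query, learn per-pixel alphas."""
    rng = random.Random(seed)
    n = len(dissim)
    alpha = [alpha_prior] * n            # Dirichlet concentration per pixel (prior value)
    selected = rng.sample(range(n), k)   # initial mask u: k pixels chosen uniformly
    best = loss_fn(selected)
    queries = 1
    for _ in range(iters):
        # 1) Sample relative influence of the selected pixels; drop the one with smallest theta.
        theta = sample_dirichlet([alpha[i] for i in selected], rng)
        pos = min(range(k), key=lambda j: theta[j])
        dropped = selected[pos]
        # 2) Propose a replacement among unselected pixels, weighted by dissimilarity M.
        in_mask = set(selected)
        pool = [i for i in range(n) if i not in in_mask]
        new_pixel = rng.choices(pool, weights=[dissim[i] for i in pool], k=1)[0]
        candidate = selected.copy()
        candidate[pos] = new_pixel
        loss = loss_fn(candidate)        # one score-based query
        queries += 1
        # 3) Greedy acceptance plus a simple count-based alpha update (not the paper's equation).
        if loss < best:
            for i in selected:
                if i != dropped:
                    alpha[i] += 1.0      # survived an improving swap
            selected, best = candidate, loss
        elif loss > best:
            alpha[dropped] += 1.0        # removing it hurt, so treat it as influential
    return selected, best, queries, alpha

# Toy score oracle (hypothetical): only pixels 3, 17 and 42 influence the loss.
influence = [0.0] * 64
for p in (3, 17, 42):
    influence[p] = 1.0

def toy_loss(sel):
    return 3.0 - sum(influence[i] for i in sel)

sel, final_loss, n_queries, alpha = toy_bayesian_search(
    toy_loss, dissim=[1.0] * 64, k=3, iters=400, seed=0)
```

Because a pixel credited with a larger $\alpha$ tends to receive a larger sampled $\theta$, it is rarely chosen for removal, which mirrors how $p_1$ survives in Figure 2 while low-influence pixels are swapped out. A uniform `dissim` is used here only to keep the toy simple; a non-uniform $M$ would bias proposals toward pixels where $x'$ differs most from $x$.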