Hard-label based Small Query Black-box Adversarial Attack

Jeonghwan Park, Paul Miller, Niall McLaughlin

TL;DR

The paper targets the inefficiency of hard-label black-box adversarial attacks, proposing SQBA, a practical method that guides query-efficient optimization by combining surrogate-model gradients with a Monte Carlo-based gradient refinement. By formalizing a hard-label attack objective and introducing a dual-gradient framework that enforces boundary proximity, SQBA achieves substantially higher attack success rates at small query budgets across CIFAR-10 and ImageNet benchmarks, including defended models. The work demonstrates the benefits of transfer-based guidance in hard-label settings while acknowledging limitations when surrogate-target gradient directions disagree, and it provides a reproducible implementation via supplemental material and code. This approach offers a scalable, efficient avenue for evaluating model robustness in realistic black-box scenarios.

Abstract

We consider the hard-label black-box adversarial attack setting, in which the attacker observes only the predicted class from the target model. Most attack methods in this setting require an impractically large number of queries to achieve a successful attack. One approach to tackling this drawback is to exploit the adversarial transferability between white-box surrogate models and the black-box target model. However, the majority of methods adopting this approach are soft-label based, so that they can take full advantage of zeroth-order optimisation. Unlike mainstream methods, we propose a new practical setting of hard-label attack in which the optimisation process is guided by a pretrained surrogate model. Experiments show that the proposed method significantly improves the query efficiency of hard-label black-box attacks across various target model architectures. We find that the proposed method achieves an approximately 5 times higher attack success rate than the benchmarks, especially at small query budgets such as 100 and 250.
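
To make the hard-label constraint concrete, the following is a minimal sketch of a Monte Carlo direction estimate that uses only predicted classes, in the style of generic decision-based attacks. It is not the authors' algorithm, and it omits the surrogate-model guidance entirely. The linear toy target, the function names, and the parameters `delta` and `n_queries` are all assumptions made for illustration.

```python
import math
import random


def hard_label_oracle(x, w, b):
    """Toy stand-in for the target model: returns only a class label (0 or 1)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else 0


def random_unit_vector(dim, rng):
    """Draw a direction uniformly from the unit sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(vi * vi for vi in v))
    return [vi / norm for vi in v]


def estimate_direction(x_boundary, oracle, adv_label, n_queries, delta, rng):
    """Monte Carlo direction estimate from label-only queries.

    Each probe costs one query. Directions whose probe is classified as
    adv_label are added, all others are subtracted, and the sum is normalised.
    """
    dim = len(x_boundary)
    acc = [0.0] * dim
    for _ in range(n_queries):
        u = random_unit_vector(dim, rng)
        probe = [xi + delta * ui for xi, ui in zip(x_boundary, u)]
        sign = 1.0 if oracle(probe) == adv_label else -1.0
        acc = [a + sign * ui for a, ui in zip(acc, u)]
    norm = math.sqrt(sum(a * a for a in acc)) or 1.0
    return [a / norm for a in acc]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


if __name__ == "__main__":
    rng = random.Random(0)
    w, b = [1.0, -2.0, 0.5, 3.0], 0.0   # hypothetical linear target
    x_boundary = [0.0, 0.0, 0.0, 0.0]   # lies exactly on the decision boundary
    oracle = lambda x: hard_label_oracle(x, w, b)
    g = estimate_direction(x_boundary, oracle, adv_label=1,
                           n_queries=100, delta=0.1, rng=rng)
    print("cosine with true boundary normal:", round(cosine(g, w), 3))
```

In this toy setting every probe consumes one query, so a budget of 100 is spent on a single direction estimate. That cost is what makes a cheaper source of direction, such as a surrogate model's gradient, attractive at small budgets. The same `cosine` helper could be used to compare a surrogate gradient with such an estimate.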

Paper Structure (18 sections, 17 equations, 3 figures, 4 tables, 3 algorithms)

Figures (3)

  • Figure 1: Illustration of the process for finding the search direction of the iterate example.
  • Figure 2: Change in the angle of the gradient vectors from the surrogate model.
  • Figure 3: Examples of targeted attack. DGM $l_{2}$ attack is applied to the CIFAR-10 dataset performing the targeted attack for each source/target pair. First column is the clean images