HopSkipJumpAttack: A Query-Efficient Decision-Based Attack

Jianbo Chen, Michael I. Jordan, Martin J. Wainwright

TL;DR

Decision-based attacks have access only to the labels predicted by the target model. This paper introduces HopSkipJumpAttack (HSJA), which combines an estimate of the gradient direction at the decision boundary with a binary search to the boundary and geometric step-size updates to find small perturbations under the $\ell_2$ and $\ell_\infty$ metrics. The paper analyzes the gradient-direction estimator, its convergence, and techniques for controlling its variance. Experiments on MNIST, CIFAR-10, CIFAR-100, and ImageNet show that HSJA needs substantially fewer model queries than Boundary Attack while producing competitive perturbations. HSJA is also competitive in attacking several widely used defense mechanisms, which makes it a practical baseline for evaluating defenses.

Abstract

The goal of a decision-based adversarial attack on a trained model is to generate adversarial examples based solely on observing output labels returned by the targeted model. We develop HopSkipJumpAttack, a family of algorithms based on a novel estimate of the gradient direction using binary information at the decision boundary. The proposed family includes both untargeted and targeted attacks optimized for $\ell_2$ and $\ell_\infty$ similarity metrics respectively. Theoretical analysis is provided for the proposed algorithms and the gradient direction estimate. Experiments show HopSkipJumpAttack requires significantly fewer model queries than Boundary Attack. It also achieves competitive performance in attacking several widely-used defense mechanisms. (HopSkipJumpAttack was named Boundary Attack++ in a previous version of the preprint.)
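The idea of estimating a gradient direction from binary information can be illustrated with a minimal Monte Carlo sketch. The helper name `estimate_gradient_direction`, the `is_adversarial` oracle, the probe radius `delta`, and the mean-subtraction baseline are assumptions made for illustration; this is not the authors' reference implementation.

```python
import numpy as np

def estimate_gradient_direction(is_adversarial, x, delta, num_samples, rng):
    """Estimate the boundary normal at x using only label queries.

    is_adversarial: hypothetical oracle mapping a point to True/False.
    x: 1-D array assumed to lie on (or very near) the decision boundary.
    delta: probe radius (assumed small relative to the input scale).
    """
    # Draw random probe directions uniformly on the unit sphere.
    u = rng.standard_normal((num_samples, x.size))
    u /= np.linalg.norm(u, axis=1, keepdims=True)

    # One label query per probe: +1 if the probe is adversarial, else -1.
    phi = np.array(
        [1.0 if is_adversarial(x + delta * ui) else -1.0 for ui in u]
    )

    # Subtract the mean response as a simple baseline (assumed variance control).
    phi = phi - phi.mean()

    # Average the signed directions and normalize to unit length.
    g = (phi[:, None] * u).mean(axis=0)
    norm = np.linalg.norm(g)
    return g / norm if norm > 0 else g
```

On a toy linear classifier whose adversarial region is the half-space $w^\top z > 0$, the returned unit vector should align closely with $w / \|w\|_2$ once enough probes are drawn, at a cost of one query per probe.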

Paper Structure

This paper contains 35 sections, 3 theorems, 63 equations, 12 figures, 1 table, 2 algorithms.

Key Result

Theorem 1

Under the previously stated conditions on $S_{x^\star}$, suppose that we compute the updates (eq:update_l2) with step size $\xi_t = \|x_t - x^\star\|_2 \, t^{-q}$ for some $q \in \left(\frac{1}{2}, 1\right)$. Then there is a universal constant $c$ such that [...]. In particular, the algorithm converges to a stationary point of problem (prob:constrained).
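The step-size schedule in the theorem, $\xi_t = \|x_t - x^\star\|_2 \, t^{-q}$, can be computed directly. The sketch below also shows one plausible geometric-progression safeguard that halves the step until the candidate is adversarial. The function names, the `is_adversarial` oracle, the default `q=0.75`, the halving factor, and the `max_halvings` cap are all assumptions made for illustration.

```python
import numpy as np

def initial_step_size(x_t, x_star, t, q=0.75):
    """Step size xi_t = ||x_t - x_star||_2 * t**(-q), for iteration t >= 1."""
    if t < 1:
        raise ValueError("t must be a positive iteration index")
    return float(np.linalg.norm(x_t - x_star)) * t ** (-q)

def geometric_step(is_adversarial, x_t, direction, xi, max_halvings=50):
    """Halve the step until x_t + xi * direction is adversarial.

    Returns the accepted point and the step size that was used. If no
    step is accepted within max_halvings, x_t is returned unchanged.
    """
    for _ in range(max_halvings):
        candidate = x_t + xi * direction
        if is_adversarial(candidate):
            return candidate, xi
        xi /= 2.0  # geometric progression: shrink the step by half
    return x_t, 0.0
```

For example, with $\|x_t - x^\star\|_2 = 5$, $t = 16$, and $q = 0.75$, the initial step is $5 \cdot 16^{-0.75} = 0.625$.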

Figures (12)

  • Figure 1: An illustration of accessible components of the target model for each of the three threat models. A white-box threat model assumes access to the whole model; a score-based threat model assumes access to the output layer; a decision-based threat model assumes access to the predicted label alone.
  • Figure 2: Intuitive explanation of HopSkipJumpAttack. (a) Perform a binary search to find the boundary, and then update $\tilde{x}_t \to x_t$. (b) Estimate the gradient at the boundary point $x_t$. (c) Geometric progression and then update $x_t \to \tilde{x}_{t+1}$. (d) Perform a binary search, and then update $\tilde{x}_{t+1}\to x_{t+1}$.
  • Figure 3: Median distance versus number of model queries on MNIST with CNN, and CIFAR-10 with ResNet and DenseNet from top to bottom rows. 1st column: untargeted $\ell_2$. 2nd col.: targeted $\ell_2$. 3rd col.: untargeted $\ell_\infty$. 4th col.: targeted $\ell_\infty$.
  • Figure 4: Median distance versus number of model queries on CIFAR-100 with ResNet, DenseNet, and ImageNet with ResNet from top to bottom rows. 1st column: untargeted $\ell_2$. 2nd col.: targeted $\ell_2$. 3rd col.: untargeted $\ell_\infty$. 4th col.: targeted $\ell_\infty$.
  • Figure 5: Success rate versus distance threshold for MNIST with CNN, and CIFAR-10 with ResNet, DenseNet from top to bottom rows. 1st column: untargeted $\ell_2$. 2nd column: targeted $\ell_2$. 3rd column: untargeted $\ell_\infty$. 4th column: targeted $\ell_\infty$.
  • ...and 7 more figures
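The binary-search step described in the Figure 2 caption can be sketched for the $\ell_2$ case as a bisection along the segment between the original input and an adversarial point. The function name, the `is_adversarial` oracle, and the tolerance `tol` are assumptions made for illustration; an $\ell_\infty$ variant would need a different interpolation and is not shown.

```python
import numpy as np

def binary_search_to_boundary(is_adversarial, x_star, x_adv, tol=1e-3):
    """Move x_adv toward x_star until it sits just on the adversarial side.

    x_star: original (non-adversarial) input.
    x_adv: a point the oracle labels as adversarial.
    tol: stopping width for the blend weight in [0, 1].
    """
    lo, hi = 0.0, 1.0  # blend weight toward x_adv; hi is always adversarial
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if is_adversarial((1.0 - mid) * x_star + mid * x_adv):
            hi = mid  # still adversarial: move closer to x_star
        else:
            lo = mid  # crossed the boundary: back off toward x_adv
    return (1.0 - hi) * x_star + hi * x_adv
```

Each loop iteration costs one label query, so roughly $\log_2(1/\text{tol})$ queries are spent per boundary search in this sketch.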

Theorems & Definitions (4)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • proof