Accelerating Targeted Hard-Label Adversarial Attacks in Low-Query Black-Box Settings

Arjhun Swaminathan, Mete Akgün

TL;DR

The paper tackles the challenge of crafting targeted adversarial examples in hard-label black-box settings with very limited queries. It introduces Targeted Edge-Informed Attack (TEA), a two-stage method. The first stage uses a Sobel-derived soft edge mask to apply a global, edge-preserving perturbation that moves the target image toward the source image. The second stage refines the result locally with patch-based updates, so the attack relies on local decision-boundary geometry only when necessary. Across ImageNet models, TEA is query-efficient and robust, using roughly 70% fewer queries than prior state-of-the-art methods to reach meaningful distance reductions, and it benefits further from switching to CGBA-H in higher-query regimes. The approach underscores the value of leveraging intrinsic target-image structure (edges) for efficient hard-label attacks and has practical implications for both attackers and defenders in real-world, low-query scenarios.

Abstract

Deep neural networks for image classification remain vulnerable to adversarial examples -- small, imperceptible perturbations that induce misclassifications. In black-box settings, where only the final prediction is accessible, crafting targeted attacks that aim to misclassify into a specific target class is particularly challenging due to narrow decision regions. Current state-of-the-art methods often exploit the geometric properties of the decision boundary separating a source image and a target image rather than incorporating information from the images themselves. In contrast, we propose Targeted Edge-informed Attack (TEA), a novel attack that utilizes edge information from the target image to carefully perturb it, thereby producing an adversarial image that is closer to the source image while still achieving the desired target classification. Our approach consistently outperforms current state-of-the-art methods across different models in low query settings (nearly 70% fewer queries are used), a scenario especially relevant in real-world applications with limited queries and black-box access. Furthermore, by efficiently generating a suitable adversarial example, TEA provides an improved target initialization for established geometry-based attacks.
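To make the edge-informed idea concrete, the following is a minimal NumPy sketch of a soft edge mask and one possible global search loop. It is an illustration under stated assumptions, not the authors' implementation: the box blur, the `(1 - mask)` weighting, the step-halving schedule, and the names `soft_edge_mask`, `global_edge_informed_step`, and `is_target` (a stand-in for the hard-label oracle) are all hypothetical. The summary only states that a Sobel-derived mask is blurred and used for an edge-preserving perturbation of the target toward the source.

```python
import numpy as np

def conv2d_same(img, kernel):
    """Naive 'same'-size 2-D correlation with edge padding (NumPy only)."""
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def soft_edge_mask(gray, blur_size=5):
    """Sobel gradient magnitude, blurred and scaled to [0, 1].

    Assumption: a box blur stands in for whatever blur the paper uses.
    """
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    gx = conv2d_same(gray, kx)      # horizontal gradient
    gy = conv2d_same(gray, kx.T)    # vertical gradient
    magnitude = np.hypot(gx, gy)
    box = np.full((blur_size, blur_size), 1.0 / blur_size ** 2)
    soft = conv2d_same(magnitude, box)
    return soft / (soft.max() + 1e-12)

def global_edge_informed_step(x, source, mask, is_target, alpha=0.2, max_queries=15):
    """Move x toward source, changing edge pixels least.

    Assumption: candidates are accepted only if the hard-label oracle
    `is_target` still returns the target class; otherwise the step is halved.
    """
    queries = 0
    while queries < max_queries and alpha > 1e-3:
        candidate = x + alpha * (1.0 - mask) * (source - x)
        queries += 1                # one hard-label query per candidate
        if is_target(candidate):
            x = candidate
        else:
            alpha *= 0.5
    return x, queries
```

In this sketch, pixels where the mask is close to 1 (strong target edges) are left almost untouched, while flat regions absorb most of the movement toward the source. The query budget of 15 mirrors the figure quoted for the global stage in the Figure 2 caption, but the acceptance rule itself is only a plausible placeholder.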

Paper Structure

This paper contains 9 sections, 6 equations, 8 figures, 8 tables, 3 algorithms.

Figures (8)

  • Figure 1: Overview of Patch-Based Edge-Informed Search. Edge information from the target image, obtained via the Sobel operator, is first blurred to generate a soft edge mask. A square patch is then selected and a Gaussian weighting function is applied. In the bottom right panel, the intensity of the modification is illustrated: dark red regions remain largely unchanged, while light green regions receive a more pronounced update. The lack of changes near the patch borders helps prevent the introduction of artificial edges.
  • Figure 2: Visualization of TEA on a source–target image pair. The target image (initially classified as Bee-eater) is perturbed to resemble the source image (classified as Spoonbill), while preserving its original Bee-eater label. Global Edge-Informed Search efficiently applies edge-aware perturbations using only $15$ queries to achieve a $\approx$20% reduction in distance to the source. Patch-Based Edge-Informed Search introduces localized, edge-aware modifications to small image regions, as seen in the hotspot of changes. Further refinement utilizing CGBA-H is illustrated in the narrow decision space.
  • Figure 3: Average $\ell_2$ distance reduction across different architectures in a low-query regime. Higher values indicate improved performance.
  • Figure 4: Comparison of ASR of 50% distance reduction. Higher values indicate that a higher proportion of images reach a distance reduction of 50% sooner.
  • Figure 5: Comparison of ASR of 75% distance reduction. Higher values indicate that a higher proportion of images reach a distance reduction of 75% sooner.
  • ...and 3 more figures
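
The patch weighting described in the Figure 1 caption can be sketched as follows. This is a hedged illustration, not the paper's algorithm: the function names `gaussian_patch_weights` and `patch_update`, the `sigma_frac` and `step` parameters, and the product of the Gaussian with `(1 - edge_mask)` are assumptions chosen to reproduce the behavior the caption describes (strong updates in the patch interior, almost none at its borders or on target edges).

```python
import numpy as np

def gaussian_patch_weights(size, sigma_frac=0.15):
    """2-D Gaussian bump on a size x size patch.

    Largest at the centre and close to zero at the patch borders, so a
    patch update does not create an artificial edge along its outline.
    """
    coords = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-0.5 * (coords / (sigma_frac * size)) ** 2)
    return np.outer(g, g)

def patch_update(x, source, edge_mask, top, left, size, step=0.5):
    """Blend one square patch of x toward source.

    Assumption: the per-pixel weight is step * Gaussian * (1 - edge_mask),
    so pixels on strong target edges stay largely unchanged.
    """
    out = x.astype(float).copy()
    rows = slice(top, top + size)
    cols = slice(left, left + size)
    weights = step * gaussian_patch_weights(size) * (1.0 - edge_mask[rows, cols])
    out[rows, cols] += weights * (source[rows, cols] - x[rows, cols])
    return out
```

In a hard-label setting, each candidate returned by `patch_update` would presumably be kept only if a model query confirms it is still classified as the target class; that acceptance check is omitted here to keep the weighting itself in focus.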