Dashed Line Defense: Plug-And-Play Defense Against Adaptive Score-Based Query Attacks

Yanzhang Fu, Zizheng Guo, Jizhou Luo

TL;DR

This work addresses the vulnerability of runtime defenses against score-based black-box query attacks (SQAs) to adaptive strategies, and shows that prior plug-and-play defenses can be bypassed. It introduces Dashed Line Defense (DLD), a non-smooth post-processing step that distorts the attacker's observed loss in a controlled, label-preserving way, backed by formal guarantees and validated on ImageNet. Theoretical results show that, under a randomized DLD model, a standard SQA is exponentially unlikely to succeed, and experiments show that DLD outperforms prior defenses (AAA and RND) even under worst-case adaptive tactics across multiple architectures. The findings highlight the need to account for attacker adaptivity in runtime defenses and offer a practical plug-and-play approach with strong robustness and minimal impact on predictions.

Abstract

Score-based query attacks pose a serious threat to deep learning models by crafting adversarial examples (AEs) using only black-box access to model output scores, iteratively optimizing inputs based on observed loss values. While recent runtime defenses attempt to disrupt this process via output perturbation, most either require access to model parameters or fail when attackers adapt their tactics. In this paper, we first reveal that even the state-of-the-art plug-and-play defense can be bypassed by adaptive attacks, exposing a critical limitation of existing runtime defenses. We then propose Dashed Line Defense (DLD), a plug-and-play post-processing method specifically designed to withstand adaptive query strategies. By introducing ambiguity in how the observed loss reflects the true adversarial strength of candidate examples, DLD prevents attackers from reliably analyzing and adapting their queries, effectively disrupting the AE generation process. We provide theoretical guarantees of DLD's defense capability and validate its effectiveness through experiments on ImageNet, demonstrating that DLD consistently outperforms prior defenses--even under worst-case adaptive attacks--while preserving the model's predicted labels.
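
To make the threat model concrete, the following is a minimal sketch of a score-based query attack of the kind the abstract describes: the attacker sees only output scores and keeps a candidate when the observed loss decreases. It is an illustration under stated assumptions, not the paper's attack or defense: the margin loss, the single-coordinate random search, and the names `toy_model`, `margin_loss`, and `score_based_query_attack` are all hypothetical choices made for this example.

```python
import numpy as np

def toy_model(x, w):
    """Hypothetical black-box classifier: softmax over a linear map."""
    logits = w @ x.ravel()
    z = np.exp(logits - logits.max())
    return z / z.sum()

def margin_loss(scores, label):
    """True-class score minus the best other score; negative means misclassified."""
    return scores[label] - np.delete(scores, label).max()

def score_based_query_attack(query_scores, x0, label, eps, n_queries, rng):
    """Greedy random search inside an L-infinity ball of radius eps around x0.

    A candidate is accepted only if the observed loss decreases, so the
    attack relies entirely on the scores returned by query_scores.
    """
    x_best = x0.copy()
    loss_best = margin_loss(query_scores(x_best), label)
    for _ in range(n_queries):
        if loss_best < 0:  # already misclassified: stop querying
            break
        cand = x_best.copy()
        idx = rng.integers(0, x0.size)
        # Move one coordinate to x0 +/- eps, then clip to the valid input range.
        cand.flat[idx] = x0.flat[idx] + eps * rng.choice([-1.0, 1.0])
        cand = np.clip(cand, 0.0, 1.0)
        loss = margin_loss(query_scores(cand), label)
        if loss < loss_best:
            x_best, loss_best = cand, loss
    return x_best, loss_best

rng = np.random.default_rng(0)
w = rng.normal(size=(3, 16))
x0 = rng.uniform(size=16)
label = int(np.argmax(toy_model(x0, w)))
initial_loss = margin_loss(toy_model(x0, w), label)
x_adv, final_loss = score_based_query_attack(
    lambda x: toy_model(x, w), x0, label, eps=0.05, n_queries=200, rng=rng
)
```

In this sketch, `query_scores` is the only interface the attacker has to the model, so a runtime output-perturbation defense would sit at exactly that point, altering the scores the accept/reject test depends on.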

Paper Structure

This paper contains 23 sections, 2 theorems, 25 equations, 5 figures, 3 tables, 3 algorithms.

Key Result

Theorem 4.1

Under assumptions ass:robust, ass:x0, and ass:iteration, the success probability of standard-SQA$(\mathbf{x}_0,G_{(\mathbf{x}_0,\epsilon_\text{n})})$ against a $(\tau,h,p)$-random-DLD defended $f$ with infinite queries is at most

Figures (5)

  • Figure 1: Under-attack accuracy of four defenses, with blue and orange bars indicating normal and adaptive attacks, respectively. Note that RND reduces accuracy on non-adversarial samples.
  • Figure 2: $\mathcal{D}_{\text{post}}$ of AAA
  • Figure 3: $\mathcal{D}_{\text{post}}$ of $(\tau,h,S)$-DLD with $\tau = 8$, $h = 0.3$, and $S = (0.1,0.2) \cup (0.3,0.4) \cup \dots \cup (0.9,1)$. Although the image may appear as dashed lines, it is actually composed of multiple solid lines.
  • Figure 4: Accuracy under Square Attack with different DLD parameters.
  • Figure 5: Attack success rate (ASR) over attack iterations.

Theorems & Definitions (9)

  • Definition 3.1
  • Definition 3.2: Global Robustness
  • Definition 4.1
  • Definition 4.2
  • Theorem 4.1
  • Remark 4.1
  • Theorem 4.2
  • proof
  • proof