RobustBlack: Challenging Black-Box Adversarial Attacks on State-of-the-Art Defenses

Mohamed Djilani, Salah Ghamizi, Maxime Cordy

TL;DR

RobustBlack investigates how black-box adversarial attacks fare against state-of-the-art defenses on ImageNet, addressing a gap left by prior evaluations that focused on weaker defenses. It proposes a framework for testing transfer- and query-based black-box attacks against RobustBench defenses, and finds that simple adversarial training already sharply reduces black-box attack success and that defenses built to withstand AutoAttack also tend to be more resilient in black-box settings. The study further shows that robustness alignment between the surrogate and the target is a key factor in transfer-based attack effectiveness: a robust surrogate can either hinder or boost attack success depending on the target's robustness. These findings inform defense evaluation practices and motivate more realistic, defense-aware black-box attack benchmarks.

Abstract

Although adversarial robustness has been extensively studied in white-box settings, recent advances in black-box attacks (including transfer- and query-based approaches) are primarily benchmarked against weak defenses, leaving a significant gap in the evaluation of their effectiveness against more recent and moderately robust models (e.g., those featured in the RobustBench leaderboard). In this paper, we question why black-box attacks have paid so little attention to robust models. We establish a framework to evaluate the effectiveness of recent black-box attacks against both top-performing and standard defense mechanisms on the ImageNet dataset. Our empirical evaluation reveals the following key findings: (1) the most advanced black-box attacks struggle to succeed even against simple adversarially trained models; (2) robust models that are optimized to withstand strong white-box attacks, such as AutoAttack, also exhibit enhanced resilience against black-box attacks; and (3) robustness alignment between the surrogate models and the target model is a key factor in the success rate of transfer-based attacks.

Paper Structure (26 sections, 5 figures, 13 tables)

Figures (5)

  • Figure 1: The blue bars (with error bars) show the success rates and standard deviations for the vanilla ResNet50 model, while the orange bars (with error bars) show the results for the robust ResNet50 model.
  • Figure 2: Success rate of black-box attacks against SoTA defenses.
  • Figure 3: Relation between success rate of AutoAttack and the success rate of black-box attacks.
  • Figure 4: Success rates of black-box attacks using vanilla and robust surrogates against a vanilla target and robust targets.
  • Figure 5: The blue bars show the success rate for the vanilla ResNet50 model, the orange bars show the results for the robust ResNet50 model, and the green bars show the success rate for the ResNet50 model when the attack budget is quadrupled.