The Adaptive Arms Race: Redefining Robustness in AI Security

Ilias Tsingenopoulos, Vera Rimmer, Davy Preuveneers, Fabio Pierazzi, Lorenzo Cavallaro, Wouter Joosen

TL;DR

The paper addresses the challenge of robustness for AI systems exposed to decision-based, black-box attacks, arguing that existing defenses and non-adaptive evaluations give a misleading sense of security. It introduces Adversarial Markov Games (AMG) to model the turn-taking dynamics between adaptive attacks and defenses and employs self-adaptive reinforcement learning to optimize both sides, including active defenses that misdirect or reject. The authors provide a theoretical framework and extensive empirical evidence showing that self-adaptive attacks can outperform state-of-the-art baselines and that adaptive defenses can significantly impede them, revealing an ongoing arms race in robustness evaluation. The work has practical impact by enabling more realistic robustness assessments for deployed ML systems and by offering open-source tooling to study adaptive interactions across domains beyond image classification.

Abstract

Despite considerable efforts to make them robust, real-world AI-based systems remain vulnerable to decision-based attacks, as definitive proofs of their operational robustness have so far proven intractable. Canonical robustness evaluation relies on adaptive attacks, which leverage complete knowledge of the defense and are tailored to bypass it. This work broadens the notion of adaptivity, which we employ to enhance both attacks and defenses, showing how they can benefit from mutual learning through interaction. We introduce a framework for adaptively optimizing black-box attacks and defenses under the competitive game they form. To assess robustness reliably, it is essential to evaluate against realistic and worst-case attacks. We thus enhance attacks and their evasive arsenal together using reinforcement learning (RL), apply the same principle to defenses, and evaluate them first independently and then jointly under a multi-agent perspective. We find that active defenses, those that dynamically control system responses, are an essential complement to model hardening against decision-based attacks, and that these defenses can be circumvented by adaptive attacks, which in turn calls for defenses to be adaptive too. Our findings, supported by an extensive theoretical and empirical investigation, confirm that adaptive adversaries pose a serious threat to black-box AI-based systems, rekindling the proverbial arms race. Notably, our approach outperforms the state-of-the-art black-box attacks and defenses, while bringing them together to yield effective insights into the robustness of real-world deployed ML-based systems.

Paper Structure

This paper contains 20 sections, 4 theorems, 22 equations, 4 figures, and 7 tables.

Key Result

Proposition 3.1

Let $F_c$ denote the discriminant function of an adversarially trained model $\mathcal{M}$, and let $C(x) = D(F_c(x))$ denote its classifier. Then in HSJA, to satisfy $\mathbb{E}[\sum_{i=1}^N d(x_b^i, x_c^i)] \geq \epsilon$ it is necessary that: (a) $D \neq \operatorname*{arg\,max}$, and (b) context

Figures (4)

  • Figure 1: In adversarial machine learning (AML), adaptive attacks are those with the capabilities (knobs) to bypass a defense; adaptive control is rather the precise tuning of all the known knobs. Against black-box systems, we can reformulate "adaptive" so that it signifies both. For instance, in HSJA (Chen et al., 2020), radius, steps, and jump are parameters of the attack, while rotate and translate are transformations that can evade a similarity-based defense.
  • Figure 2: Schematic model of an AMG environment. Due to the inherent uncertainty of behavior at either side of the interface, it is a partially observable MDP, mirrored for each agent, where one's decisions become the other's observations. (I) denotes an adaptive attacker (cf. Fig. 1), (II) model hardening (passive defense), and (III) an active defense.
  • Figure 3: Misdirection in a hypothetical 2D decision boundary. The adaptive defense controls a single parameter, the hypersphere radius around $k_0$ (the last known adversarial); for queries $x_t$ that fall within this hypersphere the model responds with a non-adversarial decision. $x_g$ is the starting sample, $x_c$ the original, and $x_b$ the best possible adversarial.
  • Figure 4: Progression of ASR over successive adaptations. Red and green values in ASR denote offensive and defensive scenarios respectively.
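
The misdirection mechanism described in the Figure 3 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `MisdirectionDefense`, the `toy_classify` boundary, the assumption that the defender knows the benign label of $x_c$, and the rule for when $k_0$ is updated are all hypothetical choices made for illustration. Only the core idea comes from the caption: a query that would be adversarial but falls within a hypersphere of the given radius around $k_0$ receives a non-adversarial decision.

```python
import numpy as np


class MisdirectionDefense:
    """Toy radius-based misdirection around the last known adversarial k_0."""

    def __init__(self, classify, benign_label, radius):
        self.classify = classify          # hypothetical hard-label model: array -> int
        self.benign_label = benign_label  # assumption: label of the original sample x_c
        self.radius = radius              # the single parameter the defense controls
        self.k0 = None                    # last known adversarial query, if any

    def respond(self, x_t):
        x_t = np.asarray(x_t, dtype=float)
        label = self.classify(x_t)
        if label == self.benign_label:
            return label  # query is not adversarial, answer truthfully
        inside = self.k0 is not None and np.linalg.norm(x_t - self.k0) <= self.radius
        if inside:
            return self.benign_label  # misdirect: report a non-adversarial decision
        # Assumption: k_0 moves only when an adversarial decision is actually revealed.
        self.k0 = x_t
        return label


def toy_classify(x):
    # Hypothetical 2D decision boundary at x[0] = 0.
    return int(x[0] > 0.0)


defense = MisdirectionDefense(toy_classify, benign_label=0, radius=0.5)
queries = ([1.0, 0.0], [0.8, 0.0], [2.0, 0.0], [-1.0, 0.0])
responses = [defense.respond(q) for q in queries]
print(responses)  # [1, 0, 1, 0]
```

In this toy run the second query is truly adversarial but lies within 0.5 of $k_0$, so the attacker is told it is benign. An adaptive defense in the sense of the caption would tune `radius` over time instead of fixing it.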

Theorems & Definitions (4)

  • Proposition 3.1
  • Proposition 3.2
  • Proposition 3.3: Adversarial Policy Gradient
  • Corollary 3.4