The Adaptive Arms Race: Redefining Robustness in AI Security
Ilias Tsingenopoulos, Vera Rimmer, Davy Preuveneers, Fabio Pierazzi, Lorenzo Cavallaro, Wouter Joosen
TL;DR
The paper addresses the challenge of robustness for AI systems exposed to decision-based, black-box attacks, arguing that existing defenses and non-adaptive evaluations give a misleading sense of security. It introduces Adversarial Markov Games (AMG) to model the turn-taking dynamics between adaptive attacks and defenses and employs self-adaptive reinforcement learning to optimize both sides, including active defenses that misdirect or reject. The authors provide a theoretical framework and extensive empirical evidence showing that self-adaptive attacks can outperform state-of-the-art baselines and that adaptive defenses can significantly impede them, revealing an ongoing arms race in robustness evaluation. The work has practical impact by enabling more realistic robustness assessments for deployed ML systems and by offering open-source tooling to study adaptive interactions across domains beyond image classification.
Abstract
Despite considerable efforts to make them robust, real-world AI-based systems remain vulnerable to decision-based attacks, as definitive proofs of their operational robustness have so far proven intractable. Canonical robustness evaluation relies on adaptive attacks, which leverage complete knowledge of the defense and are tailored to bypass it. This work broadens the notion of adaptivity and employs it to enhance both attacks and defenses, showing how they can benefit from mutual learning through interaction. We introduce a framework for adaptively optimizing black-box attacks and defenses under the competitive game they form. To assess robustness reliably, it is essential to evaluate against realistic and worst-case attacks. We thus enhance attacks and their evasive arsenal together using reinforcement learning (RL), apply the same principle to defenses, and evaluate them first independently and then jointly under a multi-agent perspective. We find that active defenses, those that dynamically control system responses, are an essential complement to model hardening against decision-based attacks, and that these defenses can be circumvented by adaptive attacks, which in turn calls for defenses to be adaptive too. Our findings, supported by an extensive theoretical and empirical investigation, confirm that adaptive adversaries pose a serious threat to black-box AI-based systems, rekindling the proverbial arms race. Notably, our approach outperforms state-of-the-art black-box attacks and defenses, while bringing them together to yield effective insights into the robustness of real-world deployed ML-based systems.
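To make the terms concrete, the following is a minimal toy sketch of a decision-based (hard-label) attack and of an active defense that controls responses by rejecting suspicious queries. It is not the paper's Adversarial Markov Game or its RL-based optimization: the linear classifier, the `ActiveDefense` class, the near-duplicate rejection rule, and the binary-search attacker are all hypothetical stand-ins chosen for illustration.

```python
import math

W = (1.0, -1.0)  # hypothetical linear decision rule, not from the paper


def model_label(x):
    """Hard-label classifier: the attacker only ever sees this 0/1 decision."""
    return int(x[0] * W[0] + x[1] * W[1] > 0.0)


def interpolate(a, b, t):
    """Point at fraction t along the segment from a to b."""
    return tuple((1 - t) * ai + t * bi for ai, bi in zip(a, b))


class ActiveDefense:
    """Toy active defense (assumption): reject any query that lies within
    `radius` of a previously seen query, since attack queries tend to cluster."""

    def __init__(self, radius):
        self.radius = radius
        self.history = []

    def respond(self, x):
        suspicious = any(math.dist(x, h) < self.radius for h in self.history)
        self.history.append(x)
        return None if suspicious else model_label(x)  # None means "rejected"


def boundary_search(oracle, x_orig, x_adv, steps=20):
    """Binary search toward the decision boundary using labels only.

    Assumes x_adv is already misclassified relative to x_orig. Stops early
    if the oracle rejects a query. Returns (best adversarial point, rejected).
    """
    y_orig = oracle(x_orig)
    lo, hi = 0.0, 1.0  # hi always marks a point known to be adversarial
    for _ in range(steps):
        mid = (lo + hi) / 2
        label = oracle(interpolate(x_orig, x_adv, mid))
        if label is None:  # rejected: no further information for the attacker
            return interpolate(x_orig, x_adv, hi), True
        if label != y_orig:
            hi = mid
        else:
            lo = mid
    return interpolate(x_orig, x_adv, hi), False
```

Run against `model_label` directly, the search closes in on the decision boundary; run against `ActiveDefense(radius=0.1).respond`, it is cut off once its queries start to cluster, leaving the attacker with a larger perturbation. A non-adaptive attacker like this one does nothing to avoid the rejection rule, which is the gap the abstract's adaptive attacks are meant to close.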
