XSub: Explanation-Driven Adversarial Attack against Blackbox Classifiers via Feature Substitution

Kiana Vu; Phung Lai; Truc Nguyen

TL;DR

XSub is a novel explanation-driven adversarial attack against black-box classifiers based on feature substitution. It strategically replaces important features in the original sample with the corresponding important features of a sample from a different label, thereby increasing the likelihood that the model misclassifies the perturbed sample.
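The substitution step summarized above can be illustrated with a minimal sketch. It assumes features are a flat list of numbers (for example, flattened pixels), that an explainer has already produced one importance score per feature, and that `k`, the number of substituted features, is a hypothetical stand-in for the paper's adjustable degree of substitution. This is not the authors' implementation; in particular, whether the importance map comes from the golden sample or from the original sample is left to the caller, since the summary does not pin this down.

```python
from typing import List, Sequence


def substitute_features(x: Sequence[float],
                        golden: Sequence[float],
                        importance: Sequence[float],
                        k: int) -> List[float]:
    """Hypothetical helper: copy the golden sample's values into x at the
    k positions with the largest absolute importance score."""
    if not (len(x) == len(golden) == len(importance)):
        raise ValueError("x, golden and importance must have equal length")
    # Clamp k to a valid range so k=0 leaves x unchanged and a large k
    # replaces every feature.
    k = max(0, min(k, len(x)))
    # Rank feature indices by absolute attribution, most important first.
    ranked = sorted(range(len(importance)),
                    key=lambda i: abs(importance[i]), reverse=True)
    perturbed = list(x)
    for i in ranked[:k]:
        perturbed[i] = golden[i]
    return perturbed


if __name__ == "__main__":
    x = [0.0, 0.0, 0.0, 0.0]
    golden = [1.0, 2.0, 3.0, 4.0]
    importance = [0.1, -0.9, 0.5, 0.2]  # assumed explainer output
    print(substitute_features(x, golden, importance, k=2))  # [0.0, 2.0, 3.0, 0.0]
```

Raising `k` replaces more of the original sample, which should make misclassification more likely while making the perturbation easier to notice; this mirrors the effectiveness/stealthiness trade-off the abstract describes.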

Abstract

Despite its significant benefits in enhancing the transparency and trustworthiness of artificial intelligence (AI) systems, explainable AI (XAI) has yet to reach its full potential in real-world applications. One key challenge is that XAI can unintentionally give adversaries insights into black-box models, increasing the models' vulnerability to various attacks. In this paper, we develop a novel explanation-driven adversarial attack against black-box classifiers based on feature substitution, called XSub. The key idea of XSub is to strategically replace important features (identified via XAI) in the original sample with the corresponding important features from a "golden sample" of a different label, thereby increasing the likelihood that the model misclassifies the perturbed sample. The degree of feature substitution is adjustable, allowing us to control how much of the original sample's information is replaced. This flexibility lets the attacker balance the trade-off between the attack's effectiveness and its stealthiness. XSub is also highly cost-effective: the number of queries it requires to the prediction model and the explanation model is O(1). In addition, XSub can be easily extended to launch backdoor attacks when the attacker has access to the model's training data. Our evaluation demonstrates that XSub is not only effective and stealthy but also cost-effective, enabling its application across a wide range of AI models.
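To make the O(1) query cost concrete, the following sketch wraps a black-box predictor and explainer in call counters and runs a single substitution. It assumes the golden sample was chosen beforehand (so its selection costs no per-attack queries), uses a toy linear model with gradient-times-input attributions in place of a real classifier and XAI method, and every name (`QueryCounter`, `substitution_attack`, `toy_predict`, `toy_explain`) is hypothetical. The only point illustrated is that the number of queries does not grow with the input dimension or with the number of substituted features `k`.

```python
from typing import Callable, List, Sequence, Tuple


class QueryCounter:
    """Wraps a black-box callable and records how many times it is queried."""

    def __init__(self, fn: Callable):
        self.fn = fn
        self.calls = 0

    def __call__(self, z):
        self.calls += 1
        return self.fn(z)


def substitution_attack(x: Sequence[float], golden: Sequence[float],
                        predict: Callable, explain: Callable,
                        k: int) -> Tuple[List[float], int]:
    # One explainer query: attributions for the pre-selected golden sample.
    importance = explain(golden)
    # Rank features by absolute attribution, most important first.
    ranked = sorted(range(len(importance)),
                    key=lambda i: abs(importance[i]), reverse=True)
    x_adv = list(x)
    for i in ranked[:k]:
        x_adv[i] = golden[i]
    # One prediction query: label assigned to the perturbed sample.
    return x_adv, predict(x_adv)


# Toy black box (assumption): a bias-free linear scorer thresholded at 3.0.
WEIGHTS = [1.0, 0.5, 2.0, 0.1]


def toy_predict(z: Sequence[float]) -> int:
    score = sum(w * v for w, v in zip(WEIGHTS, z))
    return 1 if score > 3.0 else 0


def toy_explain(z: Sequence[float]) -> List[float]:
    # Gradient-times-input attribution; it exactly decomposes a bias-free
    # linear score into per-feature contributions.
    return [w * v for w, v in zip(WEIGHTS, z)]


if __name__ == "__main__":
    predict, explain = QueryCounter(toy_predict), QueryCounter(toy_explain)
    x = [0.2, 0.1, 0.3, 0.0]        # toy model assigns label 0
    golden = [1.0, 1.0, 2.0, 1.0]   # toy model assigns label 1
    x_adv, label = substitution_attack(x, golden, predict, explain, k=1)
    print(x_adv, label, predict.calls, explain.calls)  # [0.2, 0.1, 2.0, 0.0] 1 1 1
```

Swapping only the single most important feature already flips the toy model's label here, and the attack used one explainer query and one prediction query; the same counts would hold for any `k` or input length.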

Paper Structure

This paper contains 17 sections, 4 equations, 7 figures, 1 table, and 1 algorithm.

Introduction
Related work
Preliminaries
Model Explanations
Threat Model
XSub: Explanation-Driven Adversarial Attack with Feature Substitution
Golden Sample Selection
Explanation-Driven Substitution
Explanation-Driven Adversarial Attack
Extension to Explanation-Driven Backdoor Attack
Summary of XSub Novelty and Benefits
Experiments
Baselines and ML Explainer
Datasets and Model Configurations
Evaluation metrics
...and 2 more sections

Figures (7)

Figure 1: Examples of golden samples from the CIFAR-10 dataset (Krizhevsky, 2009) (upper row) and the Imagenette dataset (Howard, 2019) (lower row).
Figure 2: XSub with varying values of $K$ ($\alpha=\beta=100$).
Figure 3: The framework of our proposed attack XSub.
Figure 4: Attack success rate (SR) at different values of $\alpha$ and $\beta$ ($K=1$).
Figure 5: Attack SR at different values of $K$ ($\alpha=1$) on the CIFAR-10 dataset.
...and 2 more figures
