Explainable and Transferable Adversarial Attack for ML-Based Network Intrusion Detectors

Hangsheng Zhang; Dongqi Han; Yinlong Liu; Zhiliang Wang; Jiyan Sun; Shangyuan Zhuang; Jiqiang Liu; Jinsong Dong

TL;DR

This work tackles the vulnerability of ML-based NIDSs to adversarial examples under realistic black-box conditions. It introduces ETA, a generic transfer-based attack framework that leverages an ensemble substitute model and Important-Sensitive Feature Selection (ISFS) to craft adversarial traffic that remains valid under traffic-space constraints, while providing explanations via cooperative game theory and perturbation-based interpretation. The key contributions are: (i) a cross-model transferable attack architecture combining differentiable and non-differentiable substitutes, (ii) a zeroth-order gradient evaluation strategy for non-differentiable substitutes, (iii) ISFS that links non-robust features to adversarial existence and transferability, and (iv) extensive experiments showing ~70% average transfer success across diverse NIDSs, datasets, and constraints, plus real-environment validation and interpretive analyses. The findings underscore the security risks of feature-centric NIDS designs and the importance of focusing on robust, intrinsic traffic features to improve robustness, with broader implications for securing ML-enabled security systems.

Abstract

Despite being widely used in network intrusion detection systems (NIDSs), machine learning (ML) has proven to be highly vulnerable to adversarial attacks. Several studies have explored white-box and black-box adversarial attacks on NIDSs. However, white-box attacks unrealistically assume that attackers have full knowledge of the target NIDSs. Meanwhile, existing black-box attacks cannot achieve a high attack success rate because adversarial transferability between models (e.g., neural networks and tree models) is weak. Additionally, neither line of work explains why adversarial examples exist and why they can transfer across models. To address these challenges, this paper introduces ETA, an Explainable Transfer-based Black-Box Adversarial Attack framework. ETA aims to achieve two primary objectives: 1) create transferable adversarial examples applicable to various ML models and 2) provide insights into the existence of adversarial examples and their transferability within NIDSs. Specifically, we first provide a general transfer-based adversarial attack method applicable across the entire ML space. Following that, we exploit a unique insight based on cooperative game theory and perturbation interpretations to explain adversarial examples and adversarial transferability. On this basis, we propose an Important-Sensitive Feature Selection (ISFS) method to guide the search for adversarial examples, achieving stronger transferability while satisfying traffic-space constraints.
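To make the idea of zeroth-order gradient evaluation for non-differentiable substitutes more concrete, the following is a minimal sketch of a generic query-only gradient estimate. It is not the estimator used by ETA, whose details are not given on this page: the coordinate-wise central difference, the function `zeroth_order_gradient`, the `toy_score` stand-in model, and the step sizes are all assumptions chosen for illustration.

```python
from typing import Callable, List


def zeroth_order_gradient(
    score: Callable[[List[float]], float],
    x: List[float],
    step: float = 1e-3,
) -> List[float]:
    """Estimate the gradient of `score` at `x` using only score queries.

    Uses a central difference along each coordinate, so no access to the
    model's internals (weights, analytic gradients) is required.
    """
    grad = []
    for i in range(len(x)):
        plus = list(x)
        minus = list(x)
        plus[i] += step
        minus[i] -= step
        grad.append((score(plus) - score(minus)) / (2.0 * step))
    return grad


def toy_score(features: List[float]) -> float:
    """Hypothetical 'malicious' score standing in for a substitute model."""
    return 0.5 * features[0] ** 2 - 2.0 * features[1]


x = [1.0, 3.0]
grad = zeroth_order_gradient(toy_score, x)

# Move against the estimated gradient to lower the malicious score.
# A real attack would also project x_adv back into the valid traffic space.
x_adv = [xi - 0.1 * gi for xi, gi in zip(x, grad)]
```

Note that a small finite-difference step is only informative when the score varies smoothly. A tree-based model is piecewise constant, so a tiny step would yield a zero estimate almost everywhere; in that setting a larger step or some form of smoothing would be needed, which this sketch does not attempt.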

Paper Structure (61 sections, 5 equations, 7 figures, 7 tables, 3 algorithms)

Introduction
Background and Related Work
ML-based NIDSs
Statistical feature-based Models
Temporal feature-based Models
Anomaly Detection Models
Adversarial Attacks on ML-based NIDSs
White-box Attacks
Query-based Black-box Attacks
Non-query-based Black-box Attacks
Transfer-based Attacks
Reasons for the Existence of the Adversarial Examples and Adversarial Transferability
Overview
Target Model
Threat Model
...and 46 more sections

Figures (7)

Figure 1: Overview of system design. The left of (a) gives an overview of the ETA framework, which comprises four core steps: 1) optimizing the substitute model (Opt-Sub), 2) Important-Sensitive Feature Selection (ISFS), 3) gradient evaluation based on zeroth-order optimization (Grad-Eval), and 4) crafting AEs under domain constraints (Crafting-AEs). The right of (a) presents the target model, which consists of a traffic capturer, a feature extractor, and an ML-based classifier. Panel (b) illustrates the threat model, in which the attacker does not know the target model’s architecture and cannot freely interact with the target model.
Figure 2: Adversarial attack effectiveness of our attacks compared with baselines on different datasets (higher is better).
Figure 3: Cross-model transferability matrix: cell $(i, j)$ represents the success rate of attacks on classifier $j$ by the adversarial examples (AEs) generated for classifier $i$. Specifically, the rows indicate the substitute models that craft AEs, and the columns indicate the models under transfer-based attacks.
Figure 4: The impact of Important-Sensitive Feature Selection (ISFS) on the results of the ETA system.
Figure 5: By removing, using, and comparing non-robust features, the experiment explains the reasons for the existence of adversarial examples (AEs) and adversarial transferability in ML-based NIDSs.
...and 2 more figures

Theorems & Definitions (5)

Definition 1
Definition 2
Definition 3
Definition 4
Definition 5
