Explainable and Transferable Adversarial Attack for ML-Based Network Intrusion Detectors
Hangsheng Zhang, Dongqi Han, Yinlong Liu, Zhiliang Wang, Jiyan Sun, Shangyuan Zhuang, Jiqiang Liu, Jinsong Dong
TL;DR
This work tackles the vulnerability of ML-based NIDSs to adversarial examples under realistic black-box conditions. It introduces ETA, a generic transfer-based attack framework that uses an ensemble substitute model and Important-Sensitive Feature Selection (ISFS) to craft adversarial traffic that remains valid under traffic-space constraints, and that explains its results through cooperative game theory and perturbation-based interpretation. The key contributions are: (i) a cross-model transferable attack architecture combining differentiable and non-differentiable substitutes, (ii) a zeroth-order gradient evaluation strategy for non-differentiable substitutes, (iii) ISFS, which links non-robust features to the existence and transferability of adversarial examples, and (iv) extensive experiments showing ~70% average transfer success across diverse NIDSs, datasets, and constraints, plus real-environment validation and interpretive analyses. The findings underscore the security risks of feature-centric NIDS designs and the importance of relying on robust, intrinsic traffic features, with broader implications for other ML-enabled security systems.
Abstract
Despite being widely used in network intrusion detection systems (NIDSs), machine learning (ML) has proven to be highly vulnerable to adversarial attacks. White-box and black-box adversarial attacks on NIDSs have been explored in several studies. However, white-box attacks unrealistically assume that attackers have full knowledge of the target NIDS. Meanwhile, existing black-box attacks cannot achieve a high attack success rate because adversarial transferability between models (e.g., neural networks and tree models) is weak. Additionally, neither line of work explains why adversarial examples exist or why they transfer across models. To address these challenges, this paper introduces ETA, an Explainable Transfer-based Black-Box Adversarial Attack framework. ETA has two primary objectives: 1) create transferable adversarial examples applicable to various ML models and 2) provide insights into the existence of adversarial examples and their transferability within NIDSs. Specifically, we first provide a general transfer-based adversarial attack method applicable across the entire ML space. Following that, we exploit a unique insight based on cooperative game theory and perturbation interpretations to explain adversarial examples and adversarial transferability. On this basis, we propose an Important-Sensitive Feature Selection (ISFS) method to guide the search for adversarial examples, achieving stronger transferability while ensuring that traffic-space constraints are satisfied.
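
One ingredient named in the summary above, zeroth-order gradient evaluation for substitutes that expose no analytic gradient, can be illustrated with a generic central finite-difference estimator. This is a minimal sketch under stated assumptions, not the paper's procedure: the function names, the step size `delta`, and the toy scoring function are all hypothetical, and the paper's actual estimator may differ.

```python
import numpy as np

def zeroth_order_gradient(score_fn, x, delta=1e-3):
    """Estimate the gradient of score_fn at x using only function queries.

    Uses central finite differences, one coordinate at a time, so it needs
    2 * len(x) queries. `delta` is an assumed step size, not a value from
    the paper.
    """
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = delta
        grad[i] = (score_fn(x + step) - score_fn(x - step)) / (2.0 * delta)
    return grad

# Hypothetical stand-in for a substitute model's "malicious" score.
# A real substitute (e.g. a tree ensemble) would be queried the same way.
def toy_score(x):
    return 1.0 / (1.0 + np.exp(-(2.0 * x[0] - 1.0 * x[1])))

x0 = np.array([0.5, 0.2, 0.9])
g = zeroth_order_gradient(toy_score, x0)
# g[0] is positive, g[1] is negative, and g[2] is zero because the toy
# score ignores the third feature.
```

Tree-based models are piecewise constant, so a very small step can return a zero estimate everywhere; in practice a larger step or some form of smoothing would likely be needed, which is a design choice this sketch does not address.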
