Toward Realistic Adversarial Attacks in IDS: A Novel Feasibility Metric for Transferability
Sabrine Ennaji, Elhadj Benkhelifa, Luigi Vincenzo Mancini
TL;DR
The paper investigates the realistic feasibility of transferability-based adversarial attacks against ML-based IDS in black-box contexts. It introduces the Transferability Feasibility Score (TFS), which combines feature alignment, architecture similarity, and data distribution homogeneity as the weighted sum $TFS = \alpha f_{align} + \beta A_{sim} + \gamma D_{hom}$, where $\alpha$, $\beta$, and $\gamma$ weight the three factors, to predict attack transferability. Through experiments on the CSE-CIC-IDS2018 dataset, it shows that even with strong feature overlap, architectural divergence can hinder transferability, yielding a moderate overall TFS (~0.52) and substantial degradation of target performance in certain classes (e.g., DDoS becoming unidentifiable). The findings emphasize a gap between theoretical transferability and practical risk, and they offer guidance for defenders to reduce transferability by increasing architectural diversity and data-domain robustness, as well as for researchers to evaluate attacks with the proposed feasibility metric. Overall, the work provides a concrete framework to assess and improve realism in transferable adversarial attacks against IDSs, with implications for designing more robust security solutions in real-world networks.
Abstract
Transferability-based adversarial attacks exploit the ability of adversarial examples, crafted to deceive a specific source Intrusion Detection System (IDS) model, to also mislead a target IDS model without requiring access to the training data or any internal model parameters. These attacks leverage vulnerabilities shared across machine learning models to bypass security measures and compromise systems. Although the transferability concept has been widely studied, its practical feasibility remains limited because it depends on assumptions of high similarity between the source and target models. This paper analyzes the core factors that contribute to transferability, including feature alignment, model architectural similarity, and overlap in the data distributions that each IDS examines. We propose a novel metric, the Transferability Feasibility Score (TFS), to assess the feasibility and reliability of such attacks based on these factors. Through experimental evidence, we demonstrate that TFS and actual attack success rates are highly correlated, addressing the gap between theoretical understanding and real-world impact. Our findings provide needed guidance for designing more realistic transferable adversarial attacks, developing robust defenses, and ultimately improving the security of machine learning-based IDS in critical systems.
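
As a rough illustration of how the three factors could be combined into a single score, the sketch below computes a weighted sum of feature alignment, architectural similarity, and data distribution homogeneity. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name, the `TFSWeights` container, the equal default weights, and the requirements that each factor lies in [0, 1] and that the weights are non-negative and sum to 1 are all hypothetical choices made here for illustration. How each factor is actually measured is not specified in this summary and is left to the caller.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TFSWeights:
    """Hypothetical weights for the three factors (equal by default)."""
    alpha: float = 1 / 3  # weight on feature alignment
    beta: float = 1 / 3   # weight on architectural similarity
    gamma: float = 1 / 3  # weight on data distribution homogeneity


def transferability_feasibility_score(f_align, a_sim, d_hom, weights=TFSWeights()):
    """Return alpha * f_align + beta * a_sim + gamma * d_hom.

    Assumption (not from the source): every factor is normalized to [0, 1]
    and the weights are non-negative and sum to 1, so the score is in [0, 1].
    """
    factors = {"f_align": f_align, "a_sim": a_sim, "d_hom": d_hom}
    for name, value in factors.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")

    w = (weights.alpha, weights.beta, weights.gamma)
    if any(x < 0 for x in w) or not math.isclose(sum(w), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")

    return weights.alpha * f_align + weights.beta * a_sim + weights.gamma * d_hom


if __name__ == "__main__":
    # Illustrative inputs only: strong feature overlap, divergent architectures.
    score = transferability_feasibility_score(f_align=0.9, a_sim=0.2, d_hom=0.5)
    print(f"TFS = {score:.2f}")
```

With inputs like these, a low architectural similarity pulls the score down even when feature alignment is high, which mirrors the qualitative point that strong feature overlap alone does not guarantee transferability. The input values in the usage example are made up and are not results from the paper.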
