Group & Reweight: A Novel Cost-Sensitive Approach to Mitigating Class Imbalance in Network Traffic Classification
Wumei Du, Dong Liang, Yiqin Lv, Xingxing Liang, Guanlin Wu, Qi Wang, Zheng Xie
TL;DR
This paper tackles severe class imbalance in network traffic classification by introducing Group & Reweight (GDR-CIL), a group distributionally robust, cost-sensitive learner. The method clusters classes into groups, assigns group-level weights, and optimizes a reweighted loss under a Stackelberg-game interpretation, linking distributional robustness to cost-sensitive learning. The approach is supported by a theoretical analysis of its global/local Stackelberg equilibrium and convergence, and is shown to improve minority-class performance while preserving overall accuracy on the CIC-IDS2017, NSL-KDD, and UNSW-NB15 datasets. Overall, GDR-CIL offers a scalable, principled strategy for mitigating the decision-boundary drift caused by imbalance in highly multi-class network traffic tasks, with practical impact for safer and more reliable intrusion detection systems.
Abstract
Internet services have led to an explosion of network traffic, and machine learning on this traffic data has become an indispensable tool, especially in risk-sensitive applications. This paper focuses on network traffic classification in the presence of severe class imbalance. Such imbalance tends to shift the learned decision boundary away from the optimal one and results in an unsatisfactory solution. This raises safety concerns in the network traffic field, because previous class imbalance methods can hardly deal with numerous minority malicious classes. To mitigate these effects, we design a group & reweight strategy. Inspired by the group distributionally robust optimization framework, our approach heuristically clusters classes into groups, iteratively updates the non-parametric weights for separate classes, and optimizes the learning model by minimizing reweighted losses. We theoretically interpret the optimization process as a Stackelberg game and perform extensive experiments on typical benchmarks. Results show that our approach not only suppresses the negative effect of class imbalance but also improves overall predictive performance.
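
The group, reweight, and minimize loop described above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the frequency-based grouping heuristic, the exponentiated-gradient weight update (borrowed from the general group distributionally robust optimization literature), the choice to keep weights at the group level, and every function and parameter name are assumptions made for illustration only.

```python
import numpy as np


def group_classes_by_frequency(class_counts, n_groups):
    """Hypothetical grouping heuristic: sort classes from rarest to most
    frequent and split them into n_groups bins of near-equal size."""
    order = np.argsort(class_counts)  # class indices, rarest first
    class_to_group = np.empty(len(class_counts), dtype=int)
    for g, chunk in enumerate(np.array_split(order, n_groups)):
        class_to_group[chunk] = g
    return class_to_group


def group_losses(per_sample_loss, labels, class_to_group, n_groups):
    """Average the per-sample losses inside each group (0 for empty groups)."""
    sample_groups = class_to_group[labels]
    losses = np.zeros(n_groups)
    for g in range(n_groups):
        mask = sample_groups == g
        if mask.any():
            losses[g] = per_sample_loss[mask].mean()
    return losses


def update_group_weights(weights, losses, step_size=0.1):
    """Assumed weight update: exponentiated-gradient ascent on the simplex,
    so groups with higher loss receive more weight in the next step."""
    new_weights = weights * np.exp(step_size * losses)
    return new_weights / new_weights.sum()


def reweighted_loss(weights, losses):
    """Objective the model would minimize for the current weights."""
    return float(np.dot(weights, losses))


# Toy usage: four classes with imbalanced counts, two groups.
class_counts = np.array([1000, 5, 500, 20])
class_to_group = group_classes_by_frequency(class_counts, n_groups=2)
labels = np.array([0, 0, 1, 3, 2])
per_sample_loss = np.array([0.1, 0.3, 2.0, 1.0, 0.2])

losses = group_losses(per_sample_loss, labels, class_to_group, n_groups=2)
weights = update_group_weights(np.full(2, 0.5), losses)
objective = reweighted_loss(weights, losses)
```

In a full training loop, the weight update and a gradient step on the model parameters would alternate. That alternation is the leader/follower structure the Stackelberg-game interpretation refers to, although which player leads and how the equilibrium is analyzed is specified in the paper rather than in this sketch.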
