EG-ConMix: An Intrusion Detection Method based on Graph Contrastive Learning

Lijin Wu, Shanshan Lei, Feilong Liao, Yuanjun Zheng, Yuxin Liu, Wentao Fu, Hao Song, Jiajun Zhou

TL;DR

The paper proposes EG-ConMix, a method based on E-GraphSAGE that incorporates a data augmentation module to address data imbalance and shows significant advantages in training speed and accuracy on large-scale graphs.

Abstract

As the number of IoT devices increases, security concerns become more prominent. Deploying a Network Intrusion Detection System (NIDS) can minimize the impact of threats by monitoring network traffic, detecting intrusions, and issuing security alerts promptly. Most intrusion detection research in recent years has focused on individual traffic flows without considering the relationships among them, which limits the monitoring of complex IoT network attack events. In addition, anomalous traffic accounts for only a small fraction of real network traffic, which leads to severe class imbalance in the dataset and makes learning and prediction extremely difficult. In this paper, we propose EG-ConMix, a method based on E-GraphSAGE that incorporates a data augmentation module to address data imbalance. We also incorporate contrastive learning to discern the difference between normal and malicious traffic samples, facilitating the extraction of key features. Extensive experiments on two publicly available datasets demonstrate the superior intrusion detection performance of EG-ConMix compared to state-of-the-art methods. Notably, it exhibits significant advantages in training speed and accuracy on large-scale graphs.
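
The abstract does not give the contrastive objective itself. As a rough illustration of how a contrastive loss separates normal from malicious samples, the sketch below uses an InfoNCE-style loss with cosine similarity. This is a common choice and an assumption here, not necessarily the loss used by EG-ConMix. The function names, the temperature value, and the toy embeddings are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def info_nce(anchor, positive, negatives, temperature=0.5):
    """InfoNCE-style loss (assumed form): low when the anchor is close to its
    positive and far from every negative."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))

# Toy 2-D embeddings (hypothetical): one normal anchor, one similar normal
# sample, and two dissimilar "malicious" samples.
anchor = [1.0, 0.0]
normal_like = [0.9, 0.1]
malicious_like = [[0.0, 1.0], [-1.0, 0.2]]

# Correct pairing: the positive is the similar sample.
loss_good = info_nce(anchor, normal_like, malicious_like)
# Wrong pairing: a dissimilar sample is treated as the positive.
loss_bad = info_nce(anchor, malicious_like[0], [normal_like, malicious_like[1]])
```

In this toy setup the loss is smaller when the positive really is the similar sample, which is the behaviour that pushes same-class embeddings together and different-class embeddings apart during training.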

Paper Structure (20 sections, 7 equations, 4 figures, 3 tables)

Figures (4)

  • Figure 1: The source and destination IP addresses in the raw data are mapped to a defined range. Each IP address and port number are bound as a 2-tuple to represent a node, and the traffic transmitted between two devices is represented as an edge. In this way, the network traffic is transformed into a graph structure.
  • Figure 2: The architecture of EG-ConMix. The workflow proceeds as follows: 1) to address data imbalance, MP-Mixup generates multiple pairs of virtual nodes and their corresponding connecting edges; 2) contrastive learning compares the similarity of positive and negative samples, minimizing the distance between positive sample pairs and maximizing the distance between negative sample pairs; 3) the results of contrastive learning are combined with the E-GraphSAGE model to classify edges for intrusion detection.
  • Figure 3: Macro-F1 scores of multiple methods in the data partitioning experiments.
  • Figure 4: Analysis of the number of negative samples corresponding to positive samples of Mixup data augmentation.
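
The graph construction described in the Figure 1 caption can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code. The `Flow` record, its field names, and `build_flow_graph` are hypothetical, and the step that maps IP addresses to a defined range is omitted.

```python
from collections import namedtuple

# Hypothetical flow record: endpoints, per-flow features, and a label
# (0 = normal, 1 = malicious).
Flow = namedtuple(
    "Flow", ["src_ip", "src_port", "dst_ip", "dst_port", "features", "label"]
)

def build_flow_graph(flows):
    """Turn flow records into a graph: nodes are (ip, port) 2-tuples,
    and each flow becomes one edge carrying its features and label."""
    node_index = {}  # (ip, port) -> integer node id
    edges = []       # (src_id, dst_id, features, label), one entry per flow

    def node_id(endpoint):
        # Assign the next free id the first time an endpoint is seen.
        if endpoint not in node_index:
            node_index[endpoint] = len(node_index)
        return node_index[endpoint]

    for flow in flows:
        u = node_id((flow.src_ip, flow.src_port))
        v = node_id((flow.dst_ip, flow.dst_port))
        edges.append((u, v, flow.features, flow.label))
    return node_index, edges

# Toy traffic (made-up values): two clients talking to the same server endpoint.
flows = [
    Flow("10.0.0.1", 5000, "10.0.0.2", 80, [0.2, 1.0], 0),
    Flow("10.0.0.3", 5001, "10.0.0.2", 80, [0.9, 0.1], 1),
    Flow("10.0.0.1", 5000, "10.0.0.2", 80, [0.3, 0.8], 0),
]
node_index, edges = build_flow_graph(flows)
```

Because the labels sit on edges rather than nodes, intrusion detection in this representation becomes an edge classification task, which matches step 3 of the Figure 2 caption.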