CGGM: A conditional graph generation model with adaptive sparsity for node anomaly detection in IoT networks

Munan Li, Xianshi Su, Runze Ma, Tongbang Jiang, Zijian Li, Tony Q. S. Quek

TL;DR

The paper tackles imbalanced node anomaly detection in IoT networks by introducing CGGM, a conditional graph generation framework that synthesizes minority-class graph snapshots to balance the data used for downstream detection. CGGM combines an adaptive-sparsity adjacency generator with a self-attention-based multi-dimensional feature encoder in a GAN setup, and enforces a latent-space constraint so that generated samples better match the real data distribution. A data pipeline built on Traffic Dispersion Graphs (TDGs) feeds a GNN-based anomaly detector, improving both binary and multi-class detection. Extensive experiments on UNSW-NB15 and CICIDS-2017 show that CGGM achieves higher distributional similarity to real data and better classification performance than baselines such as CTGAN, TableGAN, GraphRNN, and GraphSGAN, highlighting its practical potential for robust IoT security in imbalanced settings.
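The balancing step described in the TL;DR can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the goal is to raise every minority class to the size of the majority class, and `generate` is a hypothetical placeholder for a class-conditional generator such as CGGM.

```python
from collections import Counter


def synthesis_plan(labels):
    """Return how many synthetic snapshots to generate per class so that
    every class matches the size of the largest (majority) class."""
    counts = Counter(labels)
    target = max(counts.values())
    # Classes already at the target size need no synthetic samples.
    return {cls: target - n for cls, n in counts.items() if n < target}


def balance(real, labels, generate):
    """Aggregate real snapshots with synthetic ones.

    `generate(cls, k)` is a hypothetical stand-in for a conditional
    generator that returns k synthetic snapshots of class `cls`.
    """
    data, ys = list(real), list(labels)
    for cls, k in synthesis_plan(labels).items():
        data.extend(generate(cls, k))
        ys.extend([cls] * k)
    return data, ys


if __name__ == "__main__":
    labels = ["normal"] * 6 + ["dos"] * 2 + ["scan"] * 1
    print(synthesis_plan(labels))
```

Matching the majority-class count is only one possible target; the summary does not state the exact class ratio the authors aim for.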

Abstract

Dynamic graphs are extensively employed for detecting anomalous behavior in nodes within the Internet of Things (IoT). Graph generative models are often used to address the issue of imbalanced node categories in dynamic graphs. Nevertheless, these models face several limitations: the monotonicity of adjacency relationships, the difficulty of constructing multi-dimensional node features, and the lack of a method for generating multiple categories of nodes end to end. In this paper, we propose a novel graph generation model, called CGGM, specifically for generating samples belonging to the minority class. The framework consists of two core modules: a conditional graph generation module and a graph-based anomaly detection module. The generative module adapts to the sparsity of the adjacency matrix by downsampling a noise adjacency matrix, and incorporates a multi-dimensional feature encoder based on multi-head self-attention to capture latent dependencies among features. Additionally, a latent space constraint is combined with the distribution distance to approximate the latent distribution of real data. The graph-based anomaly detection module uses the generated balanced dataset to predict node behaviors. Extensive experiments show that CGGM outperforms state-of-the-art methods in terms of accuracy and divergence. The results also demonstrate that CGGM can generate diverse data categories, which enhances performance on multi-category classification tasks.
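One plausible reading of adapting to sparsity "by downsampling a noise adjacency matrix" is sketched below: estimate the edge density of a real snapshot, draw a dense matrix of random scores, and keep only the top-scoring node pairs so the synthetic adjacency matrix has the same density. The top-k thresholding rule, the undirected/no-self-loop assumption, and the helper names `edge_density` and `downsample_noise_adjacency` are hypothetical; the abstract does not specify CGGM's actual downsampling procedure.

```python
import random


def edge_density(adj):
    """Fraction of possible undirected edges (upper triangle, no self-loops) present."""
    n = len(adj)
    possible = n * (n - 1) // 2
    present = sum(1 for i in range(n) for j in range(i + 1, n) if adj[i][j])
    return present / possible if possible else 0.0


def downsample_noise_adjacency(n, density, rng=None):
    """Draw a dense noise adjacency matrix and keep only its highest-scoring
    entries so the result has (approximately) the requested edge density."""
    rng = rng or random.Random()
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    scores = {p: rng.random() for p in pairs}          # dense noise scores
    k = round(density * len(pairs))                     # edge budget from the target density
    kept = sorted(pairs, key=lambda p: scores[p], reverse=True)[:k]
    adj = [[0] * n for _ in range(n)]
    for i, j in kept:                                   # symmetric: undirected edges
        adj[i][j] = adj[j][i] = 1
    return adj


if __name__ == "__main__":
    real = [[0, 1, 0, 0],
            [1, 0, 1, 0],
            [0, 1, 0, 0],
            [0, 0, 0, 0]]
    rho = edge_density(real)                 # 2 of 6 possible edges
    fake = downsample_noise_adjacency(4, rho, random.Random(0))
    print(rho, edge_density(fake))
```

Thresholding by rank rather than by a fixed cutoff keeps the edge count tied to the observed density, which is the property "adaptive sparsity" suggests; a learned generator would replace the uniform scores.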

Paper Structure (25 sections, 12 equations, 8 figures, 7 tables, 3 algorithms)

Figures (8)

  • Figure 1: The overall framework of the process. The framework consists of three components: Graph Generation, the Conditional Graph Generative Model, and the Anomaly Detection Model. First, a sequence of graph snapshots is extracted from the traffic samples using the Traffic Dispersion Graph (TDG) construction method. The conditional graph generation model then generates synthetic data that approximates the real data. Finally, the synthetic data is aggregated with the real data and used as input to the anomaly detection model, which predicts anomalies by capturing the spatial structure features of the nodes.
  • Figure 2: A detailed illustration of CGGM. The model consists of a generator network $\mathcal{T}$ and a discriminator $\mathcal{D}$. $\boldsymbol{G}_o$ is the input random noise graph data, $\boldsymbol{G}_r$ is the real graph data, and $\boldsymbol{G}_g$ is the synthetic graph data generated by the model.
  • Figure 3: t-SNE visualisation of synthetic data features produced by different generation methods.
  • Figure 4: Label category balance result.
  • Figure 5: Multi-classification performance with balanced and imbalanced datasets.
  • ...and 3 more figures
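The first stage in Figure 1, extracting a sequence of graph snapshots from traffic samples via TDG construction, can be sketched as below. This is an illustration under stated assumptions, not the paper's procedure: each flow is assumed to be a `(timestamp, src_ip, dst_ip, label)` tuple, snapshots use fixed-length time windows, and the node-labeling rule (a host takes the label of any malicious flow it sends) is hypothetical.

```python
from collections import defaultdict


def build_tdg_snapshots(flows, window):
    """Group flow records into fixed-length time windows and build one
    Traffic Dispersion Graph snapshot per window.

    Each flow is assumed to be a tuple (timestamp, src_ip, dst_ip, label).
    Nodes are IP addresses; an undirected edge links two hosts that
    exchanged traffic within the window.
    """
    buckets = defaultdict(list)
    for ts, src, dst, label in flows:
        buckets[int(ts // window)].append((src, dst, label))

    snapshots = []
    for w in sorted(buckets):
        edges, node_label = set(), {}
        for src, dst, label in buckets[w]:
            edges.add(frozenset((src, dst)))
            node_label.setdefault(dst, "normal")
            # Hypothetical rule: a sender takes the label of a malicious flow
            # and keeps it even if later flows from the same host are normal.
            if label != "normal" or src not in node_label:
                node_label[src] = label
        snapshots.append({"window": w, "edges": edges, "labels": node_label})
    return snapshots
```

The resulting per-window snapshots are the kind of input a conditional generator and a GNN-based detector would consume; window length and labeling policy are design choices the summary leaves open.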

Theorems & Definitions (6)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Definition 6