PhoGAD: Graph-based Anomaly Behavior Detection with Persistent Homology Optimization

Ziqi Yuan, Haoyi Zhou, Tianyu Chen, Jianxin Li

TL;DR

PhoGAD is a graph-based anomaly detection framework that tackles blurred behavioral boundaries and local heterophily by combining persistent homology optimization with explicit edge embedding. It refines edge attributes via Vietoris-Rips topological analysis, weights adjacent edges to counter local heterophily, and uses a disentangled representation to reduce the noise introduced by node-based and edge-based detection. The authors report that it outperforms state-of-the-art frameworks on intrusion, Tor (anonymous traffic), and spam datasets and remains robust when the proportion of anomalies is very low. Because the method operates on edge attributes and graph topology rather than on domain-specific features, it may generalize to other network anomaly detection settings.

Abstract

A multitude of toxic online behaviors, ranging from network attacks to anonymous traffic and spam, have severely disrupted the smooth operation of networks. Due to the inherent sender-receiver nature of network behaviors, graph-based frameworks are commonly used for detecting anomalous behaviors. However, in real-world scenarios, the boundary between normal and anomalous behaviors tends to be ambiguous. The local heterophily of graphs interferes with the detection, and existing methods based on nodes or edges introduce unwanted noise into representation results, thereby impacting the effectiveness of detection. To address these issues, we propose PhoGAD, a graph-based anomaly detection framework. PhoGAD leverages persistent homology optimization to clarify behavioral boundaries. Building upon this, the weights of adjacent edges are designed to mitigate the effects of local heterophily. Subsequently, to tackle the noise problem, we conduct a formal analysis and propose a disentangled representation-based explicit embedding method, ultimately achieving anomaly behavior detection. Experiments on intrusion, traffic, and spam datasets verify that PhoGAD has surpassed the performance of state-of-the-art (SOTA) frameworks in detection efficacy. Notably, PhoGAD demonstrates robust detection even with diminished anomaly proportions, highlighting its applicability to real-world scenarios. The analysis of persistent homology demonstrates its effectiveness in capturing the topological structure formed by normal edge features. Additionally, ablation experiments validate the effectiveness of the innovative mechanisms integrated within PhoGAD.

Paper Structure (28 sections, 12 equations, 4 figures, 5 tables)

Figures (4)

  • Figure 1: The noise problem presented by current detection methods. (a) demonstrates the noise problem arising from node-based edge representation, where blue and green represent nodes and their attributes directly and indirectly linked to the edge. (b) showcases the noise problem arising from edge convolution, where green and red signify normal and anomalous edges.
  • Figure 2: An illustrative figure of a graph structure. Solid lines represent nodes and edges of interest, while dashed lines represent irrelevant portions that have no bearing on the illustration. Disregarding the direction of edges will not hinder the analysis of the noise problems.
  • Figure 3: Overview of PhoGAD: (a) The graph constructed from behavioral data; (b) Optimization of edge attributes via persistent homology, where edges marked in blue with double lines correspond to enduring topological structures, and $r_\bullet$ represents the radius employed in persistent homology; (c) Explicit edge embedding with neighbor weights and disentangled representation, where nodes involved in weight calculation are shaded in blue, $weight_{e e_i}$ represents the weight between edge $e$ and $e_i$, and $h_e^{(k)}$ represents the embedding of edge $e$ after the $k$-th iteration; (d) Direct mapping of attributes to the output space after two layers of edge embedding; (e) Detection results, where edges highlighted in red are anomalous edges, indicating anomalous behaviors such as network intrusion or spam emails.
  • Figure 4: Results of persistent homology. (a), (c), and (e) are persistence diagrams, where the horizontal axis denotes the radius $\varepsilon$ at which the corresponding topological structure appears, while the vertical axis denotes $\varepsilon$ at which it vanishes. (b), (d), and (f) are barcode diagrams, with the left and right endpoints of each bar indicating $\varepsilon$ at which the structure appears and vanishes.
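
To make the appear/vanish reading of the persistence and barcode diagrams in Figure 4 concrete, the following is a minimal sketch that computes the 0-dimensional persistence pairs (connected components) of a Vietoris-Rips filtration over a small point cloud. It is not the authors' implementation: the function name `h0_persistence`, the sample `features`, and the convention that an edge enters the filtration at $\varepsilon$ equal to the pairwise Euclidean distance are all assumptions made for illustration, and higher-dimensional structures (loops, voids) are not covered.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return (birth, death) pairs for connected components (H0) of a
    Vietoris-Rips filtration. Assumption: an edge between two points
    enters the filtration at epsilon = their Euclidean distance."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise distances in ascending order (Kruskal order).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    pairs = []
    for eps, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            # Two components merge at eps, so one of them vanishes here.
            pairs.append((0.0, eps))
    # The final component never vanishes.
    pairs.append((0.0, math.inf))
    return pairs


# Hypothetical 2-D feature vectors: two tight clusters that are far apart.
features = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 0.0), (10.0, 1.0)]
diagram = h0_persistence(features)
for birth, death in diagram:
    print(f"appears at {birth}, vanishes at {death}")
```

Each pair is one point in a persistence diagram (horizontal coordinate = appearance, vertical coordinate = disappearance) or one bar in a barcode. In this toy input, the components inside each cluster vanish at a small radius, while the component separating the two clusters survives until a much larger radius; long-lived features of this kind are what a persistence analysis treats as enduring structure.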