Three Revisits to Node-Level Graph Anomaly Detection: Outliers, Message Passing and Hyperbolic Neural Networks

Jing Gu, Dongmian Zou

TL;DR

This study revisits datasets and approaches for unsupervised node-level graph anomaly detection from three aspects: it introduces outlier injection methods that create more diverse, graph-based anomalies in graph datasets, compares methods that employ message passing against those that do not, and explores the use of hyperbolic neural networks.

Abstract

Graph anomaly detection plays a vital role in identifying abnormal instances in complex networks. Despite recent advances in deep-learning-based methodology, existing benchmarking approaches exhibit limitations that hinder a comprehensive comparison. In this paper, we revisit datasets and approaches for unsupervised node-level graph anomaly detection from three aspects. First, we introduce outlier injection methods that create more diverse and graph-based anomalies in graph datasets. Second, we compare methods employing message passing against those without, uncovering an unexpected decline in performance associated with message passing. Third, we explore the use of hyperbolic neural networks, specifying crucial architecture and loss design choices that contribute to enhanced performance. Through rigorous experiments and evaluations, our study sheds light on general strategies for improving node-level graph anomaly detection methods.

Paper Structure (34 sections, 2 theorems, 39 equations, 3 figures, 39 tables)

Key Result

Lemma 4.1

Suppose a graph ${\mathcal{G}}$ contains $n_{\mathcal{V}}$ nodes, among which $n_\textup{normal}$ nodes are normal ($n_\textup{normal} > n_{\mathcal{V}} / 2$), each with the same unit-norm feature ${\mathbf{x}}_\textup{normal}$; and the remaining $n_{\mathcal{V}} - n_\text{normal}$ nodes are outlier

Figures (3)

  • Figure 1: Comparison of model's mean ROC-AUC (%) in detecting contextual outliers injected in Cora, Squirrel, and Amazon datasets with and without $l_2$ normalization.
  • Figure 2: Probability density function (PDF) of pairwise distances between embeddings of nodes that are originally connected ($\{ d({\bf h}_i,{\bf h}_j) \}_{(i,j) \in \mathcal{E}}$) and disconnected ($\{ d({\bf h}_i,{\bf h}_j) \}_{(i,j) \notin \mathcal{E}}$) under the Euclidean, Lorentz, and Poincaré models on the Cora dataset injected with cntxt.+strct. and "path"+DICE outliers.
  • Figure 3: Probability density function (PDF) of pairwise distances between embeddings of nodes that are originally connected ($\{ d({\bf h}_i,{\bf h}_j) \}_{(i,j) \in \mathcal{E}}$) and disconnected ($\{ d({\bf h}_i,{\bf h}_j) \}_{(i,j) \notin \mathcal{E}}$) under the Euclidean, Lorentz, and Poincaré models on the Squirrel dataset injected with cntxt.+strct. and "path"+DICE outliers.
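
The pairwise distance $d({\bf h}_i,{\bf h}_j)$ in the captions above depends on the geometry of the embedding space. As a rough illustration only, the sketch below computes it under the three models using the standard textbook formulas. It assumes curvature $-1$ for both hyperbolic models, which may differ from the paper's setting, and every function name here is hypothetical rather than taken from the authors' code.

```python
import math


def euclidean_distance(x, y):
    """Ordinary Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def poincare_distance(x, y):
    """Geodesic distance in the Poincare ball (assumed curvature -1).

    Both points must lie strictly inside the unit ball.
    """
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    sq_x = sum(a * a for a in x)
    sq_y = sum(b * b for b in y)
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_x) * (1.0 - sq_y))
    # Clamp to the domain of acosh to guard against rounding below 1.
    return math.acosh(max(arg, 1.0))


def poincare_to_lorentz(x):
    """Map a Poincare-ball point to the hyperboloid (Lorentz) model."""
    sq = sum(a * a for a in x)
    denom = 1.0 - sq
    return [(1.0 + sq) / denom] + [2.0 * a / denom for a in x]


def lorentz_distance(u, v):
    """Geodesic distance on the hyperboloid (assumed curvature -1).

    Uses the Lorentzian inner product <u, v>_L = -u0*v0 + sum_i ui*vi.
    """
    inner = -u[0] * v[0] + sum(a * b for a, b in zip(u[1:], v[1:]))
    return math.acosh(max(-inner, 1.0))


if __name__ == "__main__":
    h_i, h_j = [0.1, 0.2], [0.5, -0.3]
    print("Euclidean:", euclidean_distance(h_i, h_j))
    print("Poincare: ", poincare_distance(h_i, h_j))
    print("Lorentz:  ", lorentz_distance(poincare_to_lorentz(h_i),
                                         poincare_to_lorentz(h_j)))
```

Because the Poincaré ball and the hyperboloid are isometric, the last two printed values agree up to rounding. Any difference between Lorentz and Poincaré curves in the figures would therefore come from how each model is trained, not from the distance definitions themselves.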

Theorems & Definitions (4)

  • Lemma 4.1
  • Lemma 4.2
  • proof
  • proof