Unsupervised Graph Outlier Detection: Problem Revisit, New Insight, and Superior Method

Yihong Huang, Liping Wang, Fan Zhang, Xuemin Lin

TL;DR

This work revisits unsupervised node outlier detection on attributed networks and uncovers a serious data leakage issue in standard outlier-injection benchmarks. It introduces VGOD, a two-branch framework that separately optimizes a Variance-Based Model for structural outliers and an Attribute Reconstruction Model for contextual outliers, with separate training and mean-std score normalization to achieve balanced detection. Empirical results across five real-world datasets show VGOD achieves state-of-the-art AUC and a favorable AucGap, demonstrating robustness to injection settings and outperforming baselines like Dominant, AnomalyDAE, DONE, CoLA, and CONAD. The paper also provides practical guidance for injection design to avoid leakage and highlights neighbor variance as a promising direction for graph anomaly detection in UNOD.
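
The mean-std normalization step mentioned above can be illustrated with a minimal sketch. It assumes each branch has already produced one raw score per node; the function names, the toy numbers, and the use of the population standard deviation are illustrative assumptions, not details taken from the paper.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Mean-std normalization: subtract the mean, divide by the std."""
    mu = mean(scores)
    sigma = pstdev(scores)  # population std; an assumption of this sketch
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def combine_scores(structural, contextual):
    """Sum the two standardized score lists, one value per node."""
    if len(structural) != len(contextual):
        raise ValueError("both branches must score the same nodes")
    z_struct = standardize(structural)
    z_ctx = standardize(contextual)
    return [a + b for a, b in zip(z_struct, z_ctx)]


# Toy scores on very different scales (hypothetical values).
structural = [0.1, 0.2, 0.1, 5.0]          # node 3 is structurally odd
contextual = [100.0, 900.0, 120.0, 110.0]  # node 1 is contextually odd

final = combine_scores(structural, contextual)
raw_sum = [a + b for a, b in zip(structural, contextual)]

top2_final = sorted(range(4), key=lambda i: -final[i])[:2]
top2_raw = sorted(range(4), key=lambda i: -raw_sum[i])[:2]
```

In this toy case the raw sum is dominated by the larger-scale contextual scores and ranks node 3 below an ordinary node, whereas the standardized sum places both odd nodes at the top. That is the kind of balance the normalization is meant to provide.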

Abstract

A large number of studies on Graph Outlier Detection (GOD) have emerged in recent years due to its wide applications, among which Unsupervised Node Outlier Detection (UNOD) on attributed networks is an important area. UNOD focuses on detecting two typical kinds of outliers in graphs: the structural outlier and the contextual outlier. Most existing works conduct experiments on datasets with injected outliers. However, we find that the most widely used outlier injection approach has a serious data leakage issue. By exploiting this leakage alone, a simple approach can achieve state-of-the-art performance in detecting outliers. In addition, we observe that existing algorithms suffer a performance drop once the data leakage is mitigated. Another major issue is balanced detection performance between the two types of outliers, which existing studies have not considered. In this paper, we analyze the cause of the data leakage issue in depth, since the injection approach is a building block for advancing UNOD. Moreover, we devise a novel variance-based model to detect structural outliers, which significantly outperforms existing algorithms and is more robust across different injection settings. On top of this, we propose a new framework, Variance based Graph Outlier Detection (VGOD), which combines our variance-based model with an attribute reconstruction model to detect outliers in a balanced way. Finally, we conduct extensive experiments to demonstrate the effectiveness and efficiency of VGOD. The results on 5 real-world datasets validate that VGOD achieves not only the best performance in detecting outliers but also balanced detection performance between structural and contextual outliers.
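
To make the leakage claim concrete, the following sketch shows how node degree alone can separate injected structural outliers. It assumes the common clique-based injection (random groups of nodes made fully connected); the clique size of 15, the random background graph, and all function names are assumptions of this sketch rather than details given in this section.

```python
import random
from itertools import combinations


def inject_cliques(n_nodes, edges, n_cliques, clique_size, rng):
    """Pick disjoint random node groups and fully connect each group."""
    chosen = rng.sample(range(n_nodes), n_cliques * clique_size)
    new_edges = set(edges)
    for c in range(n_cliques):
        group = chosen[c * clique_size:(c + 1) * clique_size]
        for u, v in combinations(sorted(group), 2):
            new_edges.add((u, v))  # stored as (smaller id, larger id)
    return new_edges, set(chosen)


def auc(scores, labels):
    """Chance that a random outlier outscores a random normal node."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


rng = random.Random(0)
n = 500

# Sparse random background graph with average degree 4 (hypothetical).
edges = set()
while len(edges) < 1000:
    u, v = rng.sample(range(n), 2)
    edges.add((min(u, v), max(u, v)))

edges, outliers = inject_cliques(n, edges, n_cliques=2, clique_size=15, rng=rng)

# The "detector" is nothing more than the node degree.
degree = [0] * n
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

labels = [i in outliers for i in range(n)]
degree_auc = auc(degree, labels)
```

Every injected node gains at least 14 neighbors from its clique, so on a sparse graph the degree alone ranks the injected nodes near the top without any learning.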

Paper Structure (48 sections, 1 theorem, 30 equations, 9 figures, 16 tables, 1 algorithm)

Key Result

Theorem 1

$P_r(\lVert \bm{x_{ci}} - \bm{x_i} \rVert_2 > \lVert \bm{x_{cj}} - \bm{x_i} \rVert_2 \Rightarrow \lVert \bm{x_{ci}} \rVert_2 > \lVert \bm{x_{cj}} \rVert_2 )>0.5$
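
A quick Monte Carlo check can illustrate the statement. The sketch below reads the theorem as a conditional probability (given that candidate $\bm{x_{ci}}$ is farther from $\bm{x_i}$ than $\bm{x_{cj}}$, how often does it also have the larger L2-norm?) and assumes i.i.d. standard Gaussian attribute vectors. The distribution, dimension, and trial count are assumptions of this sketch, not the setting of the paper's proof.

```python
import math
import random


def norm(v):
    """Euclidean (L2) norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def estimate_probability(trials=5000, dim=16, seed=0):
    """Estimate P(||x_ci|| > ||x_cj||  given  ||x_ci - x_i|| > ||x_cj - x_i||)."""
    rng = random.Random(seed)

    def draw():
        return [rng.gauss(0.0, 1.0) for _ in range(dim)]

    hits = 0
    total = 0
    for _ in range(trials):
        x_i, x_ci, x_cj = draw(), draw(), draw()
        # Keep only trials in which the premise of the theorem holds.
        if dist(x_ci, x_i) > dist(x_cj, x_i):
            total += 1
            hits += norm(x_ci) > norm(x_cj)
    return hits / total


p_hat = estimate_probability()
```

Under these assumptions the estimate lands above 0.5, which matches the intuition behind the leakage: replacing a node's attributes with those of the farthest candidate tends to select vectors with a large norm, so the L2-norm alone becomes informative.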

Figures (9)

  • Figure 1: An example of structural and contextual outliers in UNOD.
  • Figure 2: After injecting outliers in four datasets, node degree is employed to detect structural outliers while L2-norm is employed to detect contextual outliers. Both of them, compared to the random detector, achieve unexpectedly high scores.
  • Figure 3: AUC of the L2-norm detector on injected contextual outliers for varying $k$ (size of the candidate set) and different distance measures.
  • Figure 4: The overview of our proposed unsupervised node-level graph outlier detection framework VGOD. For a given attributed network $\mathcal{G}$, the variance-based model and attribute reconstruction model are employed to calculate the structural and contextual outlier score, respectively. The final score is the sum of two standardized scores. In the variance-based model (VBM), we use a negative edge sampling technique to generate a corresponding negative edge set $\mathcal{E}^{(-)}$ per epoch which has the same number of edges as $\mathcal{E}$. VBM is trained by the contrastive learning of $\mathcal{E}$ and $\mathcal{E}^{(-)}$.
  • Figure 5: (a) The MeanConv Layer. (b) The MinusConv Layer. We calculate the variance of the neighbor nodes’ low-dimensional latent representations using (a) and (b).
  • ...and 4 more figures
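
The neighbor-variance idea behind Figure 5 can be sketched in two steps: average the neighbors' vectors, then average the squared differences from that mean. The sketch below applies it directly to raw toy attributes, whereas the caption describes learned low-dimensional latent representations. The function name, the toy graph, and the choice to sum the variance over dimensions are assumptions of this sketch.

```python
def neighbor_variance(features, adjacency):
    """Per node: variance of its neighbors' vectors, summed over dimensions."""
    scores = []
    for nbrs in adjacency:
        if not nbrs:
            scores.append(0.0)  # isolated node: no neighbors to compare
            continue
        dim = len(features[nbrs[0]])
        # Step (a): mean of the neighbors' vectors.
        mean_vec = [
            sum(features[j][d] for j in nbrs) / len(nbrs) for d in range(dim)
        ]
        # Step (b): subtract the mean from each neighbor and average
        # the squared differences.
        var = sum(
            (features[j][d] - mean_vec[d]) ** 2
            for j in nbrs
            for d in range(dim)
        ) / len(nbrs)
        scores.append(var)
    return scores


# Two tight communities (nodes 0-2 and 3-5) plus node 6, which links
# to members of both communities (hypothetical toy data).
features = [
    [0.0, 0.1], [0.1, 0.0], [0.0, 0.0],
    [5.0, 5.0], [5.1, 4.9], [4.9, 5.1],
    [5.0, 5.0],
]
adjacency = [
    [1, 2, 6], [0, 2, 6], [0, 1],
    [4, 5, 6], [3, 5, 6], [3, 4],
    [0, 1, 3, 4],
]

scores = neighbor_variance(features, adjacency)
most_suspicious = scores.index(max(scores))
```

In this toy graph node 6 receives the highest score because its neighbors come from two different communities. Nodes adjacent to node 6 whose attributes differ from its own also receive elevated scores here, which is a limitation of using raw attributes on such a small graph.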

Theorems & Definitions (6)

  • Definition 1: Attributed Network
  • Definition 2: Outlier Detection on Attributed Networks
  • Theorem 1
  • Proof
  • Definition 3: Negative Edge Set
  • Definition 4: Negative Network