
Cluster Aware Graph Anomaly Detection

Lecheng Zheng, John R. Birge, Haiyue Wu, Yifang Zhang, Jingrui He

TL;DR

This work tackles graph anomaly detection in multi-view, unlabeled graphs by introducing CARE, which augments the graph with soft cluster memberships to encode global affinities across views and employs a similarity-guided graph contrastive regularization to mitigate biases from pseudo-labels. The method combines a cluster-aware node affinity loss $\mathcal{L}_A$ with a contrastive regularizer $\mathcal{L}_C$ in the objective $J = -\sum_i \mathcal{L}_A(u_i) + \lambda \mathcal{L}_C$, scores each node as $\text{score}_i = -\mathcal{L}_A(u_i)$, and connects the regularizer theoretically to graph spectral clustering. The authors provide a thorough theoretical justification and demonstrate state-of-the-art performance on six datasets (three multi-view and three single-view), including Amazon and YelpChi, along with efficiency analyses and extensive ablations. By leveraging both local and global affinities and addressing pseudo-label biases, the approach advances anomaly detection in heterogeneous graph data and is practical for real-world, large-scale, multi-view graphs.
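The summary gives the scoring rule $\text{score}_i = -\mathcal{L}_A(u_i)$ but not the exact form of $\mathcal{L}_A$ or of the augmented adjacency. The following is a minimal sketch under loudly stated assumptions, not CARE's implementation: it assumes the augmented adjacency is $\tilde{A} = A + \alpha P P^T$ (local edges plus soft cluster co-membership, with the diagonal zeroed), and it stands in for $\mathcal{L}_A(u_i)$ with the $\tilde{A}$-weighted mean cosine similarity between node $i$'s embedding and those of the other nodes. The names `augmented_adjacency`, `node_affinity`, and `anomaly_scores`, and the role given to `alpha`, are all hypothetical.

```python
import numpy as np

def augmented_adjacency(A, P, alpha):
    # Hypothetical augmentation: local edges plus alpha-weighted soft co-membership P @ P.T.
    A_tilde = A + alpha * (P @ P.T)
    np.fill_diagonal(A_tilde, 0.0)  # drop self-pairs so a node is not compared with itself
    return A_tilde

def node_affinity(H, A_tilde):
    # Assumed stand-in for L_A(u_i): A_tilde-weighted mean cosine similarity to other nodes.
    norms = np.maximum(np.linalg.norm(H, axis=1, keepdims=True), 1e-12)
    Hn = H / norms
    S = Hn @ Hn.T                                   # pairwise cosine similarities
    weights = np.maximum(A_tilde.sum(axis=1), 1e-12)
    return (A_tilde * S).sum(axis=1) / weights

def anomaly_scores(H, A, P, alpha=0.5):
    # score_i = -L_A(u_i): low affinity to local and cluster neighbours gives a high score.
    return -node_affinity(H, augmented_adjacency(A, P, alpha))

# Toy graph: nodes 0-2 share features and a cluster; node 3 does not.
H = np.array([[1.0, 0.0], [0.9, 0.1], [1.0, 0.05], [0.0, 1.0]])
A = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (0, 2), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
P = np.array([[0.9, 0.1], [0.9, 0.1], [0.9, 0.1], [0.1, 0.9]])  # illustrative soft memberships
scores = anomaly_scores(H, A, P)
print(scores.round(3), "most anomalous node:", int(np.argmax(scores)))
```

In this toy graph, node 3 is linked to node 2 but has dissimilar features and a different soft cluster, so it receives the highest score. Zeroing the diagonal is a design choice here: without it, every node would gain affinity from its own self-similarity through the $P P^T$ term.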

Abstract

Graph anomaly detection has gained significant attention across various domains, particularly in critical applications such as fraud detection on e-commerce platforms and insider threat detection in cybersecurity. These data usually comprise multiple types (e.g., user information and transaction records for financial data) and thus exhibit view heterogeneity. In the era of big data, however, view heterogeneity and the lack of label information pose substantial challenges to traditional approaches. Existing unsupervised graph anomaly detection methods often struggle with high dimensionality, rely on strong assumptions about graph structure, or fail to handle complex multi-view graphs. To address these challenges, we propose a cluster-aware multi-view graph anomaly detection method called CARE. Our approach captures both local and global node affinities by augmenting the graph's adjacency matrix with the pseudo-label (i.e., soft membership assignments) without any strong assumption about the graph. To mitigate potential biases from the pseudo-label, we introduce a similarity-guided loss. Theoretically, we show that the proposed similarity-guided loss is a variant of the contrastive learning loss, and we explain, through its connection to graph spectral clustering, how this loss alleviates the bias introduced by the pseudo-label. Experimental results on several datasets demonstrate the effectiveness and efficiency of the proposed framework. Specifically, CARE outperforms the second-best competitors by more than 39% in AUPRC on the Amazon dataset and by 18.7% in AUROC on the YelpChi dataset. The code of our method is available at https://github.com/zhenglecheng/CARE-demo.

Paper Structure (28 sections, 3 theorems, 12 equations, 4 figures, 5 tables)

Key Result

Lemma 3.1

(Similarity-guided Graph Contrastive Loss) Let $\bm{\bar{M}}$ be the output of the one-layer graph neural network defined in the paper's graph pooling equation. Then, we have …, where $\mathcal{L}_f = -\sum_{i=1}^n\sum_{j=1}^n \log\frac{\exp(2\bm{\tilde{A}}_{ij}\bm{\bar{h}}_i\bm{\bar{h}}_j^T)}{\prod_{k=1}^n \exp((\bm{\bar{h}}_i\bm{\bar{h}}_k^T)^2)^{1/n}}$ is a graph contrastive loss and $C$ is a constant.
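Expanding the logarithm in $\mathcal{L}_f$ as printed makes its structure easier to see. Write $s_{ij} = \bm{\bar{h}}_i\bm{\bar{h}}_j^T$ and stack the row vectors $\bm{\bar{h}}_i$ into a matrix $\bar{H}$ (our own notation). The numerator contributes $2\tilde{A}_{ij}s_{ij}$, and the log of the denominator is $\frac{1}{n}\sum_k s_{ik}^2$, which does not depend on $j$. Summing over $j$ therefore gives $\mathcal{L}_f = \sum_{i,k} s_{ik}^2 - 2\sum_{i,j}\tilde{A}_{ij}s_{ij} = \|\bar{H}\bar{H}^T - \tilde{A}\|_F^2 - \|\tilde{A}\|_F^2$. This is our algebra on the printed formula, not a statement taken from the paper. One reading is that minimizing $\mathcal{L}_f$ fits a low-rank factorization of $\tilde{A}$, which is in the spirit of the spectral-clustering connection mentioned in the abstract. The sketch below checks the identity numerically. The random $\bar{H}$ and the random symmetric $\tilde{A}$ are illustrative assumptions, and the function names are hypothetical.

```python
import numpy as np

def contrastive_loss_literal(H_bar, A_tilde):
    # L_f exactly as printed: -sum_ij log( exp(2 A_ij s_ij) / prod_k exp(s_ik^2)^(1/n) )
    n = H_bar.shape[0]
    S = H_bar @ H_bar.T                                              # s_ij = h_i h_j^T
    log_num = 2.0 * A_tilde * S
    log_den = np.log(np.prod(np.exp(S ** 2) ** (1.0 / n), axis=1))  # one value per row i
    return -np.sum(log_num - log_den[:, None])

def contrastive_loss_closed_form(H_bar, A_tilde):
    # Equivalent form derived above: ||H H^T - A||_F^2 - ||A||_F^2
    S = H_bar @ H_bar.T
    return np.sum((S - A_tilde) ** 2) - np.sum(A_tilde ** 2)

rng = np.random.default_rng(0)
H_bar = rng.normal(scale=0.3, size=(6, 3))   # illustrative node representations
A_tilde = rng.uniform(size=(6, 6))
A_tilde = (A_tilde + A_tilde.T) / 2          # illustrative symmetric affinity matrix
lit = contrastive_loss_literal(H_bar, A_tilde)
closed = contrastive_loss_closed_form(H_bar, A_tilde)
print(lit, closed, np.isclose(lit, closed))
```

The literal version exponentiates and then takes logs, so it is only suitable for small inputs like these. The closed form avoids that and makes clear that $\mathcal{L}_f$ is minimized when $\bar{H}\bar{H}^T$ approximates $\tilde{A}$.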

Figures (4)

  • Figure 1: Overview of CARE. It first extracts the global node affinity from the soft assignments produced by a graph clustering method, and then combines the global node affinity with the local node affinity. A similarity-guided graph contrastive loss is then introduced to mitigate the potential bias.
  • Figure 2: $\alpha$ and $\log(\lambda)$ vs. AUROC on four datasets.
  • Figure 3: Number of clusters vs. AUROC on four datasets.
  • Figure 4: Efficiency analysis on the YelpChi dataset.
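Figure 1 describes deriving the global node affinity from soft cluster assignments, and Figure 3 varies the number of clusters, but this summary does not specify the clustering method. As a stand-in, the sketch below uses plain soft k-means on neighbour-averaged features. This is a swapped-in technique, not necessarily the method CARE uses. `smooth_features`, `soft_assignments`, `hops`, and `beta` are hypothetical names, and `k` plays the role of the number of clusters.

```python
import numpy as np

def smooth_features(X, A, hops=2):
    # Hypothetical preprocessing: average features over neighbours (with self-loops), row-normalised.
    A_hat = A + np.eye(A.shape[0])
    A_norm = A_hat / A_hat.sum(axis=1, keepdims=True)
    Z = X.copy()
    for _ in range(hops):
        Z = A_norm @ Z
    return Z

def soft_assignments(Z, k, beta=5.0, iters=50):
    # Soft k-means: P[i, c] = softmax_c(-beta * ||z_i - mu_c||^2).
    idx = [0]  # deterministic farthest-point initialisation of the k centroids
    for _ in range(1, k):
        d2_min = ((Z[:, None, :] - Z[idx][None, :, :]) ** 2).sum(axis=-1).min(axis=1)
        idx.append(int(np.argmax(d2_min)))
    mu = Z[idx].copy()
    for _ in range(iters):
        d2 = ((Z[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)   # (n, k) squared distances
        logits = -beta * d2
        logits -= logits.max(axis=1, keepdims=True)                 # stabilise the softmax
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        mu = (P.T @ Z) / np.maximum(P.sum(axis=0), 1e-12)[:, None]  # soft centroid update
    return P

# Two triangles with well-separated features.
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A[i, j] = A[j, i] = 1.0
P = soft_assignments(smooth_features(X, A), k=2)
print(P.round(3))
```

Each row of `P` sums to one, so it could serve as a soft membership matrix for augmenting the adjacency. A larger `beta` pushes the assignments toward hard labels.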

Theorems & Definitions (4)

  • Lemma 3.1
  • Lemma 3.2
  • Definition 3.3
  • Theorem 3.4