Generation is better than Modification: Combating High Class Homophily Variance in Graph Anomaly Detection

Rui Zhang, Dawei Cheng, Xin Liu, Jie Yang, Yi Ouyang, Xian Wu, Yefeng Zheng

TL;DR

HedGe is a novel GNN model that generates new relationships with low class homophily variance for graph anomaly detection, and improves robustness under the novel Heterophily Attack with increased class homophily variance on other graph classification tasks.

Abstract

Graph-based anomaly detection is an important research topic in the field of graph neural networks (GNNs). We find that in graph anomaly detection, the differences in homophily distribution between classes are significantly greater than those in homophilic and heterophilic graphs. We introduce, for the first time, a new metric called Class Homophily Variance, which quantitatively describes this phenomenon. To mitigate its impact, we propose a novel GNN model named Homophily Edge Generation Graph Neural Network (HedGe). Previous works typically focus on pruning, selecting, or connecting original relationships; we refer to these methods as modifications. In contrast, our method emphasizes generating new relationships with low class homophily variance, using the original relationships as an auxiliary. HedGe samples homophily adjacency matrices from scratch using a self-attention mechanism, and leverages nodes that are relevant in the feature space but not directly connected in the original graph. Additionally, we modify the loss function to penalize the generation of unnecessary heterophilic edges. Extensive comparison experiments demonstrate that HedGe achieves the best performance across multiple benchmark datasets, including anomaly detection and edgeless node classification. The proposed model also improves robustness under the novel Heterophily Attack with increased class homophily variance on other graph classification tasks.
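
To make the idea of generating relationships from scratch more concrete, the following is a minimal sketch of attention-based edge generation, not the authors' implementation. It assumes scaled dot-product attention over node features and uses a simple top-k selection per node as a stand-in for the sampling step, which the abstract does not specify. The function name `generate_edges_by_attention` and the projection matrices `w_q` and `w_k` are hypothetical.

```python
import numpy as np

def generate_edges_by_attention(x, w_q, w_k, k):
    """Hypothetical sketch: build a new adjacency matrix from node features only.

    x   : (n, d) node feature matrix
    w_q : (d, p) query projection (assumed; learned in a real model)
    w_k : (d, p) key projection (assumed; learned in a real model)
    k   : number of generated neighbors kept per node (top-k is an assumption)
    """
    q = x @ w_q
    key = x @ w_k
    # Scaled dot-product attention scores between every pair of nodes.
    scores = q @ key.T / np.sqrt(q.shape[1])
    # Exclude self-loops so a node cannot select itself.
    np.fill_diagonal(scores, -np.inf)
    # Numerically stable row-wise softmax.
    scores = scores - scores.max(axis=1, keepdims=True)
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    # Keep the k highest-attention candidates per node as generated edges.
    top = np.argsort(-attn, axis=1)[:, :k]
    adj = np.zeros(attn.shape, dtype=bool)
    rows = np.arange(x.shape[0])[:, None]
    adj[rows, top] = True
    return attn, adj

# Example with random features; the original graph's edges are never consulted.
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))
w_q = rng.normal(size=(4, 3))
w_k = rng.normal(size=(4, 3))
attn, adj = generate_edges_by_attention(x, w_q, w_k, k=2)
```

Because the candidate set is every other node rather than the original neighborhood, nodes that are close in feature space can be linked even when the original graph does not connect them.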

Paper Structure

This paper contains 45 sections, 3 theorems, 14 equations, 7 figures, 7 tables, 1 algorithm.

Key Result

Theorem 1

For a graph $\mathcal{G} \sim \text{CSBM-C}(\boldsymbol{\mu}_0, \boldsymbol{\mu}_1, d, h_0, h_1)$, for any node $i$ in $\mathcal{G}$, the smaller the value of $|h_0+h_1-1|$, the greater the probability that $\mathbf{h}_i$ will be misclassified by $\mathbf{h}$'s optimal linear classifier.
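
One way to build intuition for the role of $|h_0+h_1-1|$ is a back-of-the-envelope calculation under assumptions the statement above does not spell out: suppose $\mathbf{h}_i$ is the mean of node $i$'s neighbor features, each neighbor of a class-$c$ node shares its class with probability $h_c$, and class-$c$ features have mean $\boldsymbol{\mu}_c$. Then the expected aggregated feature is $h_0\boldsymbol{\mu}_0 + (1-h_0)\boldsymbol{\mu}_1$ for class 0 and $(1-h_1)\boldsymbol{\mu}_0 + h_1\boldsymbol{\mu}_1$ for class 1, and their difference is $(h_0+h_1-1)(\boldsymbol{\mu}_0-\boldsymbol{\mu}_1)$. The sketch below checks this numerically; the function names are illustrative and not from the paper.

```python
import numpy as np

def aggregated_class_means(mu0, mu1, h0, h1):
    """Expected mean-aggregated feature per class (assumes mean aggregation)."""
    mu0 = np.asarray(mu0, dtype=float)
    mu1 = np.asarray(mu1, dtype=float)
    # A class-0 node sees class-0 neighbors with probability h0.
    m0 = h0 * mu0 + (1.0 - h0) * mu1
    # A class-1 node sees class-1 neighbors with probability h1.
    m1 = (1.0 - h1) * mu0 + h1 * mu1
    return m0, m1

def mean_gap(mu0, mu1, h0, h1):
    """Distance between the two expected aggregated class means."""
    m0, m1 = aggregated_class_means(mu0, mu1, h0, h1)
    return float(np.linalg.norm(m0 - m1))

mu0, mu1 = [1.0, 0.0], [-1.0, 0.0]
# Both classes homophilic: the class means stay well separated.
gap_homophilic = mean_gap(mu0, mu1, h0=0.9, h1=0.9)
# Anomaly-detection-like case: h0 high, h1 low, so h0 + h1 is close to 1.
gap_high_variance = mean_gap(mu0, mu1, h0=0.9, h1=0.1)
```

Under these assumptions, the gap between the class means shrinks to zero as $h_0+h_1$ approaches 1, which is consistent with the theorem's claim that a linear classifier on $\mathbf{h}$ then misclassifies more often.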

Figures (7)

  • Figure 1: Weighted homophily density distribution on different datasets. We use curves to fit the distribution for clarity.
  • Figure 2: The overall architecture of the proposed HedGe. (a) We first apply position encoding to enhance node information. (b) Then we calculate node relationships and sample new relationships through self-attention. (c) Next, we aggregate multiple relationships. (d) Finally, we penalize the attention matrix to suppress the generation of heterophilic edges.
  • Figure 3: Accuracy for different models under Heterophily Attack to increase Class Homophily Variance.
  • Figure 4: Weighted homophily density distribution of the original and generated graphs. The pair of numbers enclosed by parentheses presents the Class Homophily Variance of the original and generated graphs respectively.
  • Figure 5: t-SNE visualization of learned embeddings.
  • ...and 2 more figures

Theorems & Definitions (4)

  • Definition 1: Class Homophily Variance
  • Theorem 1
  • Theorem 1
  • Theorem 2
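
The definition of Class Homophily Variance itself is not reproduced in this overview. As a hedged illustration only, the sketch below takes one plausible reading: node homophily is the fraction of a node's neighbors that share its label, class homophily is the mean node homophily within a class, and the metric is the variance of class homophily across classes. The paper's exact formula may differ, and the function name `class_homophily_variance` is an assumption.

```python
import numpy as np

def class_homophily_variance(edges, labels):
    """Assumed reading: variance across classes of mean per-node homophily.

    edges  : (m, 2) integer array of undirected edges
    labels : (n,) integer class labels
    """
    edges = np.asarray(edges)
    labels = np.asarray(labels)
    n = labels.shape[0]
    # Count each undirected edge in both directions.
    src = np.concatenate([edges[:, 0], edges[:, 1]])
    dst = np.concatenate([edges[:, 1], edges[:, 0]])
    degree = np.bincount(src, minlength=n)
    same = np.bincount(src, weights=(labels[src] == labels[dst]).astype(float), minlength=n)
    # Isolated nodes have undefined homophily and are ignored via NaN.
    node_h = np.full(n, np.nan)
    has_nbrs = degree > 0
    node_h[has_nbrs] = same[has_nbrs] / degree[has_nbrs]
    classes = np.unique(labels)
    class_h = np.array([np.nanmean(node_h[labels == c]) for c in classes])
    return class_h, float(np.var(class_h))

# Toy anomaly-detection graph: a 5-cycle with four normal nodes (0) and one anomaly (1).
labels = [0, 0, 0, 0, 1]
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)]
class_h, chv = class_homophily_variance(edges, labels)
```

In this toy graph the single anomaly connects only to normal nodes, so its class homophily is 0 while the normal class stays at 0.75, giving a large gap between the two classes.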