Bounded and Uniform Energy-based Out-of-distribution Detection for Graphs

Shenzhi Yang, Bin Liang, An Liu, Lin Gui, Xingkai Yao, Xiaofang Zhang

TL;DR

This paper tackles node-level OOD detection in graphs, where GNNSAFE’s energy-based score aggregation can become unreliable due to extreme, unbounded negative energies and logit shifts. The authors introduce NODESAFE, adding two regularizers, $\mathcal{L}_{bound}$ and $\mathcal{L}_{uniform}$, to bound the logit $\ell_2$ norm and reduce logit-sum variance, respectively, forming $\mathcal{L}_{UB}$ that augments the existing energy-based objective $\mathcal{L}_{OOD}$ into $\mathcal{L}_{ALL} = \mathcal{L}_{OOD} + \lambda_2 \mathcal{L}_{UB}$. The method yields strong improvements in node-level OOD detection across multiple datasets and OOD generation scenarios, with substantial reductions in FPR95 and competitive AUROC/AUPR metrics, while maintaining efficient training and faster convergence. The work advances reliable GNN-based OOD detection for security-sensitive graph applications by addressing both the upper and lower bounds of the logit-derived energies. Overall, NODESAFE provides a principled, scalable approach to stabilize energy-based OOD detection on graphs and broadens its practical impact in real-world graph domains.
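As a rough illustration of how the two regularizers could enter the objective, the sketch below assumes a hinge penalty on each node's logit $\ell_2$ norm for $\mathcal{L}_{bound}$ and the variance of per-node logit sums for $\mathcal{L}_{uniform}$. The exact forms are not given in this summary, so the function names, the threshold `tau`, and the choice to sum the two terms into $\mathcal{L}_{UB}$ are assumptions made for illustration, not the authors' definitions.

```python
import math

def bound_penalty(logits, tau):
    """Hypothetical L_bound: mean hinge penalty on each node's logit l2 norm above tau."""
    norms = [math.sqrt(sum(z * z for z in row)) for row in logits]
    return sum(max(0.0, n - tau) for n in norms) / len(norms)

def uniform_penalty(logits):
    """Hypothetical L_uniform: population variance of the per-node logit sums."""
    sums = [sum(row) for row in logits]
    mean = sum(sums) / len(sums)
    return sum((s - mean) ** 2 for s in sums) / len(sums)

def total_loss(l_ood, logits, lam2, tau):
    """L_ALL = L_OOD + lambda_2 * L_UB, with L_UB assumed to be L_bound + L_uniform."""
    l_ub = bound_penalty(logits, tau) + uniform_penalty(logits)
    return l_ood + lam2 * l_ub

# Two nodes, two classes; values chosen only to make the arithmetic easy to follow.
logits = [[3.0, 4.0], [0.0, 1.0]]
print(total_loss(l_ood=0.5, logits=logits, lam2=0.1, tau=2.0))
```

In this toy input the first node has a large norm and a large logit sum, so it contributes to both penalties, while the second node contributes only to the variance term.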

Abstract

Given the critical role of graphs in real-world applications and their high security requirements, improving the ability of graph neural networks (GNNs) to detect out-of-distribution (OOD) data is an urgent research problem. The recent work GNNSAFE proposes a framework based on the aggregation of negative energy scores that significantly improves the ability of GNNs to detect node-level OOD data. However, our study finds that score aggregation among nodes is susceptible to extreme values due to the unboundedness of the negative energy scores and logit shifts, which severely limits the accuracy of GNNs in detecting node-level OOD data. In this paper, we propose NODESAFE, which reduces the generation of extreme node scores by adding two optimization terms that bound the negative energy scores and mitigate the logit shift. Experimental results show that our approach dramatically improves the ability of GNNs to detect OOD data at the node level: for example, in detecting OOD data induced by Structure Manipulation, FPR95 (lower is better) in scenarios without (with) OOD data exposure is reduced relative to the current SOTA by 28.4% (22.7%).
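To make the sensitivity of score aggregation to extreme values concrete, here is a minimal sketch. It assumes a simple propagation rule in which each node mixes its own score with the mean score of its neighbors; the function `propagate`, the mixing weight `alpha`, and the toy path graph are hypothetical and are not taken from GNNSAFE or NODESAFE.

```python
def propagate(scores, neighbors, alpha=0.5, steps=1):
    """Mix each node's score with the mean of its neighbors' scores (assumed rule)."""
    for _ in range(steps):
        updated = []
        for i, s in enumerate(scores):
            nbrs = neighbors[i]
            # Isolated nodes keep their own score.
            mean_nbr = sum(scores[j] for j in nbrs) / len(nbrs) if nbrs else s
            updated.append(alpha * s + (1 - alpha) * mean_nbr)
        scores = updated
    return scores

# Path graph 0 - 1 - 2; in the second run node 2 carries an extreme score.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
typical = propagate([1.0, 1.0, 1.0], neighbors)
extreme = propagate([1.0, 1.0, 100.0], neighbors)
print(typical)
print(extreme)
```

After a single step the score of node 1 is pulled far away from its original value purely because of its neighbor, which is the kind of contamination that bounding the scores is meant to limit.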

Paper Structure

This paper contains 41 sections, 3 theorems, 29 equations, 6 figures, 10 tables.

Key Result

Proposition 3.1

Logit Shift: Let $\sigma$ denote the softmax function used in the softmax cross-entropy loss. For any constant $s$, if $c = \arg\max_i z_i$, then the identity $\sigma_c(\mathbf{z} + s) = \sigma_c(\mathbf{z})$ always holds.
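The proposition can be checked numerically. The sketch below also evaluates a negative energy score, assumed here to be the log-sum-exp of the logits with temperature 1 (a common definition that this summary does not spell out), to show that it moves by exactly $s$ under the same shift even though the softmax output does not change. The helper names and the example logits are illustrative.

```python
import math

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def negative_energy(z):
    """Log-sum-exp of the logits (assumed energy score with temperature 1)."""
    m = max(z)
    return m + math.log(sum(math.exp(v - m) for v in z))

z = [2.0, 0.5, -1.0]
s = 10.0
shifted = [v + s for v in z]

# Largest change in any softmax probability after adding s to every logit.
prob_gap = max(abs(a - b) for a, b in zip(softmax(z), softmax(shifted)))
# Change in the negative energy score after the same shift.
energy_gap = negative_energy(shifted) - negative_energy(z)
print(prob_gap, energy_gap)
```

Because the cross-entropy loss only sees the softmax output, it places no constraint on $s$, so two nodes with identical predicted probabilities can still receive very different energy scores.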

Figures (6)

  • Figure 1: Visualization of negative energy scores of different nodes. (a) OOD data generated by Structure Manipulation (Wu et al., 2023); dark colors indicate higher scores, and arrows between nodes indicate the direction of score aggregation. (b) ID dataset for Cora; dark colors indicate lower scores. (c) Score frequency density plots show our motivation: lowering the variance of scores can better categorize ID and OOD data.
  • Figure 2: Visualizing 2D logits of nodes using GAT for node classification on the Cora dataset. The 2D logits are derived by splitting a single classification linear layer, $\mathbb{R}^D \rightarrow \mathbb{R}^C$, into two consecutive linear layers, $\mathbb{R}^D \rightarrow \mathbb{R}^2 \rightarrow \mathbb{R}^C$. Here, $D$ represents the node representation dimension, $C$ is the number of classification categories, and we visualize the logit in $\mathbb{R}^2$ space.
  • Figure 3: The Negative Energy Score distributions on Twitch where nodes in different sub-graphs are OOD samples.
  • Figure 4: Visualization of negative energy scores for different nodes. We compare the distribution of scores of GNNSAFE and NODESAFE on ID and OOD nodes after score aggregation on the Cora dataset.
  • Figure 5: Hyperparameter analysis on the Twitch dataset.
  • ...and 1 more figures

Theorems & Definitions (6)

  • Proposition 3.1
  • Proposition 3.2
  • Proposition 3.3
  • proof
  • proof
  • proof