Resolving Node Identifiability in Graph Neural Processes via Laplacian Spectral Encodings

Zimo Yan, Zheng Xie, Chang Liu, Yuan Wang

TL;DR

This work identifies a fundamental expressiveness gap in graph neural processes: standard WL-bounded encoders cannot distinguish symmetric nodes, leading to high Bayes risk. By introducing a sign-/basis-invariant Laplacian spectral positional encoding and an anchor-based diffusion trilateration scheme, the authors prove a sample-complexity separation, achieving constant-shot identifiability on random $r$-regular graphs. They validate the theory with a drug-drug interaction prediction task, showing substantial improvements in AUROC and F1 when using Laplacian PEs, and demonstrate faster convergence in transductive settings. The results bridge expressive power limitations with principled positional information, offering scalable, robust enhancements for probabilistic graph models in real-world applications. The work also outlines future directions for scalable spectral methods and extensions to broader graph types and dynamic networks.

Abstract

Message passing graph neural networks are widely used for learning on graphs, yet their expressive power is limited by the one-dimensional Weisfeiler-Lehman test and can fail to distinguish structurally different nodes. We provide rigorous theory for a Laplacian positional encoding that is invariant to eigenvector sign flips and to basis rotations within eigenspaces. We prove that this encoding yields node identifiability from a constant number of observations and establishes a sample-complexity separation from architectures constrained by the Weisfeiler-Lehman test. The analysis combines a monotone link between shortest-path and diffusion distance, spectral trilateration with a constant set of anchors, and quantitative spectral injectivity with logarithmic embedding size. As an instantiation, pairing this encoding with a neural-process style decoder yields significant gains on a drug-drug interaction task on chemical graphs, improving both the area under the ROC curve and the F1 score and demonstrating the practical benefits of resolving theoretical expressiveness limitations with principled positional information.
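The invariance and trilateration ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the dense combinatorial Laplacian, and the fixed diffusion time `t` are all assumptions made for illustration. The sketch computes diffusion distances from every node to a small anchor set through the heat kernel $e^{-tL}$, a quantity that does not change when eigenvectors flip sign or are rotated within an eigenspace.

```python
import numpy as np


def laplacian(adj):
    """Combinatorial Laplacian L = D - A of an undirected graph (assumed form)."""
    adj = np.asarray(adj, dtype=float)
    return np.diag(adj.sum(axis=1)) - adj


def anchor_diffusion_features(adj, anchors, t=1.0):
    """Diffusion distance from every node to each anchor (hypothetical helper).

    The heat kernel H = U diag(exp(-t * lam)) U^T is a function of L itself,
    so it is unaffected by eigenvector sign flips or basis changes inside
    an eigenspace.
    """
    lam, U = np.linalg.eigh(laplacian(adj))
    H = (U * np.exp(-t * lam)) @ U.T              # heat kernel, shape (n, n)
    # Diffusion distance between u and a is the Euclidean distance
    # between rows u and a of the heat kernel.
    diff = H[:, None, :] - H[anchors][None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))      # shape (n, len(anchors))


# Usage: a 6-cycle, where every node looks the same to a 1-WL-bounded encoder.
ring = np.zeros((6, 6))
for i in range(6):
    ring[i, (i + 1) % 6] = ring[(i + 1) % 6, i] = 1.0

one_anchor = anchor_diffusion_features(ring, [0])
two_anchors = anchor_diffusion_features(ring, [0, 2])
```

On this toy cycle, a single anchor at node 0 leaves nodes 1 and 5 with the same feature, while adding a second anchor at node 2 gives all six nodes distinct feature rows. That is the intuition behind using a small set of anchors for trilateration; the paper's guarantees concern random $r$-regular graphs, not this example.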

Paper Structure

This paper contains 35 sections, 8 theorems, 89 equations, 5 figures, 3 tables.

Key Result

Theorem 1

Let $G\sim\mathcal{G}_{n,r}$ with fixed $r\ge 3$ and $v_0\sim\mathop{\mathrm{Unif}}\nolimits(V)$.

Figures (5)

  • Figure 1: Venn-style overview of our theory versus DE. The overlap shows shared assumptions/goals (log-depth window on random $r$-regular graphs; distances as structural signals; going beyond 1-WL), while the left/right lobes list theory unique to our anchor-based Laplacian spectral approach and to DE, respectively.
  • Figure 2: The Laplacian Positional Encoding (PE) pipeline. For a molecular graph with $N$ atoms, its adjacency matrix is used to compute a Laplacian matrix ($L$ or $L_u$). The first $k$ eigenvectors of this Laplacian are extracted to form an $N\times k$ PE matrix. Finally, each atom's original $d_{\text{node}}$-dimensional feature vector is concatenated with its corresponding $k$-dimensional PE vector, resulting in an augmented $N\times(d_{\text{node}}+k)$ feature matrix for the encoder.
  • Figure 3: Models with Laplacian-based positional encodings outperform the baseline on validation/test AUROC and F1, with lower training loss over 50 epochs.
  • Figure 4: Transductive setting: all models reach similar final performance, but those with positional encodings converge substantially faster than the baseline.
  • Figure 5: Sample-complexity separation on random $r$-regular graphs. (a) Top-1 identification accuracy as a function of the number of context points $k$ for WLGNP (blue) and Lap-GNP (red) at $n\in\{2048,4096\}$. (b) The same curves after rescaling the $x$-axis by $\log_2 n$, showing collapse across $n$. (c) "Bucket" diagnostics for the WL-bounded setup: the empirical $\mathbb{E}[1/|{\rm bucket}|]$ closely matches the WLGNP accuracy, and the singleton probability $\Pr(|{\rm bucket}|=1)$ rises with $k$, supporting the risk relation in Eq. \ref{eq:riskbucket}. Shaded bands indicate variability across runs.
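
The pipeline described in the Figure 2 caption can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' code: the function names and the `normalized` flag are hypothetical, the caption does not say which of $L$ and $L_u$ is the normalized Laplacian, and "first $k$ eigenvectors" is read here as the eigenvectors of the $k$ smallest eigenvalues, including the trivial one.

```python
import numpy as np


def laplacian_pe(adj, k, normalized=True):
    """Return an N x k matrix of Laplacian eigenvectors (hypothetical helper).

    Assumes an undirected graph given as a dense N x N adjacency matrix
    and k <= N.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    if k > n:
        raise ValueError("this sketch assumes k <= number of nodes")
    deg = adj.sum(axis=1)
    if normalized:
        # Symmetric normalized Laplacian I - D^{-1/2} A D^{-1/2};
        # isolated nodes get a zero scaling factor.
        inv_sqrt = np.zeros(n)
        mask = deg > 0
        inv_sqrt[mask] = deg[mask] ** -0.5
        lap = np.eye(n) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    else:
        # Combinatorial Laplacian D - A.
        lap = np.diag(deg) - adj
    # eigh returns eigenvalues in ascending order with orthonormal eigenvectors.
    _, vecs = np.linalg.eigh(lap)
    return vecs[:, :k]


def augment_features(x, adj, k, normalized=True):
    """Concatenate node features (N x d_node) with the PE into N x (d_node + k)."""
    return np.concatenate([x, laplacian_pe(adj, k, normalized)], axis=1)


# Usage: a 4-node path graph with 3-dimensional node features.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
x = np.arange(12, dtype=float).reshape(4, 3)
augmented = augment_features(x, adj, k=2)   # shape (4, 5)
```

Raw eigenvector columns like these are determined only up to sign, and up to rotation within repeated eigenvalues, so a plain concatenation is not invariant by itself. The sign- and basis-invariant encoding analyzed in the paper addresses that ambiguity; this sketch covers only the concatenation step the caption describes.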

Theorems & Definitions (16)

  • Theorem 1
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • Proposition 4
  • proof
  • Lemma 5
  • proof
  • Lemma 6: Spectral trilateration
  • ...and 6 more