Mixture of Scope Experts at Test: Generalizing Deeper Graph Neural Networks with Shallow Variants

Gangda Deng; Hongkuan Zhou; Rajgopal Kannan; Viktor Prasanna

TL;DR

This work systematically demonstrates a shift in GNN generalization preferences across nodes with different homophily levels as depth increases, and proposes to improve deeper GNN generalization while maintaining high expressivity by Mixture of scope experts at test (Moscat).

Abstract

Heterophilous graphs, where dissimilar nodes tend to connect, pose a challenge for graph neural networks (GNNs). Increasing the GNN depth can expand the scope (i.e., receptive field), potentially finding homophily from the higher-order neighborhoods. However, GNNs suffer from performance degradation as depth increases. Despite having better expressivity, state-of-the-art deeper GNNs achieve only marginal improvements compared to their shallow variants. Through theoretical and empirical analysis, we systematically demonstrate a shift in GNN generalization preferences across nodes with different homophily levels as depth increases. This creates a disparity in generalization patterns between GNN models with varying depth. Based on these findings, we propose to improve deeper GNN generalization while maintaining high expressivity by Mixture of scope experts at test (Moscat). Experimental results show that Moscat works flexibly with various GNNs across a wide range of datasets while significantly improving accuracy. Our code is available at https://github.com/Hydrapse/moscat.
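To make the idea of mixing scope experts at test time concrete, here is a minimal sketch in plain Python. It assumes that each independently trained expert (a GNN of a given depth) emits class logits for a node and that a gate assigns one score per expert for that node; the softmax-weighted sum, the function names, and the toy numbers are illustrative assumptions, not the authors' implementation, which additionally uses label embeddings and structural encoding.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mix_scope_experts(expert_logits, gate_scores):
    """Combine per-expert class logits for one node using softmax gate weights.

    expert_logits: list of per-class logit lists, one per expert (hypothetical).
    gate_scores:   list of raw gate scores, one per expert (hypothetical).
    """
    weights = softmax(gate_scores)
    num_classes = len(expert_logits[0])
    mixed = [0.0] * num_classes
    for w, logits in zip(weights, expert_logits):
        for c in range(num_classes):
            mixed[c] += w * logits[c]
    return mixed

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)

# Toy logits for one node from three experts of increasing depth (made-up numbers).
expert_logits = [
    [2.0, 0.0],  # depth 0 (MLP): prefers class 0
    [0.0, 1.0],  # depth 2: prefers class 1
    [0.0, 3.0],  # depth 4: prefers class 1
]

# A gate that trusts all experts equally reduces to plain logit averaging.
uniform_mix = mix_scope_experts(expert_logits, [0.0, 0.0, 0.0])
uniform_pred = argmax(uniform_mix)

# A gate that strongly favors the shallow expert follows its prediction instead.
shallow_mix = mix_scope_experts(expert_logits, [10.0, 0.0, 0.0])
shallow_pred = argmax(shallow_mix)
```

The point of the sketch is that the gate is node-specific: two nodes with identical expert logits can receive different predictions if the gate routes them to experts with different scopes.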

Paper Structure (47 sections, 2 theorems, 27 equations, 19 figures, 16 tables)

Introduction
Preliminaries
Understanding Generalization Disparity across GNN Scope Experts
Unpacking the Depth Dilemma: Why Do GNNs Struggle with Generalization?
Subgroup Generalization Bound for GNNs with Varying Scopes: A Data-Centric Perspective
Performance Disparity across Scope Experts on Real-World Datasets
Proposed Method: GNN-Moscat
The MoE Workflow
Discussion and Analysis
Experiments
Experimental setup
Performance comparison
Ablation study
Case Study: how Moscat become effective?
Conclusion
...and 32 more sections

Key Result

Theorem 3.3

Assume the aggregated features $g^L(\mathbf{X}, \mathcal{G})$ share the same variance $\sigma^2\mathbf{I}$. Let $\theta$ be any classifier in the parameter set $\{\mathbf{W}^{(l)}\}_{l=1}^{L'}$ and $S$ denote the training set. For any test subgroup $m \in \{1, \cdots$ [...] where $\|W\|^2 := \sum_{l=1}^{L'} \|\widetilde{W_l}\|_F^2$, $\rho := \|\boldsymbol{\mu}_1 -$ [...]

Figures (19)

Figure 1: Performance on amazon-ratings. (Left) Deeper GNNs exhibit performance disparities across node subgroups with different homophily ratios, with the shaded area indicating the node distribution. (Right) Deeper GNNs and their shallow variants show a shift in generalization preference across homophily ratios, with the red dotted line indicating the average training set homophily (0.38).
Figure 2: The landscape of GNNs with scope mixing.
Figure 3: (Top) The overlapping ratio on Penn94. (Bottom) Test accuracy under Oracle ensemble. Multi-Scope represents the ensemble of GNNs with depths ranging from $L=0$ (MLP) to $n$ ($n \leq 6$). Fixed-Scope represents the ensemble of GNNs with identical depth $L=L_{\text{best}}$. The horizontal red dotted line shows the SOTA GNN accuracy.
Figure 4: Overview of Moscat. (1) Different-depth GNN models serve as scope experts (with MLP as 0-hop), each trained independently. (2) Collect logits from each expert by running inference on the expert-training set ${\mathcal{V}} _{\text{exp}}$ and a holdout set ${\mathcal{V}} _{\text{hold}}$, then perform heterophily-biased filtering to form the gating-training set ${\mathcal{V}} _{\text{gate}}$. (3) Enhance logits with label embeddings and structural encoding. (4) Train the gating model using node labels, with learnable parameters shown in yellow blocks.
Figure 5: Moscat outperforms classic GNNs (e.g., SGC, GCN) and soft-scoping GNNs (e.g., GAMLP, ACMGCN). GAMLP and ACMGCN adaptively learn the scope of SGC and GCN, respectively.
...and 14 more figures
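Two quantities in the captions above can be illustrated with a short sketch: the per-node homophily ratio used to group nodes in Figure 1, and the Oracle ensemble accuracy in Figure 3. The sketch assumes the common definitions (the homophily ratio of a node is the fraction of its neighbors that share its label; an Oracle ensemble counts a node as correct if at least one expert predicts it correctly). The paper may define these differently, and the function names and toy graph are hypothetical.

```python
def node_homophily(neighbors, labels):
    """Per-node fraction of neighbors sharing the node's label.

    neighbors: dict mapping node -> list of neighbor nodes.
    labels:    dict mapping node -> class label.
    Isolated nodes get None because the ratio is undefined for them.
    """
    ratios = {}
    for node, nbrs in neighbors.items():
        if not nbrs:
            ratios[node] = None
            continue
        same = sum(1 for n in nbrs if labels[n] == labels[node])
        ratios[node] = same / len(nbrs)
    return ratios

def oracle_ensemble_accuracy(expert_preds, labels):
    """Fraction of nodes that at least one expert classifies correctly."""
    nodes = list(labels)
    hits = sum(
        1 for v in nodes
        if any(preds[v] == labels[v] for preds in expert_preds)
    )
    return hits / len(nodes)

# Toy path-like graph with two classes (made-up data).
neighbors = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
labels = {0: "a", 1: "a", 2: "b", 3: "b"}
ratios = node_homophily(neighbors, labels)

# Two hypothetical experts that are each right on a different subset of nodes.
shallow_preds = {0: "a", 1: "a", 2: "a", 3: "a"}  # correct on nodes 0 and 1
deep_preds = {0: "b", 1: "a", 2: "b", 3: "a"}     # correct on nodes 1 and 2
oracle = oracle_ensemble_accuracy([shallow_preds, deep_preds], labels)
```

In this toy case each expert alone reaches 50% accuracy while the oracle reaches 75%, which is the kind of gap that motivates combining experts whose correct predictions only partially overlap.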

Theorems & Definitions (5)

Theorem 3.3: GNN Subgroup generalization bound
Lemma B.1
proof
Definition C.1: Size of scope
Definition C.2: Model with personalized scoping
