Fully-inductive Node Classification on Arbitrary Graphs

Jianan Zhao; Zhaocheng Zhu; Mikhail Galkin; Hesham Mostafa; Michael Bronstein; Jian Tang

TL;DR

The paper tackles fully-inductive node classification, where test graphs can have new structures as well as entirely new feature and label spaces. It introduces GraphAny, which combines analytical LinearGNNs with a learnable inductive attention mechanism built on entropy-normalized distance features so that it generalizes across arbitrary graphs. Only the attention module is trained; inference runs on non-parametric LinearGNNs. Trained on a single Wisconsin dataset, GraphAny reaches a 67.26% average accuracy on 30 unseen graphs and offers substantial speedups over per-dataset transductive baselines. This work lays a foundation for cross-domain graph generalization and could inform future graph foundation models by emphasizing permutation invariance and robust dimension generalization.

Abstract

One fundamental challenge in graph machine learning is generalizing to new graphs. Many existing methods following the inductive setup can generalize to test graphs with new structures, but assume that the feature and label spaces remain the same as the training ones. This paper introduces a fully-inductive setup, where models should perform inference on arbitrary test graphs with new structures, feature spaces, and label spaces. We propose GraphAny as the first attempt at this challenging setup. GraphAny models inference on a new graph as an analytical solution to a LinearGNN, which can be naturally applied to graphs with any feature and label spaces. To further build a stronger model with learning capacity, we fuse multiple LinearGNN predictions with learned inductive attention scores. Specifically, the attention module is carefully parameterized as a function of the entropy-normalized distance features between pairs of LinearGNN predictions to ensure generalization to new graphs. Empirically, GraphAny trained on a single Wisconsin dataset with only 120 labeled nodes can generalize to 30 new graphs with an average accuracy of 67.26%, surpassing not only all inductive baselines, but also strong transductive methods trained separately on each of the 30 test graphs.
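
To make the "analytical solution to a LinearGNN" concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the linear classifier is fit by least squares on the labeled nodes via a pseudo-inverse and then applied to every node; the function name `linear_gnn_predict` and the toy path graph are illustrative assumptions.

```python
import numpy as np

def linear_gnn_predict(F, Y_labeled, labeled_idx):
    """Fit a linear map on labeled nodes in closed form, then score all nodes.

    F: (n, d) node features after some fixed graph convolution (assumed given).
    Y_labeled: (m, c) one-hot labels of the labeled nodes.
    labeled_idx: (m,) indices of the labeled nodes.
    """
    # Least-squares weights via the pseudo-inverse (an assumption of this sketch).
    W = np.linalg.pinv(F[labeled_idx]) @ Y_labeled  # shape (d, c)
    return F @ W  # shape (n, c)

# Toy 4-node path graph 0-1-2-3 with 2-dimensional features (hypothetical data).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]])

A_bar = A / A.sum(axis=1, keepdims=True)  # row-normalized adjacency
F = A_bar @ X                             # one propagation step

labeled_idx = np.array([0, 3])
Y_labeled = np.eye(2)[[0, 1]]             # node 0 -> class 0, node 3 -> class 1

logits = linear_gnn_predict(F, Y_labeled, labeled_idx)
pred = logits.argmax(axis=1)
```

Because the weights are solved for directly from `F` and `Y_labeled`, nothing in this step is tied to a particular feature dimension or number of classes, which is the property the abstract relies on.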

Paper Structure (25 sections, 15 equations, 10 figures, 6 tables)

Introduction
Related Work
GraphAny: Fully-inductive Node Classification on Any Graph
Inductive Inference with LinearGNNs
Learning Inductive Attention over LinearGNN Predictions
Permutation-Invariant Attention with Distance Features
Robust Dimension Generalization with Entropy Normalization
Efficient Training and Fully-inductive Inference
Experiments
Experimental Setup
Performance of Inductive Node Classification
Visualization of the Inductive Attention
Ablation Studies
Conclusion
Proof of Feature and Label Permutation Invariance
...and 10 more sections

Figures (10)

Figure 1: Average performance on 31 datasets. GraphAny is trained on a single dataset (Wisconsin or Arxiv) and performs inductive inference on any graph. The other methods have to be trained on each dataset.
Figure 2: Fully-inductive node classification: Trained on a graph $G$, a fully-inductive model should generalize to any new graph $G'$ with new feature and label spaces without additional training.
Figure 3: Overview of GraphAny: LinearGNNs are used to perform non-parametric predictions and derive the entropy-normalized distance features. The final prediction is generated by fusing multiple LinearGNN predictions on each node with an attention learned based on the distance features.
Figure 4: Transformations on graph features and labels: permutation (left), masking (right).
Figure 5: Comparison of Euclidean distance features (first row) and entropy-normalized features (second row) between five channels: ${\bm{F}} = {\bm{X}}$ (Linear), ${\bm{F}} = \bar{{\bm{A}}} {\bm{X}}$ (LinearSGC1), ${\bm{F}} = \bar{{\bm{A}}}^2 {\bm{X}}$ (LinearSGC2), ${\bm{F}} = ({\bm{I}} - \bar{{\bm{A}}}) {\bm{X}}$ (LinearHGC1) and ${\bm{F}} = ({\bm{I}} - \bar{{\bm{A}}})^2 {\bm{X}}$ (LinearHGC2), with $\bar{{\bm{A}}}$ denoting the row-normalized adjacency matrix. Entropy-normalized features are on the same scale and exhibit transferable patterns across datasets.
...and 5 more figures
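
The five channels named in the Figure 5 caption can be sketched as follows. This is an illustrative sketch only: it assumes a dense, unweighted adjacency matrix, and the helper name `feature_channels` and the toy graph are hypothetical. The entropy normalization step is not shown.

```python
import numpy as np

def feature_channels(A, X):
    """Return the five propagated feature matrices from the Figure 5 caption."""
    # Row-normalize A; clipping the degree avoids division by zero for
    # isolated nodes (an assumption of this sketch for unweighted graphs).
    deg = np.clip(A.sum(axis=1, keepdims=True), 1.0, None)
    A_bar = A / deg
    H = np.eye(A.shape[0]) - A_bar  # I - A_bar
    return {
        "Linear": X,
        "LinearSGC1": A_bar @ X,
        "LinearSGC2": A_bar @ (A_bar @ X),
        "LinearHGC1": H @ X,
        "LinearHGC2": H @ (H @ X),
    }

# Toy 3-node star graph with 2-dimensional features (hypothetical data).
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)
X = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])

channels = feature_channels(A, X)
```

Each channel keeps the shape of `X`, so the same closed-form linear fit can be applied to every channel before their predictions are compared.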
