Investigating Out-of-Distribution Generalization of GNNs: An Architecture Perspective

Kai Guo; Hongzhi Wen; Wei Jin; Yaming Guo; Jiliang Tang; Yi Chang

TL;DR

This paper addresses the underexplored question of how GNN backbone architectures affect graph out-of-distribution generalization. It shows that graph self-attention and decoupled propagation improve OOD robustness, while a final linear prediction layer can hurt generalization; these insights are grounded in information bottleneck theory. Building on this, the authors introduce DGat, a decoupled graph attention backbone that omits the final linear layer and uses adaptive propagation driven by attention, achieving strong OOD performance across diverse training strategies. The work demonstrates that backbone design is a crucial lever for graph OOD generalization and provides both theoretical justification and empirical evidence to guide architecture choices in practical deployments.

Abstract

Graph neural networks (GNNs) have exhibited remarkable performance under the assumption that test data comes from the same distribution as the training data. In real-world scenarios, however, this assumption does not always hold. Consequently, there is a growing focus on the Out-of-Distribution (OOD) problem in the context of graphs. Most existing efforts have concentrated on improving graph OOD generalization from two model-agnostic perspectives: data-driven methods and strategy-based learning. Limited attention has been paid to the impact of well-known GNN model architectures on graph OOD generalization, a question that is orthogonal to existing research. In this work, we provide the first comprehensive investigation of OOD generalization on graphs from an architecture perspective, by examining the common building blocks of modern GNNs. Through extensive experiments, we reveal that both the graph self-attention mechanism and the decoupled architecture contribute positively to graph OOD generalization. In contrast, we observe that the linear classification layer tends to compromise graph OOD generalization capability. Furthermore, we provide in-depth theoretical insights and discussions to underpin these discoveries. These insights have enabled us to develop a novel GNN backbone model, DGAT, designed to harness the robust properties of both the graph self-attention mechanism and the decoupled architecture. Extensive experimental results demonstrate the effectiveness of our model under graph OOD, with substantial and consistent improvements across various training strategies.

Paper Structure (25 sections, 2 theorems, 11 equations, 5 figures, 13 tables)

Introduction
Preliminaries
Graph OOD Generalization Problem
Graph Neural Network Architectures
Investigating OOD Generalization of GNN Architectures
Experimental Setup
Impact of Attention Mechanism
Impact of Coupled/Decoupled Structure
Impact of Linear Prediction Layer
New GNN Design for Enhanced OOD Generalization
A New GNN Design
Experiment
OOD performance of DGat
DGat Performance as a Backbone
Related work
...and 10 more sections

Key Result

Proposition 1

Given a node $i$ with its feature vector $x_i$ and its neighborhood $\mathcal{N}(i)$, the following aggregation scheme for obtaining its hidden representation ${\bf z}_i$, with $\eta_i$, $\mathbf{W}_Q$, and $\mathbf{W}_K$ being the learnable parameters, can be understood as an iterative process that optimizes the objective in Eq. \ref{eq:ib}.

Figures (5)

Figure 1: Comparison of OOD and GAP between GCN and GCN-- for investigating the impact of the linear classifier. GCN-- denotes GCN without the linear classifier. D1, D2, D3, D4, D5, and D6 represent G-Cora-Word, G-Cora-Degree, G-Arxiv-Time, G-Arxiv-Degree, G-Twitch-Language, and G-WebKB-University, respectively.
Figure 2: An illustration of our proposed model DGat. In this decoupled architecture, we calculate attention scores from transformed features and employ these scores throughout each propagation layer.
Figure 3: Comparison of OOD performance between DGat and APPNP equipped with various OOD algorithms. Results on more datasets are reported in Appendix \ref{sec:backbone}.
Figure 4: $P_{value}$ and $T_{value}$ of GAT and GCN--
Figure 5: $P_{value}$ and $T_{value}$ of SGC and APPNP($\beta=0$)
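
The decoupled design described in the Figure 2 caption (attention scores computed once from the transformed features, then reused in every propagation layer) can be illustrated with a small sketch. This is not the authors' implementation: the scaled dot-product scoring, the APPNP-style teleport weight `alpha`, the step count, and all function names are assumptions made for illustration, and the learnable $\eta_i$ from Proposition 1 is not modeled. `H` stands for the already-transformed node features, assumed to have one column per class so that no final linear layer is needed.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(W, x):
    # W is a list of rows; returns W @ x.
    return [dot(row, x) for row in W]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(H, neighbors, W_q, W_k):
    """Attention of each node over its neighbourhood, computed once from H.

    Assumption: scaled dot-product scores between query and key projections.
    Every node must have at least one neighbour (e.g. a self-loop).
    """
    Q = [matvec(W_q, h) for h in H]
    K = [matvec(W_k, h) for h in H]
    scale = math.sqrt(len(Q[0]))
    att = []
    for i, nbrs in enumerate(neighbors):
        scores = [dot(Q[i], K[j]) / scale for j in nbrs]
        att.append(dict(zip(nbrs, softmax(scores))))
    return att


def decoupled_attention_propagate(H, neighbors, W_q, W_k, num_steps=3, alpha=0.1):
    """Propagate H for num_steps using one fixed set of attention weights.

    Assumption: an APPNP-style update Z <- (1 - alpha) * A_att @ Z + alpha * H.
    """
    att = attention_weights(H, neighbors, W_q, W_k)  # reused in every step
    Z = [list(h) for h in H]
    dim = len(H[0])
    for _ in range(num_steps):
        Z_next = []
        for i, nbrs in enumerate(neighbors):
            agg = [sum(att[i][j] * Z[j][d] for j in nbrs) for d in range(dim)]
            Z_next.append([(1 - alpha) * a + alpha * h for a, h in zip(agg, H[i])])
        Z = Z_next
    return Z


if __name__ == "__main__":
    # Hypothetical 3-node path graph with self-loops and 2 classes.
    H = [[2.0, 0.0], [0.0, 1.0], [0.0, 3.0]]
    neighbors = [[0, 1], [0, 1, 2], [1, 2]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    Z = decoupled_attention_propagate(H, neighbors, identity, identity)
    # Without a final linear layer, the propagated features act as class scores.
    print([max(range(len(z)), key=z.__getitem__) for z in Z])
```

Because the attention weights are fixed after the first pass, each propagation step is a convex combination of neighbor representations blended with the original transformed features, so the propagated scores stay within the range of `H` in every dimension.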

Theorems & Definitions (2)

Proposition 1
Proposition 1
