Invariant Graph Transformer for Out-of-Distribution Generalization

Tianyin Liao, Ziwei Zhang, Yufei Sun, Chunyu Hu, Jianxin Li

Abstract

Graph Transformers (GTs) have demonstrated great effectiveness across various graph analytical tasks. However, existing GTs assume that training and testing graph data originate from the same distribution, and they fail to generalize under distribution shifts. Graph invariant learning, which aims to capture generalizable relationships between graph structural patterns and labels under distribution shifts, is a potentially promising solution, but how to design attention mechanisms and positional and structural encodings (PSEs) based on graph invariant learning principles remains challenging. To address these challenges, we introduce the Graph Out-Of-Distribution generalized Transformer (GOODFormer), which aims to learn generalized graph representations by capturing invariant relationships between predictive graph structures and labels through jointly optimizing three modules. Specifically, we first develop a GT-based entropy-guided invariant subgraph disentangler to separate invariant and variant subgraphs while preserving the sharpness of the attention function. Next, we design an evolving subgraph positional and structural encoder to effectively and efficiently capture the encoding information of dynamically changing subgraphs during training. Finally, we propose an invariant learning module that uses subgraph node representations and encodings to derive generalizable graph representations that can generalize to unseen graphs. We also provide theoretical justifications for our method. Extensive experiments on benchmark datasets demonstrate the superiority of our method over state-of-the-art baselines under distribution shifts.

Paper Structure

This paper contains 33 sections, 4 theorems, 25 equations, 4 figures, 9 tables, 1 algorithm.

Key Result

theorem 2

Suppose $\boldsymbol{\mathrm{G}}_S\rightarrow \boldsymbol{\mathrm{G}}_C$ does not exist, $\mathcal{L}(\cdot)$ is a loss function, and there exists one and only one non-trivial subgraph $\boldsymbol{\mathrm{G}}_C$. Under these conditions, any model comprising Eq. eq:complementatt cannot guarantee the f

Figures (4)

  • Figure 1: The framework of GOODFormer, which jointly optimizes three modules: (1) The entropy-guided invariant subgraph disentangler utilizes Transformer layers, entropy-guided sharp attention and the attention-guided MPNN to separate invariant and variant subgraphs. (2) The evolving subgraph positional and structural encoder captures valuable information of dynamically changing subgraphs during training while maintaining expressiveness. (3) The invariant GT learning module optimizes tailored objective functions to derive graph representations for generalizing to unseen test graphs.
  • Figure 2: Visualizations of the learned invariant subgraphs from the testing set of the GOOD-Motif dataset with basis split. In Figures (a) to (d), the invariant subgraphs identified by different methods are highlighted in red, while the ground truth is depicted in black in Figure (e).
  • Figure 3: The impact of different hyper-parameters. Red lines denote the results of GOODFormer and grey dashed lines correspond to the results of EXPHORMER.
  • Figure 4: The impact of different hyper-parameters. Orange and red lines denote the results of GOODFormer, while grey dashed lines indicate: (1) CIGA's performance on the basis split and (2) EXPHORMER's performance on the size split.

Theorems & Definitions (6)

  • theorem 2
  • theorem 3
  • theorem 4
  • proof
  • theorem 4
  • proof