Dissecting the Failure of Invariant Learning on Graphs

Qixun Wang, Yifei Wang, Yisen Wang, Xianghua Ying

TL;DR

This work shows that standard invariant learning methods such as IRM and VREx often fail at node-level OOD generalization on graphs because they lack class-conditional invariance constraints. The authors introduce a Structural Causal Model to dissect how invariant ego-graphs and spurious features interact with graph structure, and propose Cross-environment Intra-class Alignment (CIA) to enforce class-conditional invariance. For settings without environment labels, they propose CIA-LRA, which leverages local neighborhood label distributions. They provide a PAC-Bayesian OOD generalization bound under a CSBM-OOD framework and report that CIA improves graph-OOD generalization, with CIA-LRA delivering further improvements and plug-in gains for existing graph-OOD methods. The results offer a principled path to robust node-level OOD learning on graphs and highlight the importance of locality and neighborhood distribution shifts in graph data.

Abstract

Enhancing node-level Out-Of-Distribution (OOD) generalization on graphs remains a crucial area of research. In this paper, we develop a Structural Causal Model (SCM) to theoretically dissect the performance of two prominent invariant learning methods -- Invariant Risk Minimization (IRM) and Variance-Risk Extrapolation (VREx) -- in node-level OOD settings. Our analysis reveals a critical limitation: due to the lack of class-conditional invariance constraints, these methods may struggle to accurately identify the structure of the predictive invariant ego-graph and consequently rely on spurious features. To address this, we propose Cross-environment Intra-class Alignment (CIA), which explicitly eliminates spurious features by aligning cross-environment representations conditioned on the same class, bypassing the need for explicit knowledge of the causal pattern structure. To adapt CIA to node-level OOD scenarios where environment labels are hard to obtain, we further propose CIA-LRA (Localized Reweighting Alignment), which leverages the distribution of neighboring labels to selectively align node representations, effectively distinguishing and preserving invariant features while removing spurious ones, all without relying on environment labels. We theoretically prove CIA-LRA's effectiveness by deriving an OOD generalization error bound based on PAC-Bayesian analysis. Experiments on graph OOD benchmarks validate the superiority of CIA and CIA-LRA, marking a significant advancement in node-level OOD generalization. The code is available at https://github.com/NOVAglow646/NeurIPS24-Invariant-Learning-on-Graphs.
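To make the alignment idea in the abstract concrete, here is a minimal sketch of a CIA-style penalty: the mean squared distance between representations of nodes that share a class but come from different environments. The function name `cia_alignment_penalty`, the squared L2 distance, and the toy inputs are illustrative assumptions, not the authors' implementation; the linked repository is the authoritative reference.

```python
from itertools import combinations


def cia_alignment_penalty(reps, labels, envs):
    """Mean squared L2 distance over same-class, cross-environment pairs.

    Hypothetical helper for illustration only: `reps` is a list of
    representation vectors, `labels` the class of each node, and `envs`
    the environment index of each node.
    """
    total, count = 0.0, 0
    for i, j in combinations(range(len(reps)), 2):
        # Align only pairs with the same class drawn from different environments.
        if labels[i] == labels[j] and envs[i] != envs[j]:
            total += sum((a - b) ** 2 for a, b in zip(reps[i], reps[j]))
            count += 1
    return total / count if count else 0.0


# Toy data (assumed): the second coordinate of class 0 varies with the environment.
reps = [[1.0, 0.0], [1.0, 2.0], [0.0, 1.0], [0.0, 1.0]]
labels = [0, 0, 1, 1]
envs = [0, 1, 0, 1]
penalty = cia_alignment_penalty(reps, labels, envs)
```

In training, a term like this would typically be added to the classification loss with a trade-off coefficient; driving it toward zero discourages representation components that vary across environments within a class.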

Paper Structure

This paper contains 60 sections, 18 theorems, 113 equations, 18 figures, 8 tables, 1 algorithm.

Key Result

Proposition 2.2

(IRMv1 and VREx can learn invariant features for non-graph tasks; the proof is in the appendix, label proof_VREx_IRM_suc.) For the non-graph version of the SCM in the data-generation equation (label data_gene), VREx and IRMv1 can learn invariant features when using a linear network: $f(X)=\theta_1 X_1 + \theta_2 X_2$.
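As a rough illustration of the two objectives named in the proposition, the sketch below evaluates the VREx penalty (variance of per-environment risks) and the IRMv1 penalty (squared gradient of each environment's risk with respect to a dummy scalar classifier fixed at 1) for the linear network $f(X)=\theta_1 X_1 + \theta_2 X_2$. The squared loss and the two toy environments are assumptions made for this example; they are not the SCM from the paper.

```python
def predict(theta, x):
    # Linear network from the proposition: f(X) = theta_1 * X_1 + theta_2 * X_2.
    return theta[0] * x[0] + theta[1] * x[1]


def env_risk(theta, data):
    # Mean squared error within one environment (squared loss is an assumption).
    return sum((predict(theta, x) - y) ** 2 for x, y in data) / len(data)


def vrex_penalty(theta, envs):
    # VREx: variance of the per-environment risks.
    risks = [env_risk(theta, data) for data in envs]
    mean = sum(risks) / len(risks)
    return sum((r - mean) ** 2 for r in risks) / len(risks)


def irmv1_penalty(theta, envs):
    # IRMv1: squared gradient of the risk w.r.t. a dummy scale w at w = 1,
    # where the risk is mean((w * f(x) - y)^2), summed over environments.
    total = 0.0
    for data in envs:
        grad = sum(
            2.0 * (predict(theta, x) - y) * predict(theta, x) for x, y in data
        ) / len(data)
        total += grad ** 2
    return total


# Toy environments (assumed): X_1 equals y everywhere; X_2 equals y in env_a
# and -y in env_b, so X_2 is spurious.
env_a = [((1.0, 1.0), 1.0), ((-1.0, -1.0), -1.0)]
env_b = [((1.0, -1.0), 1.0), ((-1.0, 1.0), -1.0)]
envs = [env_a, env_b]

invariant_theta = (1.0, 0.0)  # uses only X_1
spurious_theta = (0.0, 1.0)   # uses only X_2
```

On this toy data both penalties vanish for `invariant_theta` and are strictly positive for `spurious_theta`, which matches the non-graph behaviour the proposition describes.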

Figures (18)

  • Figure 1: Real-Cov./Con. are average OOD accuracy on the covariate/concept shift of Arxiv, Cora, CBAS, and WebKB. Toy denotes results on our toy synthetic graph OOD dataset.
  • Figure 1: Causal graphs of the SCMs considered in our work.
  • Figure 2: The overall framework of our proposed CIA-LRA. The invariant subgraph extractor $\phi_{\theta_m}$ identifies the invariant subgraph for each node. Then the GNN encoder $\phi_{\Theta}$ aggregates information from the estimated invariant subgraphs to output node representations. CIA-LRA mainly contains two strategies: localized alignment and reweighting alignment. Localized alignment: we restrict the alignment to a local range to avoid over-alignment that may cause the collapse of invariant features (shown in Appendix \ref{app:excessive-align-collapse}). Reweighting alignment: to better eliminate spurious features and preserve invariant features without using environment labels, we assign large weights to node pairs with significant discrepancies in heterophilic Neighborhood Label Distribution (NLD) and minor discrepancies in homophilic NLD. See Section \ref{CIA-LRA_method_sec} for a detailed analysis of CIA-LRA.
  • Figure 3: By replacing VREx in EERM with CIA (marked as EERM-CIA), the performance is significantly improved.
  • Figure 3: Left: OOD test accuracy. Mid: the variance of the invariant representation. Right: the norm of the spurious representation. CIA and CIA-LRA use $\lambda=0.5$ in this figure.
  • ...and 13 more figures
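The reweighting strategy described in the Figure 2 caption can be sketched as follows. The code computes a Neighborhood Label Distribution (NLD) for each node and a pair weight that grows with the heterophilic discrepancy and shrinks with the homophilic one. The function names, the split of the NLD into a same-class entry and different-class entries, and the weight formula `het_gap / (1 + hom_gap)` are all hypothetical choices made for illustration; the paper's exact definitions are given in its CIA-LRA method section, and the localized (local-range) restriction is not modelled here.

```python
from collections import Counter


def neighborhood_label_distribution(node, adjacency, labels, num_classes):
    """Fraction of a node's neighbors that fall in each class."""
    neighbors = adjacency[node]
    if not neighbors:
        return [0.0] * num_classes
    counts = Counter(labels[n] for n in neighbors)
    return [counts.get(c, 0) / len(neighbors) for c in range(num_classes)]


def pair_weight(i, j, adjacency, labels, num_classes):
    """Hypothetical alignment weight for a same-class node pair (i, j)."""
    c = labels[i]  # assumed equal to labels[j]
    p = neighborhood_label_distribution(i, adjacency, labels, num_classes)
    q = neighborhood_label_distribution(j, adjacency, labels, num_classes)
    # Homophilic part: share of neighbors with the same class as the pair.
    hom_gap = abs(p[c] - q[c])
    # Heterophilic part: shares of neighbors in all other classes.
    het_gap = sum(abs(p[k] - q[k]) for k in range(num_classes) if k != c)
    # Assumed form: large when het_gap is large and hom_gap is small.
    return het_gap / (1.0 + hom_gap)


# Toy undirected graph (assumed): nodes 0-2 are the class-0 nodes being aligned.
labels = [0, 0, 0, 0, 1, 2]
adjacency = {
    0: [3, 4],
    1: [3, 5],
    2: [3, 4],
    3: [0, 1, 2],
    4: [0, 2],
    5: [1],
}
nld_0 = neighborhood_label_distribution(0, adjacency, labels, 3)
weight_01 = pair_weight(0, 1, adjacency, labels, 3)
weight_02 = pair_weight(0, 2, adjacency, labels, 3)
```

Nodes 0 and 1 have the same share of same-class neighbors but different other-class neighbors, so their pair gets a larger weight than nodes 0 and 2, whose neighborhoods are identical.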

Theorems & Definitions (29)

  • Proposition 2.2
  • Theorem 2.3
  • Theorem 3.1
  • Definition 4.1
  • Theorem 4.4
  • Proposition B.1
  • Proposition B.2
  • Proposition B.3
  • Proposition B.4
  • Proposition G.1
  • ...and 19 more