Invariant Graph Learning Meets Information Bottleneck for Out-of-Distribution Generalization

Wenyu Mao, Jiancan Wu, Haoyang Liu, Yongduo Sui, Xiang Wang

TL;DR

This work tackles graph OOD generalization by introducing InfoIGL, an invariant graph learning framework grounded in information bottleneck theory. It combines a redundancy-filtered GNN encoder with multi-level (semantic and instance) contrastive learning to maximize mutual information among graphs of the same class while suppressing environment-related redundancy. The method achieves state-of-the-art performance on synthetic and real-world graph classification benchmarks and shows promising scalability to node classification, with robust ablations supporting the necessity of both redundancy reduction and dual-level contrastive learning. The approach offers a principled path to robust graph representations under distribution shifts, with practical implications for real-world GNN deployments.

Abstract

Graph out-of-distribution (OOD) generalization remains a major challenge in graph learning since graph neural networks (GNNs) often suffer from severe performance degradation under distribution shifts. Invariant learning, aiming to extract invariant features across varied distributions, has recently emerged as a promising approach for OOD generalization. Despite the great success of invariant learning in OOD problems for Euclidean data (e.g., images), the exploration within graph data remains constrained by the complex nature of graphs. Existing studies, such as data augmentation or causal intervention, either suffer from disruptions to invariance during the graph manipulation process or face reliability issues due to a lack of supervised signals for causal parts. In this work, we propose a novel framework, called Invariant Graph Learning based on Information bottleneck theory (InfoIGL), to extract the invariant features of graphs and enhance models' generalization ability to unseen distributions. Specifically, InfoIGL introduces a redundancy filter to compress task-irrelevant information related to environmental factors. In cooperation with our multi-level contrastive learning, we maximize the mutual information among graphs of the same class in the downstream classification tasks, preserving invariant features for prediction to a great extent. An appealing feature of InfoIGL is its strong generalization ability without depending on supervised signals of invariance. Experiments on both synthetic and real-world datasets demonstrate that our method achieves state-of-the-art performance under OOD generalization for graph classification tasks. The source code is available at https://github.com/maowenyu-11/InfoIGL.
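
To make the idea of maximizing mutual information among graphs of the same class more concrete, here is a minimal sketch of an instance-level supervised contrastive (InfoNCE-style) loss, in which graphs sharing a class label act as positives. It is an illustration under assumptions, not the authors' implementation: the function name `instance_contrastive_loss`, the temperature `tau`, and the toy embeddings are hypothetical, and the redundancy filter and the semantic-level term are omitted.

```python
import math


def _normalize(v):
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def instance_contrastive_loss(embeddings, labels, tau=0.5):
    """Supervised InfoNCE-style loss: same-class graphs are positives.

    embeddings: list of graph-level embedding vectors (assumed non-zero).
    labels: one class label per graph.
    tau: temperature (hypothetical default, not taken from the paper).
    """
    z = [_normalize(v) for v in embeddings]
    n = len(z)
    total, num_anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without a same-class partner contributes nothing
        # Cosine similarity of the anchor to every other sample, scaled by tau.
        logits = {j: sum(a * b for a, b in zip(z[i], z[j])) / tau
                  for j in range(n) if j != i}
        log_denom = math.log(sum(math.exp(s) for s in logits.values()))
        # Average negative log-probability of selecting a positive.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        num_anchors += 1
    return total / num_anchors if num_anchors else 0.0


# Toy data: two classes with two graphs each (made-up embeddings).
labels = [0, 0, 1, 1]
clustered = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
mixed = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
loss_clustered = instance_contrastive_loss(clustered, labels)
loss_mixed = instance_contrastive_loss(mixed, labels)
```

In this toy setup, the loss is lower when same-class embeddings lie close together (`clustered`) than when they are interleaved with the other class (`mixed`), which is the behavior a contrastive objective of this kind rewards.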

Paper Structure (28 sections, 19 equations, 5 figures, 9 tables, 1 algorithm)

This paper contains 28 sections, 19 equations, 5 figures, 9 tables, 1 algorithm.

Figures (5)

  • Figure 1: Comparison of the two branches of existing methods with our method. The upper subfigure shows data manipulation, which edits nodes or edges and is prone to destroying the invariant parts. The middle subfigure shows the causal disentanglement approach, which separates causal subgraphs and intervenes across various environmental conditions (green patterns), and which may fail to identify the causal parts accurately. In contrast, our method, in the lower subfigure, first removes most of the redundancy to avoid distraction during invariance discovery, enabling more accurate identification. It then preserves sufficient predictive information (house) in the invariant features by maximizing mutual information, without using supervised labels for invariance.
  • Figure 2: Overview of the proposed InfoIGL framework. The training graphs are fed into the GNN encoder and attention mechanism [sui2022causal; brody2021attentive]. After being projected into another space, instance embeddings are aggregated into semantics. Semantic-level and instance-level contrastive learning are then optimized jointly, along with an instance constraint and hard negative mining to avoid model collapse.
  • Figure 3: The t-SNE visualizations for different levels of contrastive learning.
  • Figure 4: Sensitivity analysis of the hyperparameters $\lambda_c$, $\lambda_s$, and $\lambda_i$.
  • Figure 5: The invariance obtained by InfoIGL.
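
The Figure 2 caption describes instance embeddings being aggregated into semantics before semantic-level contrastive learning. The following is a minimal sketch of that step under stated assumptions: mean pooling per class is used as the aggregation, and each instance is contrasted against the class-level vectors. The names `class_prototypes`, `semantic_contrastive_loss`, and `tau` are hypothetical, and the paper's actual aggregation, instance constraint, and hard negative mining may differ or are omitted here.

```python
import math
from collections import defaultdict


def _normalize(v):
    """Scale a vector to unit L2 norm (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def class_prototypes(embeddings, labels):
    """Aggregate instance embeddings into one 'semantic' vector per class.

    Mean pooling is an assumption made for illustration.
    """
    groups = defaultdict(list)
    for vec, label in zip(embeddings, labels):
        groups[label].append(vec)
    return {label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in groups.items()}


def semantic_contrastive_loss(embeddings, labels, tau=0.5):
    """Pull each instance toward its own class vector, away from the others."""
    protos = {k: _normalize(v)
              for k, v in class_prototypes(embeddings, labels).items()}
    total = 0.0
    for vec, label in zip(embeddings, labels):
        z = _normalize(vec)
        # Cosine similarity to every class-level vector, scaled by temperature.
        logits = {k: sum(a * b for a, b in zip(z, p)) / tau
                  for k, p in protos.items()}
        log_denom = math.log(sum(math.exp(s) for s in logits.values()))
        total += log_denom - logits[label]  # negative log-softmax of own class
    return total / len(embeddings)


# Toy data: four graph embeddings (made up) under two labelings.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
protos = class_prototypes(embeddings, [0, 0, 1, 1])
loss_consistent = semantic_contrastive_loss(embeddings, [0, 0, 1, 1])
loss_scrambled = semantic_contrastive_loss(embeddings, [0, 1, 0, 1])
```

With labels that match the geometric clusters, the class vectors are well separated and the loss is small. With scrambled labels, both class vectors collapse onto the same direction and the loss rises to about log 2, the value for a two-way guess.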