Graphs Generalization under Distribution Shifts

Qin Tian, Wenjun Wang, Chen Zhao, Minglai Shao, Wang Zhang, Dong Li

TL;DR

This paper introduces Graph Learning Invariant Domain genERation (GLIDER), a framework that (1) diversifies variations across domains by modeling the potential seen or unseen variations of attribute distribution and topological structure, and (2) minimizes the discrepancy of those variations in a representation space where the target is to predict semantic labels.

Abstract

Traditional machine learning methods rely heavily on the independent and identically distributed (i.i.d.) assumption, which limits them when the test distribution deviates from the training distribution. To address this issue, out-of-distribution (OOD) generalization, which aims to achieve satisfactory generalization performance under unknown distribution shifts, has made significant progress. However, OOD generalization for graph-structured data remains unclear and relatively unexplored because of two primary challenges. First, distribution shifts on graphs often occur simultaneously in node attributes and graph topology. Second, capturing invariant information amid diverse distribution shifts is difficult. To overcome these obstacles, we introduce a novel framework, Graph Learning Invariant Domain genERation (GLIDER). The goal is to (1) diversify variations across domains by modeling the potential seen or unseen variations of attribute distribution and topological structure and (2) minimize the discrepancy of the variation in a representation space where the target is to predict semantic labels. Extensive experimental results indicate that our model outperforms baseline methods on node-level OOD generalization across domains under simultaneous distribution shifts in node features and topological structure.

Paper Structure (22 sections, 18 equations, 4 figures, 8 tables, 1 algorithm)

Figures (4)

  • Figure 1: An overview of the GLIDER framework. First, GLIDER generates a graph $G'$ that has the same topology as the input graph $G$ but a different attribute distribution. In this step, it learns a semantic feature encoder $E^c$, a variation feature encoder $E^r$, and a decoder $D$. It then generates node attributes $X'$ in new synthetic domains, wherein only the variation factor is replaced with one sampled from $\mathcal{N}(0,\mathbf{I})$. Second, it learns generators that produce $K$ adjacency matrices whose structures are as different as possible. This yields $K$ graphs $\{G_k''=(A_k, X')\}_{k=1}^K$ whose $X$ and $A$ both differ from those of $G$.
  • Figure 2: Loss curves for training the discriminator, where Loss_Adv denotes the adversarial loss and Loss_Rec denotes the reconstruction loss.
  • Figure 3: Sensitivity of GLIDER w.r.t. the weight coefficient in Eq. (\ref{eq.invarant_3}) and the number of edge edits per node.
  • Figure 4: The accuracy of GLIDER (Ours) and two other baseline methods using different GNN backbones on WebKB.
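
The two generation steps described in the Figure 1 caption can be illustrated with a rough NumPy sketch. It is not the authors' implementation. The linear maps `W_c` and `W_d` are hypothetical stand-ins for the learned encoder $E^c$ and decoder $D$, and `random_edge_edits` swaps in plain random edge flips for the learned structure generators, which the paper trains to make the $K$ structures as different as possible. All function names, shapes, and parameters here are assumptions made for illustration.

```python
import numpy as np

def synthesize_attributes(X, W_c, W_d, r_dim, rng):
    """Keep the semantic code, draw the variation code from N(0, I), decode."""
    c = X @ W_c                                      # stand-in for E^c(X)
    r_new = rng.standard_normal((X.shape[0], r_dim))  # replaces E^r(X)
    return np.concatenate([c, r_new], axis=1) @ W_d   # stand-in for D

def random_edge_edits(A, edits_per_node, rng):
    """Assumed stand-in for a learned generator: flip a few entries per node,
    keeping A symmetric, binary, and free of self-loops."""
    A_new = A.copy()
    n = A.shape[0]
    for i in range(n):
        others = np.delete(np.arange(n), i)           # exclude the diagonal
        for j in rng.choice(others, size=edits_per_node, replace=False):
            A_new[i, j] = A_new[j, i] = 1 - A_new[i, j]
    return A_new

rng = np.random.default_rng(0)
n, d, c_dim, r_dim, K = 6, 4, 3, 2, 3                 # toy sizes (assumed)

X = rng.standard_normal((n, d))                       # node attributes of G
A = np.triu(rng.integers(0, 2, size=(n, n)), 1)
A = A + A.T                                           # symmetric adjacency of G

W_c = rng.standard_normal((d, c_dim))                 # untrained, illustrative
W_d = rng.standard_normal((c_dim + r_dim, d))

X_new = synthesize_attributes(X, W_c, W_d, r_dim, rng)                   # X'
graphs = [(random_edge_edits(A, 1, rng), X_new) for _ in range(K)]       # G_k''
```

Each element of `graphs` pairs one edited adjacency matrix $A_k$ with the shared synthetic attributes $X'$, mirroring the set $\{G_k''=(A_k, X')\}_{k=1}^K$ in the caption. The variation encoder $E^r$ does not appear because, at generation time, its output is replaced by the Gaussian sample.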

Theorems & Definitions (1)

  • proof