Enhancing Size Generalization in Graph Neural Networks through Disentangled Representation Learning

Zheng Huang, Qihui Yang, Dawei Zhou, Yujun Yan

TL;DR

This paper tackles size generalization in graph neural networks by disentangling size-related from task-related information. It introduces DISGEN, which uses size- and task-invariant graph augmentations and a decoupling loss to minimize shared information between size and task representations, with theoretical guarantees. The approach is model-agnostic and validated across multiple datasets and backbones, achieving up to a 6% improvement on larger test graphs. The work advances practical size generalization for GNNs and offers a principled framework for disentangled representation learning in graphs.

Abstract

Although most graph neural networks (GNNs) can operate on graphs of any size, their classification performance often declines on graphs larger than those encountered during training. Existing methods insufficiently address the removal of size information from graph representations, resulting in sub-optimal performance and reliance on backbone models. In response, we propose DISGEN, a novel and model-agnostic framework designed to disentangle size factors from graph representations. DISGEN employs size- and task-invariant augmentations and introduces a decoupling loss that minimizes shared information in hidden representations, with theoretical guarantees for its effectiveness. Our empirical results show that DISGEN outperforms state-of-the-art models by up to 6% on real-world datasets, underscoring its effectiveness in enhancing the size generalizability of GNNs. Our code is available at: https://github.com/GraphmindDartmouth/DISGEN.

Paper Structure

This paper contains 19 sections, 5 theorems, 36 equations, 2 figures, 12 tables.

Key Result

Theorem 3.2

Consider the composite functions $\mathrm{ENC}_i \circ \mathbf{f}(\cdot)$, $i \in \{1,2\}$, defined on a closed set $S \subseteq \mathbb{R}^{2c}$. Assume that these composite functions are twice differentiable at some point $\mathbf{r}_0$, and the gradients $\nabla\mathbf{h}_t$ and $\nabla\mathbf{h}_s$ at … $\Rightarrow$ $\tilde{\mathbf{t}}$ and $\tilde{\mathbf{s}}$ cannot be decoupled from $\mathbf{f}(\cdot,$ …

Figures (2)

  • Figure 1: Framework overview: our model augments each graph $\mathcal{G}_i$ with size- and task-invariant views ($\mathcal{G}_i^{(1)}$ and $\mathcal{G}_i^{(2)}$), which, along with the original graph, are processed by a shared GNN backbone. Two encoders then generate size- ($\bold{s}_i$) and task-related ($\bold{t}_i$) representations, respectively. A contrastive loss on size-related representations guides relative size learning, while a decoupling loss ensures the separation of size- and task-related information.
  • Figure 2: Augmentation overview: view $\mathcal{G}_i^{(1)}$ is generated by removing edges that most significantly change the label information, while $\mathcal{G}_i^{(2)}$ results from eliminating nodes that have little impact on the model predictions.
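
The decoupling idea in Figure 1 can be illustrated with a small sketch. The exact form of DISGEN's decoupling loss is not given in this summary, so the code below uses a swapped-in stand-in: the squared Frobenius norm of the cross-covariance between the size-related representations $\mathbf{s}_i$ and the task-related representations $\mathbf{t}_i$ in a batch. The function names `center_columns` and `cross_covariance_penalty` and the toy vectors are hypothetical, chosen only for illustration. This penalty captures linear dependence only; it is not the paper's loss.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def center_columns(rows: Sequence[Sequence[float]]) -> Matrix:
    """Subtract the per-column mean from every row."""
    n = len(rows)
    dim = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(dim)]
    return [[row[j] - means[j] for j in range(dim)] for row in rows]


def cross_covariance_penalty(
    size_reps: Sequence[Sequence[float]],
    task_reps: Sequence[Sequence[float]],
) -> float:
    """Squared Frobenius norm of the batch cross-covariance of s_i and t_i.

    Hypothetical stand-in for a decoupling loss (an assumption, not the
    formulation used by DISGEN). It is zero when no size dimension is
    linearly correlated with any task dimension.
    """
    if not size_reps or len(size_reps) != len(task_reps):
        raise ValueError("need two non-empty batches of equal length")
    n = len(size_reps)
    s = center_columns(size_reps)
    t = center_columns(task_reps)
    penalty = 0.0
    for a in range(len(s[0])):
        for b in range(len(t[0])):
            cov = sum(s[i][a] * t[i][b] for i in range(n)) / n
            penalty += cov * cov
    return penalty


# Toy batch of four graphs with 1-d representations (made-up numbers).
size_reps = [[1.0], [2.0], [3.0], [4.0]]
entangled_task = [[2.0], [4.0], [6.0], [8.0]]    # tracks size exactly
decoupled_task = [[1.0], [-1.0], [-1.0], [1.0]]  # uncorrelated with size

entangled = cross_covariance_penalty(size_reps, entangled_task)
decoupled = cross_covariance_penalty(size_reps, decoupled_task)
print(entangled, decoupled)
```

In this toy batch the penalty is large when the task representation is a rescaled copy of the size representation and vanishes when the two are uncorrelated, which is the behavior a decoupling term is meant to reward during training.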

Theorems & Definitions (10)

  • Definition 3.1
  • Theorem 3.2
  • proof
  • Theorem 3.3
  • proof
  • Definition 3.4
  • Theorem 3.5
  • proof
  • Lemma 1.1
  • Lemma 1.2