Out-of-Distribution Graph Models Merging

Yidi Wang, Ziyue Qiao, Jiawei Gu, Xubin Zheng, Pengyang Wang, Xiaobing Pei, Xiao Luo

Abstract

This paper studies a novel problem, out-of-distribution graph models merging, which aims to construct a generalized model from multiple graph models pre-trained on different domains with distribution discrepancies. The problem is challenging because it is difficult both to learn domain-invariant knowledge that is held implicitly in model parameters and to consolidate expertise from potentially heterogeneous GNN backbones. In this work, we propose a graph generation strategy that instantiates the mixture distribution of multiple domains. We then merge and fine-tune the pre-trained graph models via a mixture-of-experts (MoE) module and a masking mechanism for generalized adaptation. Our framework is architecture-agnostic and can operate without any source or target domain data. Both theoretical analysis and experimental results demonstrate the effectiveness of our approach in addressing the model generalization problem.

Paper Structure

This paper contains 31 sections, 1 theorem, 31 equations, 17 figures, 5 tables, and 1 algorithm.

Key Result

Theorem 3.3

If each $f(\Theta_i)$ is an optimal learner trained on the marginal distribution $\mathcal{G}_i$, the upper bound of the generalization error for $\Gamma$ on the target domain is given by the sum of the cross-validation errors of these sub-learners across different distributions.

Figures (17)

  • Figure 1: Illustration of Out-of-Distribution Graph Models Merging.
  • Figure 2: Comparison of the generalization performance of different GNN models on PTC in in-distribution and OOD scenarios, with the three domains denoted A / B / C. Values indicate Acc (%). The results within the red dashed box are the best performance.
  • Figure 3: Architecture overview. The architecture of OGMM consists of two primary stages: (1) Graph generation. Each pre-trained GNN serves as a supervisor to train its corresponding generator, which reconstructs label-conditional graphs from random noise. (2) Model merging. The generative graphs are aggregated to train a merged GNN using a MoE module. It comprises a gating layer and a set of fine-tuned masked experts. Gradient updates are guided by mask and gating regularization terms alongside classification loss.
  • Figure 4: Impact of mask position on RED. (REDDIT-B) and MUT. (MUTAG). The form $(A \rightarrow T)$ denotes a GNN pre-trained on domain A and fine-tuned on the target domain. The bar chart shows model performance on the target domain, and the dashed line marks the average performance of the masked models across mask positions on that dataset.
  • Figure 5: The effects of $k$ in $\mathrm{TopK}$ expert selection on four datasets.
  • ...and 12 more figures
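
The captions of Figures 3 and 5 mention a gating layer, a set of masked experts, and TopK expert selection. The sketch below is one minimal way such a forward pass could look. It is an illustration under stated assumptions, not the paper's implementation: the experts are plain linear maps standing in for fine-tuned GNNs, the masks are fixed binary matrices, the gate is a single linear layer over an input embedding, and every name (`top_k_gate`, `masked_linear`, `moe_forward`) is hypothetical. The mask and gating regularization terms named in the Figure 3 caption are not modeled.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gate(logits, k):
    """Keep the k largest gate logits, softmax over them, zero the rest."""
    idx = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    probs = softmax([logits[i] for i in idx])
    weights = [0.0] * len(logits)
    for i, p in zip(idx, probs):
        weights[i] = p
    return weights


def masked_linear(weight, mask, x):
    """Linear expert whose weights are multiplied elementwise by a mask."""
    return [
        sum(w * m * xi for w, m, xi in zip(w_row, m_row, x))
        for w_row, m_row in zip(weight, mask)
    ]


def moe_forward(x, gate_w, experts, k):
    """Combine the top-k masked experts, weighted by the gate.

    experts is a list of (weight, mask) pairs; this layout is an assumption
    made for the sketch, not something specified in the source.
    """
    logits = [sum(g * xi for g, xi in zip(row, x)) for row in gate_w]
    weights = top_k_gate(logits, k)
    out = [0.0] * len(experts[0][0])
    for w, (weight, mask) in zip(weights, experts):
        if w == 0.0:
            continue  # expert not selected by the gate
        y = masked_linear(weight, mask, x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, weights


# Toy input embedding, gate, and three experts (all values are made up).
x = [1.0, 2.0]
gate_w = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # gate logits: 1, 2, -3
experts = [
    ([[1.0, 0.0], [0.0, 1.0]], [[1, 1], [1, 1]]),  # unmasked identity
    ([[2.0, 0.0], [0.0, 2.0]], [[1, 0], [0, 0]]),  # mask keeps one weight
    ([[5.0, 5.0], [5.0, 5.0]], [[1, 1], [1, 1]]),  # lowest logit, dropped
]
out, weights = moe_forward(x, gate_w, experts, k=2)
```

With `k=2`, the third expert receives zero weight and contributes nothing to `out`, while the two selected experts share weights that sum to one. Increasing `k` to 3 would let every expert contribute, which is the kind of trade-off the Figure 5 caption refers to.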

Theorems & Definitions (4)

  • Definition 3.2
  • Theorem 3.3
  • proof
  • proof