HG-Adapter: Improving Pre-Trained Heterogeneous Graph Neural Networks with Dual Adapters

Yujie Mo, Runpeng Yu, Xiaofeng Zhu, Xinchao Wang

TL;DR

HG-Adapter is a unified framework that combines two new structure-aware adapters with potential labeled data extension to improve the generalization of pre-trained HGNN models. The dual adapters adaptively fit task-related homogeneous and heterogeneous structural information.

Abstract

The "pre-train, prompt-tuning" paradigm has demonstrated impressive performance for tuning pre-trained heterogeneous graph neural networks (HGNNs) by mitigating the gap between pre-trained models and downstream tasks. However, most prompt-tuning-based works may face at least two limitations: (i) the model may fit the graph structures poorly because they are generally ignored in the prompt-tuning stage, which increases the training error and thus reduces the generalization ability; and (ii) the model may suffer from limited labeled data during the prompt-tuning stage, leading to a large generalization gap between the training error and the test error that further harms generalization. To alleviate these limitations, we first derive the generalization error bound for existing prompt-tuning-based methods, and then propose a unified framework that combines two new adapters with potential labeled data extension to improve the generalization of pre-trained HGNN models. Specifically, we design dual structure-aware adapters to adaptively fit task-related homogeneous and heterogeneous structural information. We further design a label-propagated contrastive loss and two self-supervised losses to optimize the dual adapters and incorporate unlabeled nodes as potential labeled data. Theoretical analysis indicates that the proposed method achieves a lower generalization error bound than existing methods, thus obtaining superior generalization ability. Comprehensive experiments demonstrate the effectiveness and generalization of the proposed method on different downstream tasks.

Paper Structure

This paper contains 36 sections, 5 theorems, 41 equations, 7 figures, 7 tables, 1 algorithm.

Key Result

Theorem 2.3

(Generalization error bound for prompt-tuning-based methods.) Statistically, the upper bound $\mathcal{U}(\mathcal{E}_M)$ of the test error $\mathcal{E}_M$ of a pre-trained HGNN model with prompt-tuning can be determined as follows: where training data $\mathcal{D}_{n_{M}}$ and prompt-tuning parameters $\overline{\mathcal{P}}_{M}$ are variables of the training error $\hat{\mathcal{E}}_M$ of the m

Figures (7)

  • Figure 1: The flowchart of the proposed HG-Adapter. Given the frozen representations $\tilde{\mathbf{H}}$ from the pre-trained model, the homogeneous adapter is designed to generate the adapted representations $\tilde{\mathbf{F}}$ by tuning the homogeneous graph structure $\mathbf{A}$ (details in the top right corner). After that, the homogeneous representations $\tilde{\mathbf{Z}}$ are obtained by summing the frozen representations $\tilde{\mathbf{E}}$ after message-passing and $\tilde{\mathbf{F}}$. Similarly, the heterogeneous representations $\hat{\mathbf{Z}}$ are obtained by summing the frozen representations $\hat{\mathbf{E}}$ after message-passing and the adapted representations $\hat{\mathbf{M}}$ from the heterogeneous adapter (details in the bottom right corner). Furthermore, $\tilde{\mathbf{Z}}$ and $\hat{\mathbf{Z}}$ are concatenated to generate $\mathbf{Z}$, which is then mapped to the prediction matrix $\mathbf{P}$. Finally, HG-Adapter designs a label-propagated contrastive loss (i.e., $\mathcal{L}_{con}$) and two self-supervised losses (i.e., $\mathcal{L}_{rec}$ and $\mathcal{L}_{mar}$) to optimize the dual adapters and extend potential labeled data to improve the model generalization.
  • Figure 2: (a) Homophily ratios of the homogeneous graph structure $\mathbf{A}$ learned by HERO+HG-Adapter on four datasets. (b) Node classification results without the maximal-weight or minimal-weight neighbors in the heterogeneous graph structure $\mathbf{S}$ learned by HERO+HG-Adapter on three datasets (excluding the DBLP dataset, as its target nodes have only one type of neighbor). (c) Test errors of HERO with different tuning methods (i.e., traditional fine-tuning, prompt-tuning-based HetGPT, and the proposed HG-Adapter) on the ACM dataset.
  • Figure 3: The training error of the proposed method with and without the structure tuning on all heterogeneous graph datasets.
  • Figure 4: The generalization gap (the difference between test error and training error) of the proposed method with and without the labeled data extension on all heterogeneous graph datasets.
  • Figure 5: Visualization plotted by t-SNE and the corresponding silhouette scores (SIL) of the representations learned by HERO with fine-tuning, prompt-tuning (i.e., HetGPT), and the proposed adapter-tuning on the DBLP dataset, respectively.
  • ...and 2 more figures
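
The data flow described in the Figure 1 caption can be sketched in a few lines of NumPy. This is only an illustrative sketch of the shapes and the sum-then-concatenate pattern, not the authors' implementation: the function names, the softmax-normalized form of the homogeneous adapter, the random stand-ins for $\tilde{\mathbf{E}}$, $\hat{\mathbf{E}}$, and $\hat{\mathbf{M}}$, and the linear-plus-softmax mapping to $\mathbf{P}$ are all assumptions made for the example.

```python
import numpy as np


def row_softmax(x):
    """Numerically stable softmax over each row."""
    x = x - x.max(axis=1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=1, keepdims=True)


def combine_dual_adapters(E_homo, F_homo, E_het, M_het, W_out):
    """Sum frozen and adapted representations per branch, concatenate, predict.

    Mirrors the caption: Z_homo = E_homo + F_homo, Z_het = E_het + M_het,
    Z = [Z_homo, Z_het], P = mapping(Z). The linear + softmax mapping is an
    assumption; the caption only says Z is "mapped" to P.
    """
    Z_homo = E_homo + F_homo
    Z_het = E_het + M_het
    Z = np.concatenate([Z_homo, Z_het], axis=1)
    P = row_softmax(Z @ W_out)
    return Z, P


rng = np.random.default_rng(0)
n_nodes, dim, n_classes = 5, 4, 3

# Frozen representations from the pre-trained model (random stand-ins here).
H_frozen = rng.normal(size=(n_nodes, dim))
E_homo = rng.normal(size=(n_nodes, dim))  # frozen, after message-passing
E_het = rng.normal(size=(n_nodes, dim))   # frozen, after message-passing

# Hypothetical homogeneous adapter: a learnable node-to-node structure A,
# row-normalized and applied to the frozen representations.
A_logits = rng.normal(size=(n_nodes, n_nodes))
F_homo = row_softmax(A_logits) @ H_frozen

# Stand-in for the heterogeneous adapter output (its internals are not
# described in the caption, so no form is assumed here).
M_het = rng.normal(size=(n_nodes, dim))

W_out = rng.normal(size=(2 * dim, n_classes))
Z, P = combine_dual_adapters(E_homo, F_homo, E_het, M_het, W_out)
print(Z.shape, P.shape)
```

In this sketch only `A_logits`, `M_het`, and `W_out` would be trainable; the frozen inputs stay fixed, which matches the adapter-tuning setup the caption describes.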

Theorems & Definitions (10)

  • Definition 2.1
  • Definition 2.2
  • Theorem 2.3
  • Theorem 2.4
  • Lemma C.1
  • proof
  • Theorem C.2
  • proof
  • Theorem C.3
  • proof