Decoupling Weighing and Selecting for Integrating Multiple Graph Pre-training Tasks

Tianyu Fan, Lirong Wu, Yufei Huang, Haitao Lin, Cheng Tan, Zhangyang Gao, Stan Z. Li

TL;DR

The paper addresses how to integrate multiple graph pre-training tasks by decoupling two processes: selecting a compatible task combination and weighing the selected tasks. It proposes WAS (Weigh And Select), a framework built on decoupled siamese networks that performs instance-level multi-teacher knowledge distillation, so each instance gets its own task combination and weights. The selecting and weighing modules are decoupled and momentum-updated, and a Gumbel-Softmax-based mechanism lets each instance dynamically quit less useful tasks before the remaining subset is re-weighted. Empirically, WAS achieves comparable or superior performance to leading baselines on 16 graph datasets, and its performance improves as the task pool expands, which suggests that it handles task compatibility well and scales with the number of pre-training tasks.

Abstract

Recent years have witnessed the great success of graph pre-training for graph representation learning. With hundreds of graph pre-training tasks proposed, integrating knowledge acquired from multiple pre-training tasks has become a popular research topic. In this paper, we identify two important collaborative processes for this topic: (1) select: how to select an optimal task combination from a given task pool based on their compatibility, and (2) weigh: how to weigh the selected tasks based on their importance. While a lot of existing work has focused on weighing, comparatively little effort has been devoted to selecting. This paper proposes a novel instance-level framework for integrating multiple graph pre-training tasks, Weigh And Select (WAS), where the two collaborative processes, weighing and selecting, are combined by decoupled siamese networks. Specifically, it first adaptively learns an optimal combination of tasks for each instance from a given task pool, based on which a customized instance-level task weighing strategy is learned. Extensive experiments on 16 graph datasets across node-level and graph-level downstream tasks demonstrate that, by combining a few simple but classical tasks, WAS can achieve performance comparable to other leading counterparts. The code is available at https://github.com/TianyuFan0504/WAS.

Paper Structure

This paper contains 15 sections, 10 equations, 7 figures, 12 tables, and 2 algorithms.

Figures (7)

  • Figure 1: (a) Performance ranking (1: best, 7: poorest) of seven pre-training tasks (rows) on eight datasets (columns). (b) Performance fluctuation on Bace (molecule dataset) when combining two tasks, AM and CP, with different task weights $\lambda$. (c) Performance gains or drops relative to no pre-training when combining two tasks on Bace (the diagonal represents a single task).
  • Figure 2: Overall workflow of WAS. First, we train multiple teachers with different pre-training tasks. Second, we pass the teachers' representations to two modules (Selecting and Weighing) to get the selecting results $\kappa(\cdot,i)$ and initial weights $\omega(\cdot,i)$ for each instance $\mathcal{G}_i$. Finally, we weigh only the selected teachers to get the weights $\lambda(\cdot,i)$ and distill the integrated distributions into the student.
  • Figure 3: A detailed comparison of three task selecting schemes on four datasets.
  • Figure 4: Evaluation on whether the performance of WAS can improve as the task pool expands.
  • Figure 5: Probability of being selected for different teachers (tasks) on different instances from Bace.
  • ...and 2 more figures
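
The select-then-weigh workflow in the Figure 2 caption can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`select_teachers`, `reweigh`, `integrate`, `distill_loss`) are hypothetical, random logits stand in for the outputs of the learned Selecting and Weighing modules, the selection is assumed to be a binary Gumbel-Softmax sample thresholded at 0.5, the re-weighting is assumed to be a softmax restricted to the selected teachers, and momentum updates and the training loop are omitted.

```python
import numpy as np

def select_teachers(select_logits, tau=1.0, rng=None):
    """Hard 0/1 selection kappa[i, k] from a binary Gumbel-Softmax sample (assumed form)."""
    rng = np.random.default_rng() if rng is None else rng
    # Logistic noise is the difference of two independent Gumbel samples.
    u = rng.uniform(1e-6, 1.0 - 1e-6, size=select_logits.shape)
    noise = np.log(u) - np.log1p(-u)
    soft = 1.0 / (1.0 + np.exp(-(select_logits + noise) / tau))
    kappa = (soft > 0.5).astype(float)
    # Assumption: keep at least one teacher per instance so the weights stay defined.
    rows = np.flatnonzero(kappa.sum(axis=1) == 0)
    kappa[rows, soft[rows].argmax(axis=1)] = 1.0
    return kappa

def reweigh(omega, kappa):
    """Softmax over selected teachers only, so lam[i, k] is 0 wherever kappa[i, k] is 0."""
    masked = np.where(kappa > 0, omega, -np.inf)
    shifted = masked - masked.max(axis=1, keepdims=True)  # numerical stability
    scores = np.exp(shifted)                              # exp(-inf) = 0 for quit teachers
    return scores / scores.sum(axis=1, keepdims=True)

def integrate(teacher_probs, lam):
    """Mix teacher distributions of shape (K, N, C) into one target of shape (N, C)."""
    return np.einsum("ik,kic->ic", lam, teacher_probs)

def distill_loss(student_probs, target_probs, eps=1e-12):
    """Mean KL(target || student) over instances."""
    log_ratio = np.log(target_probs + eps) - np.log(student_probs + eps)
    return float((target_probs * log_ratio).sum(axis=1).mean())

# Toy setup: N instances, K teachers (tasks), C output classes.
rng = np.random.default_rng(0)
N, K, C = 4, 3, 5
teacher_probs = rng.dirichlet(np.ones(C), size=(K, N))  # shape (K, N, C)
student_probs = rng.dirichlet(np.ones(C), size=N)       # shape (N, C)

kappa = select_teachers(rng.normal(size=(N, K)), tau=0.5, rng=rng)  # selecting results
lam = reweigh(rng.normal(size=(N, K)), kappa)                       # weights over selected teachers
target = integrate(teacher_probs, lam)                              # integrated distribution
loss = distill_loss(student_probs, target)
```

Because the selection mask is applied before normalization, a teacher that an instance has quit contributes nothing to that instance's target, and the remaining weights still sum to one. In this sketch that is the sense in which selecting and weighing are decoupled: the mask decides which teachers take part, and the softmax decides how much each participant counts.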