Delayed Bottlenecking: Alleviating Forgetting in Pre-trained Graph Neural Networks
Zhe Zhao, Pengkun Wang, Xu Wang, Haibin Wen, Xiaolong Xie, Zhengyang Zhou, Qingfu Zhang, Yang Wang
TL;DR
This work addresses the forgetting problem that arises when graph neural networks (GNNs) are pre-trained for downstream tasks, arguing that traditional pre-training compresses information in ways that can harm transfer. It introduces Delayed Bottlenecking Pre-training (DBP), a framework that preserves the mutual information $I(\mathcal{D}^{pre}; Z)$ during pre-training by suppressing compression, then applies compression during fine-tuning, where it is guided by labeled downstream data. Two information-control objectives implement this scheme. The authors formulate tractable variational upper bounds for these objectives and provide theoretical results showing improved parameter transfer between pre-training and fine-tuning. Empirically, DBP demonstrates strong gains over state-of-the-art pre-training methods on chemistry and biology benchmarks, with analyses revealing favorable information dynamics and robustness across several GNN architectures. Overall, DBP offers a principled, generalizable approach to bridging pre-training and fine-tuning in graph representation learning.
Abstract
Pre-training GNNs to extract transferable knowledge and apply it to downstream tasks has become the de facto standard in graph representation learning. Recent works have focused on designing self-supervised pre-training tasks to extract useful and universal transferable knowledge from large-scale unlabeled data. However, they face an unavoidable problem: traditional pre-training strategies, which aim to extract information useful for the pre-training tasks, may not extract all the information useful for the downstream task. In this paper, we reexamine the pre-training process within the traditional pre-training and fine-tuning framework from the perspective of the Information Bottleneck (IB) and confirm that the forgetting phenomenon in the pre-training phase may have detrimental effects on downstream tasks. We therefore propose a novel Delayed Bottlenecking Pre-training (DBP) framework, which maintains as much mutual information as possible between latent representations and training data during the pre-training phase by suppressing the compression operation, and delays compression to the fine-tuning phase so that it can be guided by labeled fine-tuning data and downstream tasks. To achieve this, we design two information control objectives that can be directly optimized and integrate them into the actual model design. Extensive experiments in both the chemistry and biology domains demonstrate the effectiveness of DBP.
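
For reference, the standard Information Bottleneck objective that this perspective builds on can be written as below. The symbols $X$ (input), $Y$ (target), $Z$ (latent representation), and $\beta$ (trade-off weight) are generic textbook notation, assumed here for illustration; this is not the paper's own pair of control objectives.

$$\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta\, I(Z; Y), \qquad \beta > 0$$

Here $I(X; Z)$ is the compression term and $I(Z; Y)$ is the relevance term. Read schematically against this form, conventional pre-training compresses $Z$ with only the pre-training task standing in for $Y$, which is where information useful to the downstream task can be lost. DBP, as described above, instead suppresses the compression term during pre-training and applies it during fine-tuning, when $Y$ is the labeled downstream target.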
