Enhancing Molecular Property Prediction with Auxiliary Learning and Task-Specific Adaptation

Vishal Dey, Xia Ning

TL;DR

Pretrained molecular Graph Neural Networks (GNNs) often underperform when fine-tuned across diverse downstream property tasks due to negative transfer from unrelated self-supervised signals. The authors propose auxiliary-learning-based adaptation with Rotation of Conflicting Gradients (RCGrad) and Bi-level Optimization with Gradient Rotation (BLO+RCGrad) to jointly train pretrained GNNs with multiple auxiliary tasks while aligning task gradients. They introduce gradient-similarity and gradient-scaling mechanisms, and demonstrate that RCGrad and BLO+RCGrad mitigate negative transfer and yield up to 7.7% higher ROC-AUC than plain fine-tuning across MoleculeNet tasks, particularly in low-data regimes. The study shows that these gradient-rotation techniques improve generalization across different pretrained GNNs (e.g., Sup-CP and Sup) and suggests a direction for more robust knowledge transfer in molecular property prediction. The results imply improved efficiency for molecular screening and drug discovery by leveraging rich self-supervised learning (SSL) signals without sacrificing target-task performance.

Abstract

Pretrained Graph Neural Networks have been widely adopted for various molecular property prediction tasks. Despite their ability to encode structural and relational features of molecules, traditional fine-tuning of such pretrained GNNs on the target task can lead to poor generalization. To address this, we explore the adaptation of pretrained GNNs to the target task by jointly training them with multiple auxiliary tasks. This could enable the GNNs to learn both general and task-specific features, which may benefit the target task. However, a major challenge is to determine the relatedness of auxiliary tasks with the target task. To address this, we investigate multiple strategies to measure the relevance of auxiliary tasks and integrate such tasks by adaptively combining task gradients or by learning task weights via bi-level optimization. Additionally, we propose a novel gradient surgery-based approach, Rotation of Conflicting Gradients ($\mathtt{RCGrad}$), that learns to align conflicting auxiliary task gradients through rotation. Our experiments with state-of-the-art pretrained GNNs demonstrate the efficacy of our proposed methods, with improvements of up to 7.7% over fine-tuning. This suggests that incorporating auxiliary tasks along with target task fine-tuning can be an effective way to improve the generalizability of pretrained GNNs for molecular property prediction.

Paper Structure (23 sections, 8 equations, 7 figures, 5 tables, 2 algorithms)

This paper contains 23 sections, 8 equations, 7 figures, 5 tables, 2 algorithms.

Figures (7)

  • Figure 1: Off-the-shelf available pretrained GNNs are transferred for target task-specific adaptation.
  • Figure 2: Large variations of scales among task gradients are observed when $\mathtt{Sup\text{-}CP}$ is adapted with all auxiliary tasks using $\mathtt{MTL}$.
  • Figure 3: (a) $\mathtt{PCGrad}$ projects conflicting gradient $\mathbf{g}_{a,i}$ onto the normal plane of $\mathbf{g}_{t}$. (b) $\mathtt{RCGrad}$ applies a rotation to $\mathbf{g}_{a,i}$, followed by projection. (c) Rotation followed by orthogonal projection is equivalent to scaling $\mathbf{g}^p_{a,i}$. (d) If the rotated gradient does not conflict with $\mathbf{g}_{t}$, the projection of the rotated gradient onto $\mathbf{g}_{t}$ is incorporated as scaling $\mathbf{g}_{t}$ by $(1+\mathbf{s}_t)$.
  • Figure 4: Target task gradient conflicts with EP and CP tasks. $\mathtt{Sup\text{-}CP}$ is adapted with all auxiliary tasks in an $\mathtt{MTL}$ setting.
  • Figure 5: Large variations of scales among task gradients observed across multiple tasks.
  • ...and 2 more figures
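
The projection step described in the caption of Figure 3(a) can be illustrated with a small sketch. This is not the authors' implementation: it shows only the standard PCGrad-style projection of a conflicting auxiliary gradient onto the normal plane of the target gradient, and it does not attempt the learned rotation that RCGrad adds. The function names (`dot`, `cosine_similarity`, `pcgrad_project`) and the use of plain Python lists for flattened gradients are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two flattened gradient vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two gradients; negative means conflict."""
    norm_u = math.sqrt(dot(u, u))
    norm_v = math.sqrt(dot(v, v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero gradient is treated as neither aligned nor conflicting
    return dot(u, v) / (norm_u * norm_v)


def pcgrad_project(g_aux: Sequence[float], g_target: Sequence[float]) -> List[float]:
    """PCGrad-style projection (illustrative sketch, hypothetical helper).

    If the auxiliary gradient conflicts with the target gradient
    (negative inner product), remove its component along the target
    gradient so the result lies in the normal plane of g_target.
    Non-conflicting gradients are returned unchanged.
    """
    d = dot(g_aux, g_target)
    if d >= 0.0:
        return list(g_aux)
    # d < 0 implies g_target is non-zero, so the division is safe.
    scale = d / dot(g_target, g_target)
    return [a - scale * t for a, t in zip(g_aux, g_target)]


# Usage: a target gradient and one conflicting auxiliary gradient.
g_t = [1.0, 0.0]
g_a = [-1.0, 1.0]
g_a_projected = pcgrad_project(g_a, g_t)
```

In this toy case the projected auxiliary gradient is orthogonal to `g_t`, so adding it to the target gradient no longer pushes the shared parameters against the target task.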