Scalable Multi-Task Transfer Learning for Molecular Property Prediction

Chanhui Lee, Dae-Woong Jeong, Sung Moon Ko, Sumin Lee, Hyunseung Kim, Soorin Yim, Sehui Han, Sungwoong Kim, Sungbin Lim

TL;DR

This work addresses the limitations of the manual design of transfer learning via data-driven bi-level optimization and enables scalable multi-task transfer learning for molecular property prediction by automatically obtaining the optimal transfer ratios.

Abstract

Molecules have many distinct properties whose importance and applications vary. In practice, labels for some properties are hard to obtain despite their practical importance. A common remedy for such data scarcity is transfer learning with models that generalize well. This requires domain experts to design source and target tasks that share features. However, this approach has limitations: (i) source-target task pairs are difficult to design accurately because of the large number of tasks; (ii) verifying transfer learning designs by trial and error imposes a corresponding computational burden; and, as a result, (iii) the potential of foundation modeling for multi-task molecular property prediction is constrained. We address these limitations of manual transfer learning design via data-driven bi-level optimization. The proposed method enables scalable multi-task transfer learning for molecular property prediction by automatically obtaining the optimal transfer ratios. Empirically, the proposed method improved the prediction performance for 40 molecular properties and accelerated training convergence.
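
As a rough sketch of what bi-level optimization over transfer ratios generally looks like, the transfer ratios (written $\lambda$) can be treated as outer variables and the model parameters as inner variables. This is the generic textbook form, not necessarily the exact objective used in the paper; the symbols $\theta$, $\mathcal{L}_{\text{train}}$, and $\mathcal{L}_{\text{val}}$ are assumptions introduced here for illustration.

$$
\min_{\lambda}\; \mathcal{L}_{\text{val}}\big(\theta^{*}(\lambda)\big) \quad \text{subject to} \quad \theta^{*}(\lambda) \in \arg\min_{\theta}\; \mathcal{L}_{\text{train}}(\theta, \lambda)
$$

Under this reading, the inner problem trains the model for fixed transfer ratios, and the outer problem adjusts the ratios to reduce held-out loss, which replaces manual trial and error over source-target pairs.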

Paper Structure (17 sections, 8 equations, 3 figures, 3 tables, 3 algorithms)

This paper contains 17 sections, 8 equations, 3 figures, 3 tables, 3 algorithms.

Figures (3)

  • Figure 1: Grid search of transfer ratios $\lambda$ between density (ds), heat of vaporization (hv), and boiling point (bp). The axes $\lambda_{hv, bp}, \lambda_{ds, bp}, \lambda_{hv, ds}$ correspond to the transfer ratios for the pairs (hv, bp), (ds, bp), and (hv, ds), respectively, assuming symmetric ratios $\lambda_{i \rightarrow j}=\lambda_{j \rightarrow i}$. The color of each point shows the Root Mean Square Error (RMSE) of the model's predictions at the end of training with the given $\lambda_{hv, bp}, \lambda_{ds, bp}, \lambda_{hv, ds}$, and the best hyperparameter set is marked with a star.
  • Figure 2: Training overview of GATE. $t$ and $s$ denote the target and source tasks for transfer learning, respectively. Arrow colors distinguish the prediction paths: red marks the path from $\text{Encoder}_s$, and blue marks the path from $\text{Encoder}_t$.
  • Figure 3: Validation loss curves for 40-task molecular property regression, with and without the proposed method.
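
The grid search described in the Figure 1 caption can be sketched as follows, mainly to show why it does not scale to many tasks. This is a minimal illustration, not the authors' code: `toy_rmse`, its assumed optimum, and the grid values are hypothetical stand-ins for "train with these transfer ratios and report the final RMSE".

```python
import itertools
from math import comb

# The three symmetric task pairs from Figure 1 (hv, bp, ds as in the caption).
PAIRS = [("hv", "bp"), ("ds", "bp"), ("hv", "ds")]


def toy_rmse(ratios):
    """Hypothetical surrogate for training a model and returning its final RMSE.

    The optimum below is an arbitrary assumption chosen for illustration only.
    """
    assumed_best = {("hv", "bp"): 0.5, ("ds", "bp"): 0.25, ("hv", "ds"): 0.75}
    return sum((ratios[p] - assumed_best[p]) ** 2 for p in PAIRS) ** 0.5


def grid_search(values):
    """Evaluate every combination of transfer ratios and keep the best one."""
    best_ratios, best_score = None, float("inf")
    for combo in itertools.product(values, repeat=len(PAIRS)):
        ratios = dict(zip(PAIRS, combo))
        score = toy_rmse(ratios)
        if score < best_score:
            best_ratios, best_score = ratios, score
    return best_ratios, best_score


def num_symmetric_pairs(num_tasks):
    """Number of unordered task pairs when lambda_{i->j} = lambda_{j->i}."""
    return comb(num_tasks, 2)


def grid_size(num_tasks, values_per_axis):
    """Number of grid points (one evaluation each) for a full grid search."""
    return values_per_axis ** num_symmetric_pairs(num_tasks)


GRID = [0.0, 0.25, 0.5, 0.75, 1.0]  # assumed candidate values per axis
best_ratios, best_score = grid_search(GRID)
print("best ratios:", best_ratios, "score:", best_score)
print("grid points for 3 tasks:", grid_size(3, len(GRID)))
print("symmetric pairs for 40 tasks:", num_symmetric_pairs(40))
```

With five candidate values per axis, three tasks already require 125 evaluations, each of which would be a full training run in practice. Forty tasks have 780 symmetric pairs, so an exhaustive grid would need $5^{780}$ runs, which is why learning the transfer ratios automatically is attractive.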