Task Addition in Multi-Task Learning by Geometrical Alignment

Soorin Yim; Dae-Woong Jeong; Sung Moon Ko; Sumin Lee; Hyunseung Kim; Chanhui Lee; Sehui Han

TL;DR

This work proposes a task addition approach for GATE that improves performance on target tasks with limited data while keeping computational cost low. The model is first pre-trained on a large dataset with supervised multi-task learning; task-specific modules are then added and trained for each target task.

Abstract

Training deep learning models on limited data while maintaining generalization is one of the fundamental challenges in molecular property prediction. One effective solution is transferring knowledge extracted from abundant datasets to those with scarce data. Recently, a novel algorithm called Geometrically Aligned Transfer Encoder (GATE) has been introduced, which uses soft parameter sharing by aligning the geometrical shapes of task-specific latent spaces. However, GATE faces limitations in scaling to multiple tasks due to computational costs. In this study, we propose a task addition approach for GATE to improve performance on target tasks with limited data while minimizing computational complexity. It is achieved through supervised multi-task pre-training on a large dataset, followed by the addition and training of task-specific modules for each target task. Our experiments demonstrate the superior performance of the task addition strategy for GATE over conventional multi-task methods, with comparable computational costs.
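The two-stage procedure in the abstract can be illustrated with a toy example. The sketch below is not the GATE implementation: it shows only the simpler conventional multi-task variant described in the Figure 1 caption, where a new head is trained on target data while the pre-trained shared embedding stays frozen. The linear embedding, the names `embed`, `train_head`, and `mse`, and the toy data are all assumptions made for illustration.

```python
def embed(x, w_shared):
    """Shared embedding: a fixed linear map standing in for the pre-trained encoder."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in w_shared]


def train_head(data, w_shared, lr=0.05, epochs=500):
    """Train only a new linear head on target data; w_shared is read, never updated."""
    head = [0.0] * len(w_shared)
    for _ in range(epochs):
        for x, y in data:
            z = embed(x, w_shared)
            err = sum(h * zi for h, zi in zip(head, z)) - y
            # Gradient step on the squared error with respect to the head only.
            head = [h - lr * err * zi for h, zi in zip(head, z)]
    return head


def mse(data, w_shared, head):
    """Mean squared error of the frozen embedding plus the given head."""
    errs = []
    for x, y in data:
        z = embed(x, w_shared)
        errs.append((sum(h * zi for h, zi in zip(head, z)) - y) ** 2)
    return sum(errs) / len(errs)


# Hypothetical "pre-trained" embedding weights (frozen during task addition).
w_shared = [[1.0, 0.5], [0.0, 1.0]]
frozen_copy = [row[:] for row in w_shared]

# Hypothetical target task: y = 2 * z0 - z1 in the embedding space.
inputs = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, -0.5)]
target_data = []
for x in inputs:
    z = embed(x, w_shared)
    target_data.append((x, 2.0 * z[0] - z[1]))

new_head = train_head(target_data, w_shared)
final_error = mse(target_data, w_shared, new_head)
```

Because only the new head receives updates, the cost of adding a task scales with the size of the added module rather than with the whole network, which is the motivation the abstract gives for task addition.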

Paper Structure (14 sections, 14 equations, 5 figures, 6 tables)

Introduction
Task Addition by Geometrical Alignment
Experiments
Experimental Setup
Datasets
Models
Results
Task Addition Reduces Computational Costs
Task-Added GATE Enables Knowledge Transfer
Dependence on Source Tasks
Discussion
Detailed Explanation of Datasets
Architecture and Hyperparameters
Experimental Results

Figures (5)

Figure 1: Schematic diagram of task addition in multi-task learning, where a model is pre-trained on two source tasks and one target task is added. (left) In conventional multi-task learning, each task shares a common latent vector and uses a task-specific head for making predictions. During pre-training, the embedding and the heads for each source task are trained. Subsequently, heads for target tasks are added and trained with target data while modules trained in the pre-training stage are kept frozen. (middle) Task addition for the GATE algorithm, which comprises an embedding and task-specific modules called regression units (RU). During pre-training, the embedding and the RUs for source tasks are trained. Then, RUs for target tasks are added and trained with target data, while modules trained in the pre-training stage are kept frozen. (right) In GATE, the regression unit for task $n$ consists of four modules: encoder, transfer ($\text{T}_n$), inverse transfer ($\text{T}^{-1}_n$), and head. In GATE, latent vectors for heads are not directly shared; instead, they are transferred to a universal locally flat (LF) space, enabling knowledge transfer through the alignment of geometrical shapes of source and target latent spaces.
Figure 2: Conceptual depiction of the GATE algorithm for task addition where the model is pre-trained on two source tasks and one target task is added. Knowledge from source tasks is transferred to a target task by aligning the geometry of the target task to the geometries of source tasks. This alignment is achieved by finding a transfer function, $\phi$, which maps an arbitrary point from a task-specific coordinate to a universal manifold, $M$. One can transform an arbitrary point in the overlapping region from one task coordinate to another by composing transfer functions. By matching the overlapping points on a manifold, one can align the inherent geometry of target data to the geometry of source data. This allows the information to flow from the source to target tasks. Grey-scaled regions are trained in the pre-training stage and frozen in the addition stage.
Figure 3: The performance of the task-added MTL, SINGLE, and task-added GATE algorithms. Average RMSE and Pearson correlation values are displayed across 10 target tasks, with error bars indicating standard deviation. Detailed performance values can be found in the appendix table of performance results.
Figure 4: Correlation recovery rate of the MTL and GATE algorithms. The correlation of task-added GATE and MTL is divided by the correlation of vanilla20 GATE and MTL, respectively. The complete names of the abbreviated tasks are listed in the appendix table of datasets.
Figure 5: The relationship between correlation to source tasks and task addition performance. The performance of task-added MTL decreases as the maximum absolute correlation to the source tasks decreases, whereas the performance of task-added GATE is less dependent on the correlation to the source tasks. The complete names of the abbreviated tasks are provided in the appendix table of datasets.
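The two quantities named in the Figure 4 and Figure 5 captions can be made concrete with a short sketch. The code below assumes that "correlation" means the Pearson correlation (as in Figure 3) and that the correlation to a source task is computed between label vectors over the same molecules; the source does not spell out either detail here. The function names and toy numbers are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def recovery_rate(corr_task_added, corr_reference):
    """Figure 4 quantity: task-added correlation divided by the reference model's correlation."""
    return corr_task_added / corr_reference


def max_abs_correlation(target_labels, source_labels_by_task):
    """Figure 5 x-axis (assumed): largest |Pearson r| between the target and any source task."""
    return max(abs(pearson(target_labels, labels))
               for labels in source_labels_by_task.values())


# Toy labels for three molecules; task names are placeholders, not tasks from the paper.
target = [1.0, 2.0, 3.0]
sources = {"source_a": [3.0, 2.0, 1.0], "source_b": [1.0, 3.0, 2.0]}

rate = recovery_rate(0.6, 0.8)
strongest = max_abs_correlation(target, sources)
```

A recovery rate near 1 would mean that the task-added model retains most of the correlation achieved by the reference model for the same algorithm, and the maximum absolute correlation treats strongly anti-correlated source tasks as equally informative as strongly correlated ones.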
