Multitask Extension of Geometrically Aligned Transfer Encoder
Sung Moon Ko, Sumin Lee, Dae-Woong Jeong, Hyunseung Kim, Chanhui Lee, Soorin Yim, Sehui Han
TL;DR
The paper addresses data scarcity in molecular property prediction by transferring information across multiple tasks. It generalizes the Geometrically Aligned Transfer Encoder (GATE) to a many-task setting by aligning the latent-space geometries of all tasks through mappings onto a locally flat universal manifold derived from SMILES, which lets information flow mutually between tasks. The method introduces per-task regression units, autoencoder-based mappings from each task's latent space to the manifold, and a composite loss $l_{tot}$ that combines $l_{reg}$, $l_{auto}$, $l_{cons}$, $l_{map}$, and $l_{dis}$ to enforce both local and global geometric alignment. Empirically, it yields improved or competitive performance across 10 molecular-property datasets, shows clear synergy in multi-task settings, and behaves more robustly than standard multitask learning. These gains come at the cost of increased computational complexity, and the approach leaves room for enhancements that exploit global geometry.
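To make the roles of the five loss terms concrete, the following is a toy NumPy sketch of one possible reading of $l_{tot}$, not the authors' implementation. Everything in it is an assumption for illustration: the linear `TaskModule` (a hypothetical stand-in for the neural encoders, heads, and autoencoders), the exact form of each term, the hypothetical `weights` dictionary (the TL;DR does not give the weighting), the random input features standing in for a SMILES-derived representation, and the simplification that every task is labeled on the same molecules.

```python
import numpy as np


def mse(a, b):
    """Mean squared error between two arrays of equal shape."""
    return float(np.mean((np.asarray(a) - np.asarray(b)) ** 2))


class TaskModule:
    """Hypothetical per-task block: linear encoder into a task latent space,
    linear regression head, and a linear autoencoder pair mapping that
    latent space to and from a shared 'universal' space."""

    def __init__(self, d_in, d_lat, rng):
        self.W_enc = rng.normal(size=(d_in, d_lat)) / np.sqrt(d_in)
        self.w_head = rng.normal(size=d_lat) / np.sqrt(d_lat)
        self.W_to_u = rng.normal(size=(d_lat, d_lat)) / np.sqrt(d_lat)
        self.W_from_u = rng.normal(size=(d_lat, d_lat)) / np.sqrt(d_lat)

    def encode(self, x):
        return x @ self.W_enc

    def predict(self, z):
        return z @ self.w_head

    def to_universal(self, z):
        return z @ self.W_to_u

    def from_universal(self, u):
        return u @ self.W_from_u


def composite_loss(tasks, x, ys, weights, rng, eps=1e-2, n_nbrs=4):
    """Return (l_tot, parts) for K tasks sharing inputs x; ys[k] are task k's labels."""
    zs = [t.encode(x) for t in tasks]
    us = [t.to_universal(z) for t, z in zip(tasks, zs)]
    K = len(tasks)
    pairs = [(i, j) for i in range(K) for j in range(K) if i != j]

    # Regression error of each task's own head.
    l_reg = sum(mse(t.predict(z), y) for t, z, y in zip(tasks, zs, ys))
    # Autoencoder reconstruction: latent -> universal -> latent.
    l_auto = sum(mse(t.from_universal(u), z) for t, z, u in zip(tasks, zs, us))
    # Mapping: the same molecule should land on the same universal point.
    l_map = sum(mse(us[i], us[j]) for i, j in pairs)
    # Consistency: task j's head, fed task i's universal coordinates, should fit y_j.
    l_cons = sum(mse(tasks[j].predict(tasks[j].from_universal(us[i])), ys[j])
                 for i, j in pairs)

    # Distance: distances to the SAME perturbed neighbours should agree across tasks.
    noise = eps * rng.normal(size=(n_nbrs,) + x.shape)

    def local_dists(k):
        return np.stack([
            np.linalg.norm(tasks[k].to_universal(tasks[k].encode(x + d)) - us[k], axis=1)
            for d in noise
        ])

    dists = [local_dists(k) for k in range(K)]
    l_dis = sum(mse(dists[i], dists[j]) for i, j in pairs)

    parts = {"reg": l_reg, "auto": l_auto, "cons": l_cons, "map": l_map, "dis": l_dis}
    l_tot = l_reg + sum(weights[name] * parts[name] for name in ("auto", "cons", "map", "dis"))
    return l_tot, parts


# Toy usage: 3 hypothetical tasks on 32 molecules with 16 random features each.
rng = np.random.default_rng(0)
x = rng.normal(size=(32, 16))
ys = [rng.normal(size=32) for _ in range(3)]
tasks = [TaskModule(16, 8, rng) for _ in range(3)]
weights = {"auto": 1.0, "cons": 1.0, "map": 1.0, "dis": 1.0}
l_tot, parts = composite_loss(tasks, x, ys, weights, rng)
print(round(l_tot, 3), {k: round(v, 3) for k, v in parts.items()})
```

In this sketch the cross-task terms ($l_{cons}$, $l_{map}$, $l_{dis}$) are summed over ordered task pairs, so they vanish for a single task and grow with the number of tasks; that pairwise growth is one plausible source of the increased computational complexity noted above. Reusing the same perturbations for every task is what lets the distance term compare local geometry like with like.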
Abstract
Molecular datasets often suffer from a lack of data, because gathering data is difficult given the complexity of the experiments or simulations involved. Here, we leverage mutual information across different tasks in molecular data to address this issue. We extend an algorithm that utilizes the geometric characteristics of the encoding space, known as the Geometrically Aligned Transfer Encoder (GATE), to a multi-task setup. In doing so, we connect multiple molecular tasks by aligning their curved coordinates onto locally flat coordinates, ensuring that information flows from source tasks to support performance on target data.
