Adapting Differential Molecular Representation with Hierarchical Prompts for Multi-label Property Prediction
Linjia Kang, Songhua Zhou, Shuyan Fang, Shichao Liu
TL;DR
Multi-label molecular property prediction faces an output space that grows exponentially with the number of tasks ($2^m$ possible label combinations for $m$ binary properties), as well as gradient conflicts among tasks. The authors introduce HiPM, a hierarchical prompted molecular representation learning framework that pairs a Molecular Representation Encoder (MRE) with a Task-Aware Prompter (TAP) to model task correlations at multiple granularities. HiPM achieves state-of-the-art performance on six MoleculeNet datasets, performing especially well when label correlations are strong, and provides interpretability through affinity-guided task clustering and analysis of motif weights. The work mitigates negative transfer in multi-label settings and offers a scalable tool for drug discovery applications.
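The two difficulties named in the TL;DR can be made concrete with a small sketch. With $m$ binary labels there are $2^m$ possible label vectors, and a gradient conflict is commonly detected as a negative cosine similarity between two tasks' gradients on shared parameters (the criterion used, for example, by PCGrad). The gradient values below are made-up numbers, and this check illustrates the general notion only; it is not necessarily how HiPM measures conflicts.

```python
import math


def cosine_similarity(g1, g2):
    """Cosine similarity between two flattened gradient vectors."""
    dot = sum(a * b for a, b in zip(g1, g2))
    n1 = math.sqrt(sum(a * a for a in g1))
    n2 = math.sqrt(sum(b * b for b in g2))
    if n1 == 0.0 or n2 == 0.0:
        return 0.0  # treat a zero gradient as neither aligned nor conflicting
    return dot / (n1 * n2)


def conflicting_pairs(task_grads):
    """Return task index pairs (i, j) whose gradients point in opposing directions."""
    pairs = []
    m = len(task_grads)
    for i in range(m):
        for j in range(i + 1, m):
            if cosine_similarity(task_grads[i], task_grads[j]) < 0.0:
                pairs.append((i, j))
    return pairs


# Hypothetical per-task gradients with respect to a shared encoder parameter vector.
grads = [
    [1.0, 0.5, 0.0],    # task 0
    [0.8, 0.6, 0.1],    # task 1: roughly aligned with task 0
    [-1.0, -0.2, 0.3],  # task 2: opposes tasks 0 and 1
]
m = len(grads)
print(2 ** m)                   # 8 possible label combinations for 3 tasks
print(conflicting_pairs(grads))  # [(0, 2), (1, 2)]
```

Summing these three gradients for a shared update would partly cancel task 2 against tasks 0 and 1, which is the kind of interference that the paper refers to as negative transfer.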
Abstract
Accurate prediction of molecular properties is crucial in drug discovery. Traditional methods often overlook the fact that real-world molecules typically carry multiple property labels with complex correlations among them. To address this, we propose HiPM, a hierarchical prompted molecular representation learning framework. HiPM uses task-aware prompts to strengthen the task-specific (differential) expression of molecular representations and to mitigate the negative transfer caused by conflicting information from individual tasks. The framework comprises two core components: the Molecular Representation Encoder (MRE) and the Task-Aware Prompter (TAP). MRE employs a hierarchical message-passing network to capture molecular features at both the atom and motif levels. TAP applies an agglomerative hierarchical clustering algorithm to construct a prompt tree that reflects task affinity and distinctiveness, allowing the model to exploit correlation information among tasks at multiple granularities and thereby handle the complexity of multi-label property prediction. Extensive experiments demonstrate that HiPM achieves state-of-the-art performance across various multi-label datasets, offering a new perspective on multi-label molecular representation learning.
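To illustrate how agglomerative hierarchical clustering can turn task affinities into a tree, here is a minimal sketch using average linkage on a task-affinity matrix. The affinity values, the task names, and the helper names `build_prompt_tree` and `node_paths` are hypothetical; the abstract does not say how HiPM computes task affinity or which linkage criterion it uses.

```python
def build_prompt_tree(task_names, affinity):
    """Average-linkage agglomerative clustering on a task-affinity matrix.

    Returns the merge tree as nested tuples. Leaves are task names; each
    internal node groups tasks that are merged at that level of granularity.
    """
    # Each active cluster is (subtree, list of member task indices).
    clusters = [(name, [i]) for i, name in enumerate(task_names)]

    def avg_affinity(a, b):
        total = sum(affinity[i][j] for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Merge the pair of clusters with the highest average affinity.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                score = avg_affinity(clusters[x][1], clusters[y][1])
                if best is None or score > best[0]:
                    best = (score, x, y)
        _, x, y = best
        merged = ((clusters[x][0], clusters[y][0]), clusters[x][1] + clusters[y][1])
        clusters.pop(y)  # pop the larger index first so index x stays valid
        clusters.pop(x)
        clusters.append(merged)
    return clusters[0][0]


def node_paths(tree, path=()):
    """Map each task to the internal nodes above it, ordered from the root down."""
    if isinstance(tree, str):
        return {tree: list(path)}
    result = {}
    for child in tree:
        result.update(node_paths(child, path + (tree,)))
    return result


# Hypothetical symmetric affinity matrix for four tasks.
tasks = ["A", "B", "C", "D"]
affinity = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
tree = build_prompt_tree(tasks, affinity)
print(tree)  # (('A', 'B'), ('C', 'D'))
paths = node_paths(tree)
```

One plausible way to use such a tree, again an assumption rather than the authors' stated design, is to attach a learnable prompt to every internal node and give each task the prompts along its root-to-leaf path in `paths`. Tasks in the same subtree (here A and B) then share fine-grained prompts, while all tasks share the root prompt, which matches the multi-granular sharing the abstract describes.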
