MolTailor: Tailoring Chemical Molecular Representation to Specific Tasks via Text Prompts

Haoqiang Guo, Sendong Zhao, Haochun Wang, Yanrui Du, Bing Qin

TL;DR

MolTailor addresses the inefficiency of generic molecular representations for diverse downstream tasks by enabling task-specific tailoring via natural language prompts. It introduces MT-MTR, a molecule-text multimodal pretraining task, and a dual-tower Transformer architecture with a cross-attention–based MT-Encoder that treats the language model as an agent and the molecular model as a knowledge base. Empirical results on MoleculeNet show improved regression performance and competitive classification results, with evidence that task descriptions steer representations toward task-relevant properties. The work demonstrates the value of language-model guided optimization in leveraging existing molecular representations and suggests paths for extending molecular-text multimodal learning to real-world drug discovery problems.

Abstract

Deep learning is now widely used in drug discovery, providing significant acceleration and cost reduction. As the most fundamental building block, molecular representation is essential for predicting molecular properties to enable various downstream applications. Most existing methods attempt to incorporate more information to learn better representations. However, not all features are equally important for a specific task. Ignoring this would potentially compromise the training efficiency and predictive accuracy. To address this issue, we propose a novel approach, which treats language models as an agent and molecular pretraining models as a knowledge base. The agent accentuates task-relevant features in the molecular representation by understanding the natural language description of the task, just as a tailor customizes clothes for clients. Thus, we call this approach MolTailor. Evaluations demonstrate MolTailor's superior performance over baselines, validating the efficacy of enhancing relevance for molecular representation learning. This illustrates the potential of language model guided optimization to better exploit and unleash the capabilities of existing powerful molecular representation methods. Our code is available at https://github.com/SCIR-HI/MolTailor.

Paper Structure (29 sections, 12 equations, 5 figures, 7 tables)

Figures (5)

  • Figure 1: Most existing molecular pretraining models (e.g. Grover, MolCLR) attempt to encode as much molecular information as possible (e.g. various functional groups and molecular weight) into a vector to obtain a general molecular representation. However, for a specific downstream task (e.g. Lipo, predicting the lipophilicity of compounds), features are not equally important (e.g. sulfonamide and carboxylic acid groups significantly increase hydrophilicity and are therefore more critical than the remaining groups). By understanding task descriptions, MolTailor adjusts the weights of different features in the representation to obtain a task-specific molecular representation.
  • Figure 2: Overview of the MolTailor framework. a) The construction process of the MT-MTR dataset. We obtain representative molecules from DrugBank (Wishart et al., 2018) and ChEBI (Hastings et al., 2016), and then use RDKit to calculate 209 properties for each molecule. For each molecule, we randomly sample 5-10 properties from the property set, use the property names to generate virtual task descriptions via GPT-3.5, and use the property values as regression labels. b) Model architecture of MolTailor. MolTailor consists of a language pretraining model (T-Encoder) and a molecular pretraining model (M-Encoder). The T-Encoder is divided into a unimodal part (for understanding task descriptions) and a multimodal part (for adjusting molecular representations). c) Internal structure of the Multimodal T-Encoder. It modifies the original Transformer Encoder Block to perform self-attention and cross-attention operations simultaneously: mapping the general molecular representation to obtain $K_m$ and $V_m$ vectors which are then concatenated with textual vectors $K_t$ and $V_t$. d) Pretraining task of MolTailor. The model needs to predict properties mentioned in the task description based on the molecule and text prompt. e) Downstream tasks of MolTailor. For a specific downstream task, we first generate the task description in the same format as pretraining via GPT-4 analysis, then take the SMILES and task description as input to predict labels for the corresponding task.
  • Figure 3: Performance of three methods on ESOL and on molecular property prediction tasks, answering Q5. The x-axis shows the task names and the y-axis shows the normalized RMSE, with lower values indicating better performance. Of the four molecular properties, the first three are related to the ESOL task, while the last one is not.
  • Figure 4: Visualization of the attention, answering Q6. The two molecular graphs on the left show MolTailor's attention over the molecules under different prompts. The text on the right shows which input tokens from the original prompt MolTailor pays most attention to.
  • Figure 5: Statistical results of regression labels in MT-MTR. We counted the occurrence of each property within the regression labels, as provided by RDKit. The horizontal axis of the graph represents the name of the property, while the vertical axis indicates the number of occurrences of that property.
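
The Figure 2c caption describes an attention step in which text queries attend jointly over text keys/values ($K_t$, $V_t$) and keys/values projected from the general molecular representation ($K_m$, $V_m$). The following is a minimal sketch of that idea, not the authors' implementation. It assumes a single attention head, no residual connections or layer normalization, randomly initialized projection matrices, and a text-then-molecule concatenation order; all function and variable names (`joint_attention`, `w_km`, and so on) and all dimensions are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(s - peak) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Random projection matrix (stands in for learned weights)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def joint_attention(text, mol, w_q, w_kt, w_vt, w_km, w_vm):
    """Text queries attend over text tokens and molecule tokens at once.

    text: (n_text, d) text hidden states.
    mol:  (n_mol, d_mol) general molecular representation.
    Returns the outputs (n_text, d) and weights (n_text, n_text + n_mol).
    """
    q = matmul(text, w_q)
    # Concatenate along the token axis: K = [K_t; K_m], V = [V_t; V_m].
    k = matmul(text, w_kt) + matmul(mol, w_km)
    v = matmul(text, w_vt) + matmul(mol, w_vm)
    d = len(q[0])
    k_transposed = [list(col) for col in zip(*k)]
    scores = matmul(q, k_transposed)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v), weights


# Toy sizes (assumed): 3 text tokens of width 4, 2 molecule tokens of width 5.
rng = random.Random(0)
text_states = random_matrix(3, 4, rng)
mol_states = random_matrix(2, 5, rng)
out, weights = joint_attention(
    text_states,
    mol_states,
    w_q=random_matrix(4, 4, rng),
    w_kt=random_matrix(4, 4, rng),
    w_vt=random_matrix(4, 4, rng),
    w_km=random_matrix(5, 4, rng),  # maps molecule width to text width
    w_vm=random_matrix(5, 4, rng),
)
```

Each row of `weights` is one softmax over all five positions, so the first three columns play the role of self-attention over the text and the last two the role of cross-attention over the molecule. Because both share a single normalization, the text decides in one step how much to draw from the prompt and how much from the molecular features.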