Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing

Shengchao Liu, Weili Nie, Chengpeng Wang, Jiarui Lu, Zhuoran Qiao, Ling Liu, Jian Tang, Chaowei Xiao, Anima Anandkumar

TL;DR

MoleculeSTM is a foundation model that aligns the structure and text modalities through contrastive learning. The paper trains it and demonstrates its utility on the downstream tasks of structure–text retrieval, text-guided editing and molecular property prediction.

Abstract

There is increasing adoption of artificial intelligence in drug discovery. However, existing machine learning studies mainly utilize the chemical structures of molecules and ignore the vast textual knowledge available in chemistry. Incorporating textual knowledge enables us to realize new drug design objectives, adapt to text-based instructions and predict complex biological activities. Here we present a multi-modal molecule structure-text model, MoleculeSTM, which jointly learns molecules' chemical structures and textual descriptions via a contrastive learning strategy. To train MoleculeSTM, we construct a large multi-modal dataset, PubChemSTM, with over 280,000 chemical structure-text pairs. To demonstrate the effectiveness and utility of MoleculeSTM, we design two challenging zero-shot tasks based on text instructions: structure-text retrieval and molecule editing. MoleculeSTM has two main properties: open vocabulary and compositionality via natural language. In experiments, MoleculeSTM achieves state-of-the-art generalization to novel biochemical concepts across various benchmarks.
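
The contrastive alignment mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the loss, so the following assumes a generic symmetric InfoNCE objective over a batch of paired embeddings; the function names, the `temperature` value and the use of NumPy are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def log_softmax(logits):
    # Numerically stable row-wise log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def symmetric_info_nce(struct_emb, text_emb, temperature=0.1):
    """Assumed generic contrastive loss: row i of each input is a matched
    structure-text pair; all other rows in the batch act as negatives."""
    s = l2_normalize(struct_emb)
    t = l2_normalize(text_emb)
    logits = s @ t.T / temperature            # (batch, batch) similarity matrix
    idx = np.arange(len(s))
    loss_s2t = -log_softmax(logits)[idx, idx].mean()    # structure -> text
    loss_t2s = -log_softmax(logits.T)[idx, idx].mean()  # text -> structure
    return 0.5 * (loss_s2t + loss_t2s)
```

Under this sketch, a batch whose structure and text embeddings are correctly paired yields a lower loss than the same batch with the pairing shuffled, which is the signal that pulls the two modalities into a shared space.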

Paper Structure (32 sections, 8 equations, 8 figures, 26 tables)

Figures (8)

  • Figure 1: Pipeline of pretraining and downstream tasks. (a) MoleculeSTM pretraining with two branches, the chemical structure (green) and textual description (pink). (b) Structure-text retrieval downstream task. (c) Text-based molecule editing downstream task. (d) Molecular property prediction downstream task.
  • Figure 2: Results for zero-shot structure-text retrieval. (a) Accuracy for zero-shot structure-text retrieval on three DrugBank datasets. (b) Four case studies on DrugBank-ATC retrieval. HMG-CoA is $\beta$-Hydroxy $\beta$-methylglutaryl-CoA.
  • Figure 3: Pipeline for zero-shot text-based molecule editing. (a) The space alignment step aligns the representation space of a pretrained molecule generation model with the representation space of MoleculeSTM. (b) The latent optimization step learns a latent representation that is similar to both the input molecules and the textual descriptions.
  • Figure 4: Visualization results for zero-shot text-based molecule editing. Satisfactory hit ratios (%) for four types of text-based editing tasks: eight single-objective tasks, four multi-objective tasks, four ChEMBL binding-affinity-based editing tasks (with a pretrained random forest as the evaluator; detailed text prompts are in Supplementary D), and four drug relevance editing tasks. The satisfactory threshold ($\Delta$) is 0 for all visualized results. Each task is run with three random seeds, and the length of each error bar represents the standard deviation.
  • Figure 5: Visual analysis of text-based molecule editing. Case studies for solubility editing (a,b), permeability editing (c,d), acceptor and donor editing (e,f), solubility and permeability editing (g,h), and neighborhood searching for patent data (i,j). The pink and blue regions mark the functional groups before and after the editing, and we list the Chemical Abstracts Service (CAS) registry number. (k) visualizes binding-affinity-based editing, and the dashed red lines mark the potential bindings.
  • ...and 3 more figures
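
The latent optimization step described in the Figure 3 caption can be sketched as a small gradient-descent loop. This is a simplified illustration under stated assumptions, not the authors' procedure: it assumes the latent code already lives in the aligned space (so the space alignment step is treated as an identity map), and it uses a hypothetical objective that trades off cosine similarity to the text embedding against squared distance to the input molecule's latent code. The names `edit_latent`, `w0`, `lam`, `lr` and `steps` are all illustrative.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two 1-D vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def cosine_grad(w, t):
    # Analytic gradient of cosine(w, t) with respect to w.
    nw, nt = np.linalg.norm(w), np.linalg.norm(t)
    return t / (nw * nt) - (w @ t) * w / (nw ** 3 * nt)

def edit_latent(w0, text_emb, lam=0.5, lr=0.1, steps=200):
    """Minimize -cosine(w, text_emb) + lam * ||w - w0||^2 by gradient descent.

    w0       : latent code of the input molecule (assumed to be in the aligned space)
    text_emb : embedding of the text prompt in the same space
    lam      : hypothetical weight keeping the result close to the input molecule
    """
    w = w0.copy()
    for _ in range(steps):
        grad = -cosine_grad(w, text_emb) + 2.0 * lam * (w - w0)
        w -= lr * grad
    return w
```

In a full pipeline the optimized latent code would be passed to the pretrained generative model's decoder to produce the edited molecule; that decoding step is outside this sketch. Larger values of `lam` keep the result closer to the input molecule, while smaller values let the text prompt dominate.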