Bridging Text and Molecule: A Survey on Multimodal Frameworks for Molecule

Yi Xiao; Xiangxin Zhou; Qiang Liu; Liang Wang

TL;DR

This paper presents the first systematic survey on multimodal frameworks for molecule research, beginning with the development of molecular deep learning and pointing out the necessity of incorporating the textual modality.

Abstract

Artificial intelligence has demonstrated immense potential in scientific research. Within molecular science, it is revolutionizing the traditional computer-aided paradigm, ushering in a new era of deep learning. With recent progress in multimodal learning and natural language processing, an emerging line of work aims to build multimodal frameworks that jointly model molecules with textual domain knowledge. In this paper, we present the first systematic survey on multimodal frameworks for molecule research. Specifically, we begin with the development of molecular deep learning and point out the necessity of incorporating the textual modality. Next, we focus on recent advances in text-molecule alignment methods, categorizing current models into two groups based on their architectures and listing relevant pre-training tasks. Furthermore, we delve into the use of large language models and prompting techniques for molecular tasks and present significant applications in drug discovery. Finally, we discuss the limitations of this field and highlight several promising directions for future research.

Paper Structure (41 sections, 5 equations, 1 figure, 1 table)

Introduction
Molecular Descriptors and Encoding
  Small Molecule Representation
    1D Sequences
    2D Graph
    3D Geometry
  Protein Representation
    Protein Sequence
    Protein Graph
Latent Space Alignment between Text and Molecule
  Model Architecture
    Single-Stream Architecture
    Multi-Stream Architecture
  Pre-training Tasks
    Molecule-Text Contrastive Learning
...and 26 more sections

Figures (1)

Figure 1: Pipeline of multimodal frameworks for molecules and downstream molecular tasks (a-c). (a) Latent space alignment and adaptation to downstream tasks. The single-stream framework jointly models text and molecules with the same encoder, and downstream tasks are realized with task-specific prompts described in the section on prompting. The multi-stream framework performs cross-modal alignment between text and molecules; features from the latent space can be used directly for downstream tasks or in instruction tuning. (b) Building a semi-autonomous agent for molecular research with instructions and in-context examples. (c) Building an autonomous agent for chemistry with instructions and chain-of-thought prompting. Equipping the agent with external tools and memory greatly expands its autonomy and capabilities.
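
The cross-modal alignment in the multi-stream framework of panel (a) is often trained with a molecule-text contrastive objective, which the outline lists among the pre-training tasks. The sketch below is a minimal, illustrative version assuming a CLIP-style symmetric InfoNCE loss, and assuming that separate molecule and text encoders (not shown) have already produced one embedding per item. The function names and the default temperature of 0.07 are hypothetical choices for illustration, not the survey's specification.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
    return [x / norm for x in v]


def logsumexp(xs: List[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def symmetric_info_nce(mol_emb: List[Vector], text_emb: List[Vector],
                       temperature: float = 0.07) -> float:
    """Hypothetical symmetric InfoNCE loss over a batch of molecule/text pairs.

    Row i of mol_emb and row i of text_emb are assumed to describe the same
    molecule (the positive pair); every other row in the batch is a negative.
    """
    if len(mol_emb) != len(text_emb) or not mol_emb:
        raise ValueError("batches must be non-empty and of equal size")
    mols = [l2_normalize(v) for v in mol_emb]
    texts = [l2_normalize(v) for v in text_emb]
    n = len(mols)
    # logits[i][j]: temperature-scaled cosine similarity of molecule i and text j
    logits = [[sum(a * b for a, b in zip(m, t)) / temperature for t in texts]
              for m in mols]
    # molecule -> text: cross-entropy of each row against its diagonal entry
    loss_m2t = sum(logsumexp(logits[i]) - logits[i][i] for i in range(n)) / n
    # text -> molecule: the same cross-entropy computed over columns
    loss_t2m = sum(logsumexp([logits[j][i] for j in range(n)]) - logits[i][i]
                   for i in range(n)) / n
    return 0.5 * (loss_m2t + loss_t2m)


# Usage: correctly paired embeddings give a much lower loss than swapped ones.
aligned = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]])
swapped = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.1, 0.9], [0.9, 0.1]])
```

Averaging the two directions treats molecule-to-text retrieval and text-to-molecule retrieval symmetrically, and using the other items in the batch as negatives avoids any explicit negative sampling. Real systems would compute this on tensors with a learned or tuned temperature.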
