Impact of Domain Knowledge and Multi-Modality on Intelligent Molecular Property Prediction: A Systematic Survey

Taojie Kuang; Pengfei Liu; Zhixiang Ren

TL;DR

This paper surveys how domain knowledge and multi-modality influence deep-learning-based molecular property prediction (MPP) across input types, architectures, and training strategies. Synthesizing results from MoleculeNet and related benchmarks, it reports that integrating domain knowledge improves regression RMSE by up to $4.0\%$ and classification ROC-AUC by up to $1.7\%$, while multi-modal fusion improves regression by up to $9.1\%$ (adding 1D SMILES to 2D graphs) and classification by up to $13.2\%$ (adding 3D information to 2D graphs). The authors discuss encoder designs (RNN, GNN, Transformer, CNN) and training regimes (self-supervised, semi-supervised, transfer learning, multi-task) and provide practical guidance for future MPP development. These findings suggest concrete strategies for accelerating drug discovery with richer, domain-informed representations.

Abstract

The precise prediction of molecular properties is essential for advancements in drug development, particularly in virtual screening and compound optimization. Numerous recently introduced deep learning-based methods have shown remarkable potential in enhancing molecular property prediction (MPP), especially in improving accuracy and providing insights into molecular structures. Yet two critical questions arise: does the integration of domain knowledge improve the accuracy of molecular property prediction, and does multi-modal data fusion yield more precise results than single-source methods? To explore these questions, we comprehensively review and quantitatively analyze recent deep learning methods based on various benchmarks. We find that integrating molecular information significantly improves MPP for both regression and classification tasks. Specifically, regression improvements, measured by reductions in root mean square error (RMSE), are up to 4.0%, while classification improvements, measured by the area under the receiver operating characteristic curve (ROC-AUC), are up to 1.7%. We also find that enriching 2D graphs with 1D SMILES boosts multi-modal learning performance for regression tasks by up to 9.1%, and augmenting 2D graphs with 3D information increases performance for classification tasks by up to 13.2%, with both enhancements measured using ROC-AUC. These two consolidated insights offer crucial guidance for future advancements in drug discovery.
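
The two metrics named in the abstract, and a percentage improvement between a baseline and an enhanced model, can be sketched as follows. This is a minimal illustration and not the survey's evaluation code: the function names are hypothetical, and the relative-change formula is an assumption, since the abstract does not state how its percentages were computed.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error; lower is better."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive is scored
    above a random negative (ties count as 0.5); higher is better."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def relative_improvement(baseline, enhanced, lower_is_better):
    """Percent improvement of `enhanced` over `baseline`.
    Assumed formula: change divided by the baseline value."""
    delta = baseline - enhanced if lower_is_better else enhanced - baseline
    return 100.0 * delta / baseline
```

Under this assumed formula, a baseline RMSE of 1.00 that drops to 0.96 corresponds to a 4.0% improvement, which is the kind of figure the abstract reports for regression tasks.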

Paper Structure (27 sections, 9 figures, 4 tables)

Introduction
Molecular Modality
Sequence-based Data
Graph-based Data
Pixel-based Data
Domain Knowledge
Atom-bond Property
Molecular Substructure
Chemical Reaction
Molecular Property
Modeling Method
Encoder
RNN-based
GNN-based
Transformer-based
...and 12 more sections

Figures (9)

Figure 1: Overview of our survey. We review the impact of domain knowledge and multi-modality on molecular property prediction from three critical aspects: input data, model architectures, and training strategy. Details are given in the following sections.
Figure 2: Molecular modality: We illustrate the transformation of molecular modality crucial for MPP using the example of the caffeine molecule. This is demonstrated across three primary categories: sequence-based, graph-based, and pixel-based formats. Each format is derived from the SMILES representation of caffeine, using Python packages such as RDKit and software tools like PyMol. a). The sequence-based data section includes formats like SMILES and its variants (Canonical and Isomeric SMILES), molecular fingerprints (ECFP, Morgan, MACCS), and SELFIES, highlighting their roles in encoding molecular structures. b). Graph-based data represents caffeine as a graph with atoms as nodes and bonds as edges, enriched with 3D information for detailed structural insights. c). Pixel-based data showcases 2D images and 3D grids of caffeine, crucial for visual analysis and spatial interpretation.
Figure 3: Molecular domain knowledge: This figure categorizes molecular expert knowledge essential for MPP into four domains, using the molecule C=(CC(=C)c1cc(C(=O)O)cnc1C(C)CC as an example. a). In the atom-bond property section, we examine aspects such as the molecule’s atomic number, mass, valence, and bond types. b). Molecular substructure includes the functional groups, molecular fragments, and pharmacophores of this molecule, illustrating their influence on its chemical behavior and interactions. c). Molecular property covers a range of properties from quantum mechanics to physiology, showcasing how these properties affect the molecule's behavior in drug development. d). Chemical reaction discusses the mechanisms of molecular transformations, highlighting the molecule's reactivity.
Figure 4: Summary of molecular encoder methods. We categorize molecular encoder methods into five types: RNN-based, GNN-based, Transformer-based, CNN-based, and Multi-Modality-based. For each category, key techniques and notable advancements from influential studies are highlighted, showcasing the evolution and diversification of approaches in molecular encoding.
Figure 5: Molecular encoder architectures: This figure categorizes molecular encoder architectures for single modality into four types: RNN-based, Transformer-based, GNN-based, and CNN-based. Each type is assessed for its ability to capture information about functional groups like carboxyl groups (-COOH) in the molecule C=(CC(=C)c1cc(C(=O)O)cnc1C(C)CC. a). RNN-based encoders process sequence data, maintaining a memory of previous inputs to effectively capture sequential patterns of (-COOH). b). Transformer-based models utilize Self-Attention mechanisms, enabling them to identify and focus on the (-COOH) group's specific interactions within the molecular sequence. c). GNN-based architectures employ a message passing strategy, extracting the topological information of (-COOH) within the molecule's graph structure. d). CNN-based models analyze spatial patterns through convolution layers, identifying sub-images that contain the (-COOH) group. This visualization highlights how each encoder type uniquely processes and interprets the molecular structure for MPP.
...and 4 more figures
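
The message passing strategy mentioned in the Figure 5 caption for GNN-based encoders can be illustrated with a toy example on a carboxyl fragment. This is a sketch under loud simplifying assumptions: node features are scalar atomic numbers, aggregation is an unweighted sum, and there are no learned parameters, whereas the GNN encoders the survey covers use learned transformations over feature vectors. The function name `message_passing_step` is hypothetical.

```python
def message_passing_step(features, edges):
    """One round of sum-aggregation message passing on an undirected graph.

    features: dict mapping node id -> scalar feature
    edges: list of (u, v) pairs, one per bond
    Returns a new dict where each node's feature is its own value plus
    the sum of its neighbors' values (no learned weights; assumption).
    """
    neighbors = {v: [] for v in features}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    return {
        v: features[v] + sum(features[u] for u in neighbors[v])
        for v in features
    }


# Carboxyl fragment C(=O)O-H: nodes are atoms, edges are bonds.
# Features are atomic numbers (C=6, O=8, H=1); bond orders are ignored here.
atoms = {0: 6, 1: 8, 2: 8, 3: 1}
bonds = [(0, 1), (0, 2), (2, 3)]

updated = message_passing_step(atoms, bonds)
```

After one step, the carbon node's value reflects both oxygens bonded to it, which is how repeated rounds let a node's representation encode the topology of its surrounding substructure.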
