Advancements in Molecular Property Prediction: A Survey of Single and Multimodal Approaches

Tanya Liyaqat, Tanvir Ahmad, Chandni Saxena

TL;DR

This survey maps the Molecular Property Prediction (MPP) landscape across single- and multimodal representations, detailing input forms (SMILES, graphs, images), encoding schemes, neural architectures (GNNs, RNNs, Transformers), and learning paradigms (transfer, contrastive, few-shot, multitask). It provides a modality-based taxonomy, assesses datasets (notably MoleculeNet) and tooling for feature generation, and compares state-of-the-art methods, noting that graph- and SMILES-based approaches often lead in performance while multimodal methods remain a promising but less explored frontier. The discussion highlights core challenges (generalizability, data quality, interpretability, and multimodal fusion) and opportunities such as multitask learning, uncertainty quantification, and explainable AI to advance practical drug discovery and materials science applications.

Abstract

Molecular Property Prediction (MPP) plays a pivotal role across diverse domains, spanning drug discovery, material science, and environmental chemistry. Fueled by the exponential growth of chemical data and the evolution of artificial intelligence, recent years have witnessed remarkable strides in MPP. However, the multifaceted nature of molecular data, such as molecular structures, SMILES notation, and molecular images, continues to pose a fundamental challenge in its effective representation. To address this, representation learning techniques are instrumental as they acquire informative and interpretable representations of molecular data. This article explores recent AI-based approaches in MPP, focusing on both single- and multiple-modality representation techniques. It provides an overview of various molecule representations and encoding schemes, categorizes MPP methods by their use of modalities, and outlines datasets and tools available for feature generation. The article also analyzes the performance of recent methods and suggests future research directions to advance the field of MPP.

Paper Structure (43 sections, 2 equations, 11 figures, 9 tables)

This paper contains 43 sections, 2 equations, 11 figures, 9 tables.

Figures (11)

  • Figure 1: The structure of the overall review.
  • Figure 2: Various input representations of molecules utilized in MPP.
  • Figure 3: Methods used to encode SMILES, molecular graphs, and molecular images into a format that models can process.
  • Figure 4: Modality-based taxonomy of various molecular property areas, including (a) quantum chemistry, (b) physiological, (c) physical chemistry, and (d) biophysics.
  • Figure 5: Illustration of MPP using (a) Descriptor-based Neural Network, (b) SMILES string-based sequential model such as LSTM, and (c) Molecular structure-based Graph Neural Network (GNN). Each approach utilizes a different input representation to predict molecular properties, showcasing the versatility of computational methods in addressing diverse challenges in drug discovery and materials science.
  • ...and 6 more figures
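
As a rough illustration of the SMILES encoding step named in the Figure 3 caption, the sketch below tokenizes SMILES strings and one-hot encodes them. It is a minimal example under stated assumptions, not the survey's method: the regex tokenizer, the function names (`tokenize`, `build_vocab`, `one_hot`), and the three-molecule corpus are all hypothetical and chosen only for illustration.

```python
import re

# Assumed tokenizer: bracket atoms and the two-letter halogens Cl and Br are
# matched first, so they stay single tokens; everything else is one character.
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|.")


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    return TOKEN_PATTERN.findall(smiles)


def build_vocab(smiles_list):
    """Map every token seen in the corpus to an integer index (sorted order)."""
    tokens = sorted({tok for s in smiles_list for tok in tokenize(s)})
    return {tok: i for i, tok in enumerate(tokens)}


def one_hot(smiles, vocab):
    """Encode a SMILES string as a list of one-hot rows, one row per token."""
    rows = []
    for tok in tokenize(smiles):
        row = [0] * len(vocab)
        row[vocab[tok]] = 1
        rows.append(row)
    return rows


# Hypothetical toy corpus: ethanol, chlorobenzene, methylammonium.
corpus = ["CCO", "c1ccccc1Cl", "C[NH3+]"]
vocab = build_vocab(corpus)
encoded = one_hot("CCO", vocab)
```

A sequence model such as an LSTM would consume `encoded` row by row. In practice the one-hot rows are often replaced by learned embeddings, and the vocabulary must cover every token that can appear at prediction time.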