Advancements in Molecular Property Prediction: A Survey of Single and Multimodal Approaches
Tanya Liyaqat, Tanvir Ahmad, Chandni Saxena
TL;DR
This survey maps the Molecular Property Prediction (MPP) landscape across single- and multimodal representations, detailing input forms (SMILES, graphs, images), encoding schemes, neural architectures (GNNs, RNNs, Transformers), and learning paradigms (transfer, contrastive, few-shot, multitask). It provides a modality-based taxonomy, assesses datasets (notably MoleculeNet) and tooling for feature generation, and compares state-of-the-art methods, noting that graph- and SMILES-based approaches often lead in performance while multimodal methods remain a promising yet less explored frontier. The discussion highlights core challenges (generalizability, data quality, interpretability, and multimodal fusion) and opportunities such as multitask learning, uncertainty quantification, and explainable AI to advance practical drug discovery and materials science applications.
Abstract
Molecular Property Prediction (MPP) plays a pivotal role across diverse domains, spanning drug discovery, materials science, and environmental chemistry. Fueled by the exponential growth of chemical data and the evolution of artificial intelligence, recent years have witnessed remarkable strides in MPP. However, the multifaceted nature of molecular data, which spans molecular structures, SMILES notation, and molecular images, continues to pose a fundamental challenge for effective representation. To address this, representation learning techniques are instrumental, as they acquire informative and interpretable representations of molecular data. This article explores recent AI-based approaches in MPP, focusing on both single- and multiple-modality representation techniques. It provides an overview of various molecule representations and encoding schemes, categorizes MPP methods by their use of modalities, and outlines datasets and tools available for feature generation. The article also analyzes the performance of recent methods and suggests future research directions to advance the field of MPP.
