Graph-structured Small Molecule Drug Discovery Through Deep Learning: Progress, Challenges, and Opportunities
Kun Li, Yida Xiong, Hongzhi Zhang, Xiantao Cai, Jia Wu, Bo Du, Wenbin Hu
TL;DR
This paper surveys graph-structured deep learning for small molecule drug discovery. It organizes the field into six core tasks (DTI/DTA, DRP, DDI, MPP, MG, MO) and presents a unified graph-based problem formulation, with $Y_{\mathrm{MPP}} = f_{\mathrm{model}}(M)$ and $Y_{\mathrm{DX}} = f_{\mathrm{model}}(M, X)$ for the prediction tasks and $M_{\mathrm{Gen/Opt}} = \arg\min_{M} f_{\mathrm{val}}(\emptyset/M, \emptyset/C)$ as the molecular generation/optimization objective. It reviews representative DL techniques (e.g., GNNs, knowledge graphs, diffusion models, self-supervised pretraining) and key datasets (Davis, KIBA, GDSCv1/2, CCLE, OC20, MoleculeNet, OGB), and highlights how the six tasks interact. The paper also discusses critical challenges (interpretability, out-of-distribution generalization, the training-to-lab validation gap, and fair benchmarking) and proposes directions such as data augmentation, domain adaptation, online learning, and standardized benchmarks to advance practical drug discovery. Overall, it aims to guide researchers and practitioners toward more efficient screening, generation, and optimization of small molecules with desirable properties through graph-based DL.
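As a purely illustrative reading of the two prediction forms in the summary above, the sketch below treats a molecule M as a graph and computes a single-input score and a paired-input score. It is not the survey's method or any surveyed model. The names `MolGraph`, `embed`, `f_model_mpp`, and `f_model_dx`, the one-hot atom features, the single neighbour-averaging step, the fixed weights, and the choice of X as a plain feature vector are all assumptions made for this example; a real model would learn its parameters from data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MolGraph:
    """Hypothetical container for a molecule M: atoms are nodes, bonds are edges."""
    atom_feats: list[list[float]]   # one feature vector per atom
    bonds: list[tuple[int, int]]    # undirected edges as atom-index pairs


def embed(mol: MolGraph) -> list[float]:
    """One round of neighbour averaging, then a sum readout over atoms."""
    n, dim = len(mol.atom_feats), len(mol.atom_feats[0])
    neighbours: list[list[int]] = [[] for _ in range(n)]
    for i, j in mol.bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)
    updated = []
    for i in range(n):
        group = [i] + neighbours[i]  # the atom itself plus its bonded atoms
        updated.append(
            [sum(mol.atom_feats[k][d] for k in group) / len(group) for d in range(dim)]
        )
    # Summing over atoms makes the result independent of atom ordering.
    return [sum(h[d] for h in updated) for d in range(dim)]


def f_model_mpp(mol: MolGraph, weights: list[float]) -> float:
    """Toy stand-in for Y_MPP = f_model(M): linear score on the graph embedding."""
    return sum(w * e for w, e in zip(weights, embed(mol)))


def f_model_dx(mol: MolGraph, x: list[float], weights: list[float]) -> float:
    """Toy stand-in for Y_DX = f_model(M, X): weighted product of embedding and X."""
    return sum(w * e * xi for w, e, xi in zip(weights, embed(mol), x))


# Heavy-atom skeleton C-C-O with one-hot features [is_carbon, is_oxygen].
ethanol = MolGraph(
    atom_feats=[[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
    bonds=[(0, 1), (1, 2)],
)
y_mpp = f_model_mpp(ethanol, weights=[1.0, 1.0])
y_dx = f_model_dx(ethanol, x=[1.0, 0.0], weights=[1.0, 1.0])
```

The sum readout is used here because it gives the same output for any ordering of the atoms, a property graph-based molecular models generally need.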
Abstract
Due to their excellent drug-like and pharmacokinetic properties, small molecule drugs are widely used to treat various diseases, making them a critical component of drug discovery. With the rapid development of deep learning (DL) techniques in recent years, DL-based small molecule drug discovery methods have outperformed traditional machine learning approaches in prediction accuracy, speed, and the modeling of complex molecular relationships. These advancements enhance drug screening efficiency and optimization and provide more precise and effective solutions for various drug discovery tasks. To support the field's development, this paper systematically summarizes the recent key tasks and representative techniques in graph-structured small molecule drug discovery. Specifically, we first provide an overview of the major tasks in small molecule drug discovery and their interrelationships. Next, we analyze the six core tasks, summarizing the related methods, commonly used datasets, and technological development trends. Finally, we discuss key challenges, such as interpretability and out-of-distribution generalization, and offer our insights into future research directions for small molecule drug discovery.
