Few-shot Molecular Property Prediction: A Survey
Zeyu Wang, Tianyi Jiang, Huanchang Ma, Yao Lu, Xiaoze Bao, Shanqing Yu, Qi Xuan, Shirui Pan, Xin Zheng
TL;DR
Few-shot molecular property prediction (FSMPP) tackles predicting molecular properties under scarce annotations, a common bottleneck in drug discovery. The paper presents the first comprehensive survey, introducing a unified taxonomy across data-level, model-level, and learning-paradigm methods, and reviews representative approaches, datasets, and evaluation protocols. It identifies two core generalization challenges—cross-property distribution shifts and cross-molecule heterogeneity—and highlights current trends toward data- and model-centric strategies with emerging hybrid approaches. The analysis emphasizes the practical impact of FSMPP for rapid, resource-efficient molecular design and outlines opportunities in theory, multi-modal knowledge integration, scalability, and interpretability to guide future research and real-world pipelines.
Abstract
AI-assisted molecular property prediction has become a promising technique for early-stage drug discovery and materials design in recent years. However, because wet-lab experiments are costly and complex, real-world molecules often have scarce annotations, leaving limited labeled data for effective supervised model learning. In light of this, few-shot molecular property prediction (FSMPP) has emerged as a paradigm that enables learning from only a few labeled examples. Despite rapidly growing attention, existing FSMPP studies remain fragmented, without a coherent framework to capture methodological advances and domain-specific challenges. In this work, we present the first comprehensive and systematic survey of few-shot molecular property prediction. We begin by analyzing the few-shot phenomenon in molecular datasets and highlighting two core challenges: (1) cross-property generalization under distribution shifts, where each task, corresponding to one property, may follow a different data distribution or even be only weakly related to the others from a biochemical perspective, requiring the model to transfer knowledge across heterogeneous prediction tasks; and (2) cross-molecule generalization under structural heterogeneity, where molecules associated with the same or different properties may exhibit significant structural diversity, making it difficult for a model to generalize. We then introduce a unified taxonomy that organizes existing methods into data, model, and learning-paradigm levels, reflecting their strategies for extracting knowledge from scarce supervision. Next, we compare representative methods and summarize benchmark datasets and evaluation protocols. Finally, we identify key trends and future directions for advancing research on FSMPP.
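
To make the few-shot setting concrete, the sketch below shows one common way a single property task might be split into a small labeled set and a held-out evaluation set. It is an illustration only, not a protocol taken from this survey: the support/query terminology, the binary labels, the balanced K-per-class sampling, and the function name `sample_episode` are all assumptions introduced here.

```python
import random


def sample_episode(labels, k_shot, n_query, seed=None):
    """Split one property's labeled molecules into support and query indices.

    labels  : list of 0/1 labels, one per molecule, for a single property task
              (binary labels are an assumption of this sketch).
    k_shot  : number of labeled examples drawn per class for the support set.
    n_query : maximum number of remaining molecules used for evaluation.
    """
    rng = random.Random(seed)
    support, remaining = [], []
    for cls in (0, 1):
        # Indices of all molecules carrying this class label.
        idx = [i for i, y in enumerate(labels) if y == cls]
        if len(idx) < k_shot:
            raise ValueError(f"class {cls} has fewer than {k_shot} molecules")
        rng.shuffle(idx)
        support.extend(idx[:k_shot])    # the few labeled examples
        remaining.extend(idx[k_shot:])  # candidates for the query set
    rng.shuffle(remaining)
    query = remaining[:n_query]
    return support, query


# Hypothetical task: 10 inactive and 5 active molecules for one property.
labels = [0] * 10 + [1] * 5
support, query = sample_episode(labels, k_shot=2, n_query=6, seed=0)
```

Under these assumptions, a model would see only the `2 * k_shot` support labels for the property and be scored on the query molecules. Repeating the split across many properties gives one task per property, which is the setting in which the cross-property and cross-molecule challenges described above arise.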
