MetaMolGen: A Neural Graph Motif Generation Model for De Novo Molecular Design
Zimo Yan, Jie Zhang, Zheng Xie, Chang Liu, Yizhen Liu, Yiping Song
TL;DR
MetaMolGen tackles data-scarce, property-conditioned molecular design by uniting first-order meta-learning (Reptile) with Conditional Neural Processes to learn task-aware molecular distributions from few examples. A learnable feature standardization layer stabilizes training and improves generalization, while a SMILES autoregressive decoder generates valid, diverse molecules conditioned on target properties via a lightweight property projector. Across ChEMBL, QM9, ZINC, and MOSES, MetaMolGen shows superior few-shot performance, strong conditional control, and fast generation, achieving high uniqueness and favorable property alignment with improved efficiency. The work advances data-efficient, controllable molecular design and provides theoretical guarantees on convergence and generalization, with practical impact for early-stage drug and materials discovery under limited data.
Abstract
Molecular generation plays an important role in drug discovery and materials science, especially in data-scarce scenarios where traditional generative models often struggle to achieve satisfactory conditional generalization. To address this challenge, we propose MetaMolGen, a first-order meta-learning-based molecular generator designed for few-shot and property-conditioned molecular generation. MetaMolGen standardizes the distribution of graph motifs by mapping them to a normalized latent space, and employs a lightweight autoregressive sequence model to generate SMILES sequences that faithfully reflect the underlying molecular structure. In addition, it supports conditional generation of molecules with target properties through a learnable property projector integrated into the generative process. Experimental results demonstrate that MetaMolGen consistently generates valid and diverse SMILES sequences under low-data regimes, outperforming conventional baselines. This highlights its advantage in fast adaptation and efficient conditional generation for practical molecular design.
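
To make the first-order meta-learning idea concrete, here is a minimal sketch of a Reptile-style update on a toy problem. It is not the MetaMolGen implementation: the scalar weight, the linear toy tasks (y = slope * x), and the names `inner_sgd`, `reptile_step`, and `meta_train` are all assumptions chosen for illustration, standing in for the model's parameters and its molecular generation tasks. The only part taken from the general Reptile recipe is the outer update, which moves the shared initialization toward the task-adapted weights without computing second-order gradients.

```python
import random


def inner_sgd(theta, task_slope, steps=5, lr=0.1, xs=(-1.0, -0.5, 0.5, 1.0)):
    """Adapt a scalar weight to one toy task (y = task_slope * x) with plain SGD.

    Hypothetical stand-in for task-specific adaptation on a few examples.
    """
    w = theta
    for _ in range(steps):
        # Gradient of 0.5 * (w*x - y)^2 with respect to w, averaged over xs.
        grad = sum((w * x - task_slope * x) * x for x in xs) / len(xs)
        w -= lr * grad
    return w


def reptile_step(theta, task_slope, meta_lr=0.5, **inner_kwargs):
    """One Reptile meta-update: interpolate theta toward the adapted weight."""
    phi = inner_sgd(theta, task_slope, **inner_kwargs)
    # First-order update: no gradient is taken through the inner loop.
    return theta + meta_lr * (phi - theta)


def meta_train(task_slopes, iterations=200, seed=0):
    """Repeatedly sample a toy task and apply one Reptile step."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(iterations):
        theta = reptile_step(theta, rng.choice(task_slopes))
    return theta


if __name__ == "__main__":
    # The learned initialization settles between the task optima.
    print(meta_train([1.0, 3.0]))
```

In this toy setting the meta-learned initialization ends up between the per-task optima, so a handful of inner SGD steps is enough to adapt to either task. In a model of the kind the abstract describes, `theta` would presumably be the full parameter set of the generator and each task a small set of molecules with associated properties.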
