Synthesizable Molecular Generation via Soft-constrained GFlowNets with Rich Chemical Priors
Hyeonah Kim, Minsu Kim, Celine Roget, Dionessa Biton, Louis Vaillancourt, Yves V. Brun, Yoshua Bengio, Alex Hernandez-Garcia
TL;DR
The paper introduces S3-GFN, a soft-constrained GFlowNet for designing synthesizable de novo molecules that uses a pretrained SMILES prior to steer generation toward synthesizable chemical spaces. It decouples synthesizability constraints from the reward through a distributional regularization, implemented with two replay buffers and a contrastive auxiliary loss, which allows flexible off-policy learning. Empirical results show synthesizability rates of at least 95% and improved task rewards across both target-fold and structure-based drug discovery tasks, often outperforming reaction-based GFlowNets. The approach also realigns rapidly when constraints change and remains robust in sample-limited settings, which suggests it is a practical and scalable way to combine rich chemical priors with synthesizability constraints in sequence-based molecular generation.
Abstract
The application of generative models to experimental drug discovery campaigns is severely limited by the difficulty of designing molecules de novo that can be synthesized in practice. Previous works have leveraged Generative Flow Networks (GFlowNets) to impose hard synthesizability constraints through the design of state and action spaces based on predefined reaction templates and building blocks. Despite its promising prospects, this approach currently lacks flexibility and scalability. As an alternative, we propose S3-GFN, which generates synthesizable SMILES molecules via simple soft regularization of a sequence-based GFlowNet. Our approach leverages rich molecular priors learned from large-scale SMILES corpora to steer molecular generation towards high-reward, synthesizable chemical spaces. The model induces constraints through off-policy replay training with a contrastive learning signal based on separate buffers of synthesizable and unsynthesizable samples. Our experiments show that S3-GFN learns to generate synthesizable molecules ($\geq 95\%$) with higher rewards in diverse tasks.
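The replay-based training described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The summary does not give the exact form of the contrastive signal, so the pairwise logistic loss below is a stand-in, and the names `ReplayBuffer`, `route`, `contrastive_loss`, and the `is_synthesizable` flag (assumed to come from some external synthesizability check) are hypothetical. The trajectory balance objective is the standard one for autoregressive sequence generation, where each sequence has a single construction path.

```python
import math
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity FIFO buffer of (smiles, log_reward) pairs (hypothetical helper)."""

    def __init__(self, capacity):
        self.items = deque(maxlen=capacity)  # oldest entries are evicted first

    def add(self, smiles, log_reward):
        self.items.append((smiles, log_reward))

    def sample(self, k, rng=random):
        # Sample without replacement, capped at the current buffer size.
        return rng.sample(list(self.items), min(k, len(self.items)))

    def __len__(self):
        return len(self.items)


def route(smiles, log_reward, is_synthesizable, pos_buf, neg_buf):
    """Store a sample in the synthesizable or unsynthesizable buffer."""
    (pos_buf if is_synthesizable else neg_buf).add(smiles, log_reward)


def trajectory_balance_loss(log_z, log_pf, log_reward):
    """Standard trajectory balance loss for one autoregressive sequence."""
    return (log_z + log_pf - log_reward) ** 2


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def contrastive_loss(log_pf_pos, log_pf_neg, margin=0.0):
    """Assumed stand-in: pairwise logistic loss that is small when synthesizable
    samples receive higher policy log-probability than unsynthesizable ones."""
    pairs = [(p, n) for p in log_pf_pos for n in log_pf_neg]
    return sum(softplus(n - p + margin) for p, n in pairs) / len(pairs)


# Usage: route two toy samples, then evaluate the auxiliary loss on
# made-up policy log-probabilities for one sample from each buffer.
pos_buf, neg_buf = ReplayBuffer(capacity=1000), ReplayBuffer(capacity=1000)
route("CCO", log_reward=-1.2, is_synthesizable=True, pos_buf=pos_buf, neg_buf=neg_buf)
route("C1=CC=C1", log_reward=-0.8, is_synthesizable=False, pos_buf=pos_buf, neg_buf=neg_buf)
aux = contrastive_loss(log_pf_pos=[-4.0], log_pf_neg=[-3.0])
```

In a full training loop, a weighted sum of the trajectory balance loss and this auxiliary term would be minimized over batches drawn from both buffers. The weighting and the batch composition are design choices that this summary does not specify.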
