SynFlowNet: Design of Diverse and Novel Molecules with Synthesis Constraints
Miruna Cretu, Charles Harris, Ilia Igashov, Arne Schneuing, Marwin Segler, Bruno Correia, Julien Roy, Emmanuel Bengio, Pietro Liò
TL;DR
SynFlowNet tackles the synthetic accessibility gap in de novo molecular design by integrating a reaction-based action space into a GFlowNet, so that generation follows synthesizable pathways. The framework combines a forward policy with a parameterized backward policy and uses masking and fingerprint-based scaling to handle large reaction and building-block (BB) spaces, producing synthesizable outputs that are more diverse than those of RL baselines. Key contributions include a novel backward-policy training regime that keeps backward trajectories consistent with the MDP, scaling strategies for large BB libraries, and a demonstration that target-specific fragment information can further guide synthesis. The approach aims to bridge in silico design with real-world synthesis and retrosynthesis planning in drug discovery, and it remains adaptable to multiple targets and to reaction and building-block spaces of different sizes.
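To make the masking and fingerprint-based scoring idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes that building-block logits are dot products between a policy-produced query vector and each building block's fingerprint, and that incompatible building blocks are masked out before the softmax. The building-block names, the 4-bit fingerprints, and the `query` values are all hypothetical.

```python
import math

# Hypothetical toy library: each building block (BB) has a tiny binary fingerprint.
BUILDING_BLOCKS = {
    "bb_amine": (1, 0, 0, 1),
    "bb_acid": (0, 1, 0, 1),
    "bb_halide": (0, 0, 1, 0),
}


def masked_softmax(logits, mask):
    """Softmax over entries whose mask is True; masked entries get probability 0."""
    valid = [logit for logit, keep in zip(logits, mask) if keep]
    if not valid:
        raise ValueError("no valid action to choose from")
    top = max(valid)  # subtract the max for numerical stability
    exps = [math.exp(logit - top) if keep else 0.0 for logit, keep in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def building_block_probs(query, compatible):
    """Score every BB by <query, fingerprint>, then mask BBs that cannot react.

    `query` stands in for a vector a policy network would output (assumption);
    `compatible` is the set of BB names allowed by the chosen reaction template.
    """
    names = list(BUILDING_BLOCKS)
    logits = [sum(q * bit for q, bit in zip(query, BUILDING_BLOCKS[name])) for name in names]
    mask = [name in compatible for name in names]
    return dict(zip(names, masked_softmax(logits, mask)))


probs = building_block_probs(
    query=(0.5, 2.0, -1.0, 0.3),
    compatible={"bb_amine", "bb_acid"},
)
```

Under these assumptions the number of learned output dimensions depends on the fingerprint length rather than on the library size, which is one plausible reading of how such a scheme could scale to large BB libraries.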
Abstract
Generative models see increasing use in computer-aided drug design. However, while they perform well at capturing distributions of molecular motifs, they often produce synthetically inaccessible molecules. To address this, we introduce SynFlowNet, a GFlowNet model whose action space uses chemical reactions and purchasable reactants to sequentially build new molecules. By incorporating forward synthesis as an explicit constraint of the generative mechanism, we aim to bridge the gap between in silico molecular generation and real-world synthesis capabilities. We evaluate the synthesizability of our compounds using synthetic accessibility scores and an independent retrosynthesis tool, and we motivate the choice of GFlowNets by showing a considerable improvement in sample diversity compared to baselines. Additionally, we identify challenges with reaction encodings that can complicate traversal of the MDP in the backward direction. To address this, we introduce various strategies for learning the GFlowNet backward policy and thus demonstrate how additional constraints can be integrated into the GFlowNet MDP framework. This approach enables our model to successfully identify synthesis pathways for previously unseen molecules.
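The abstract does not name the training objective, so the following sketch only illustrates how a learned backward policy can enter a GFlowNet loss. It assumes the trajectory balance objective commonly used for GFlowNets, in which the squared difference between log Z plus the summed forward log-probabilities and log R(x) plus the summed backward log-probabilities is minimized. The function name and the numbers in the example are hypothetical.

```python
import math


def trajectory_balance_loss(log_z, log_pf_steps, log_pb_steps, reward):
    """Squared trajectory-balance residual for a single trajectory.

    log_z        : log of the (learned) partition function estimate
    log_pf_steps : forward-policy log-probabilities, one per step
    log_pb_steps : backward-policy log-probabilities, one per step
    reward       : strictly positive reward of the final molecule
    """
    if reward <= 0:
        raise ValueError("reward must be strictly positive")
    if len(log_pf_steps) != len(log_pb_steps):
        raise ValueError("forward and backward step counts must match")
    forward_flow = log_z + sum(log_pf_steps)
    backward_flow = math.log(reward) + sum(log_pb_steps)
    return (forward_flow - backward_flow) ** 2


# Toy two-step trajectory that is perfectly balanced: Z * 0.5 * 0.5 == R * 1.0 * 0.5
balanced = trajectory_balance_loss(
    log_z=math.log(2.0),
    log_pf_steps=[math.log(0.5), math.log(0.5)],
    log_pb_steps=[math.log(1.0), math.log(0.5)],
    reward=1.0,
)

# Same trajectory, but the backward policy puts less mass on the first step.
unbalanced = trajectory_balance_loss(
    log_z=math.log(2.0),
    log_pf_steps=[math.log(0.5), math.log(0.5)],
    log_pb_steps=[math.log(0.25), math.log(0.5)],
    reward=1.0,
)
```

In this formulation the backward log-probabilities are inputs to the loss like any other term, so a parameterized backward policy can be trained jointly with the forward one or with an additional objective of its own.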
