Procedural Synthesis of Synthesizable Molecules
Michael Sun, Alston Lo, Minghao Guo, Jie Chen, Connor Coley, Wojciech Matusik
TL;DR
This work reframes synthesizable molecular design and analog generation as conditional program-synthesis problems and introduces a bilevel optimization framework that decouples syntactic skeletons from chemical semantics. An outer Metropolis-Hastings search over skeletons and an inner horizon-aware decoding loop, implemented with graph neural policies, enable efficient exploration of synthesis pathways within a fixed grammar; a genetic algorithm (GA) complements them for multi-objective design. Across analog generation and molecule design tasks, the approach yields higher reconstruction accuracy, greater diversity, and improved synthetic accessibility, with strong docking performance and notable sample-efficiency gains. The results demonstrate that explicit control over synthesis resources and templates can significantly accelerate discovery workflows and is well suited to integration with autonomous synthesis platforms.
Abstract
Designing synthetically accessible molecules and recommending analogs of unsynthesizable molecules are important problems for accelerating molecular discovery. We reconceptualize both problems using ideas from program synthesis. Drawing inspiration from syntax-guided synthesis approaches, we decouple the syntactic skeleton from the semantics of a synthetic tree to create a bilevel framework for reasoning about the combinatorial space of synthesis pathways. Given a molecule we aim to generate analogs for, we iteratively refine its skeletal characteristics via Markov chain Monte Carlo simulations over the space of syntactic skeletons. Given a black-box oracle to optimize, we formulate a joint design space over syntactic templates and molecular descriptors and introduce evolutionary algorithms that optimize both syntactic and semantic dimensions synergistically. Our key insight is that once the syntactic skeleton is set, we can amortize over the search complexity of deriving the program's semantics by training policies to fully utilize the fixed-horizon Markov decision process imposed by the syntactic template. We demonstrate performance advantages of our bilevel framework for synthesizable analog generation and synthesizable molecule design. Notably, our approach offers the user explicit control over the resources required to perform synthesis and biases the design space towards simpler solutions, making it particularly promising for autonomous synthesis platforms. Code is at https://github.com/shiningsunnyday/SynthesisNet.
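The bilevel idea in the abstract (an outer Markov chain Monte Carlo search over skeletons, with an inner fixed-horizon decoding step once a skeleton is chosen) can be illustrated with a small toy sketch. This is not the authors' implementation and does not use the SynthesisNet code: every name below (`metropolis_hastings_over_skeletons`, `toy_neighbors`, `toy_decode_and_score`, `TARGET`, `VOCAB`, `MAX_SLOTS`) is hypothetical. As simplifying assumptions, a "skeleton" is just an integer number of slots, and the trained policies are replaced by a greedy per-slot choice against a known target.

```python
import math
import random
from typing import Callable, List, Tuple

# Toy stand-ins (assumptions for illustration only).
TARGET = (2, 0, 1)   # pretend "semantics" of the molecule we want analogs of
VOCAB = (0, 1, 2)    # pretend choices available at each slot
MAX_SLOTS = 6        # largest toy skeleton allowed


def toy_neighbors(n_slots: int) -> List[int]:
    """Skeletons reachable in one edit: one slot fewer or one slot more."""
    return [n for n in (n_slots - 1, n_slots + 1) if 1 <= n <= MAX_SLOTS]


def toy_decode_and_score(n_slots: int) -> float:
    """Inner level: the skeleton fixes the horizon at n_slots decisions.

    A greedy rule stands in for a learned policy. The score is minus the
    number of positions where the decoded sequence and TARGET disagree.
    """
    decoded = []
    for i in range(n_slots):
        wanted = TARGET[i] if i < len(TARGET) else None
        decoded.append(max(VOCAB, key=lambda t: t == wanted))
    matches = sum(1 for d, t in zip(decoded, TARGET) if d == t)
    unmatched = max(len(decoded), len(TARGET)) - matches
    return -float(unmatched)


def metropolis_hastings_over_skeletons(
    init: int,
    neighbors: Callable[[int], List[int]],
    decode_and_score: Callable[[int], float],
    n_steps: int = 200,
    temperature: float = 0.5,
    seed: int = 0,
) -> Tuple[int, float]:
    """Outer level: Metropolis-Hastings over skeletons, tracking the best seen."""
    rng = random.Random(seed)
    current, current_score = init, decode_and_score(init)
    best, best_score = current, current_score
    for _ in range(n_steps):
        options = neighbors(current)
        proposal = rng.choice(options)
        proposal_score = decode_and_score(proposal)
        # Target density is proportional to exp(score / temperature); the two
        # log terms are the Hastings correction for a uniform-over-neighbors
        # proposal whose neighborhood sizes may differ.
        log_accept = (
            (proposal_score - current_score) / temperature
            + math.log(len(options))
            - math.log(len(neighbors(proposal)))
        )
        if rng.random() < math.exp(min(0.0, log_accept)):
            current, current_score = proposal, proposal_score
        if current_score > best_score:
            best, best_score = current, current_score
    return best, best_score


best_skeleton, best_score = metropolis_hastings_over_skeletons(
    init=1, neighbors=toy_neighbors, decode_and_score=toy_decode_and_score
)
print(best_skeleton, best_score)
```

In this sketch the outer loop only ever sees a scalar score per skeleton, which mirrors the decoupling described above: the inner decoder can be swapped for any fixed-horizon procedure without changing the search over skeletons.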
