A Genetic Algorithm for Navigating Synthesizable Molecular Spaces

Alston Lo, Connor W. Coley, Wojciech Matusik

TL;DR

SynGA is a genetic algorithm whose custom crossover and mutation operators explicitly constrain it to synthesizable molecular space. It is designed to serve both as a strong standalone baseline and as a versatile module that can be incorporated into larger synthesis-aware workflows.

Abstract

Inspired by the effectiveness of genetic algorithms and the importance of synthesizability in molecular design, we present SynGA, a simple genetic algorithm that operates directly over synthesis routes. Our method features custom crossover and mutation operators that explicitly constrain it to synthesizable molecular space. By modifying the fitness function, we demonstrate the effectiveness of SynGA on a variety of design tasks, including synthesizable analog search and sample-efficient property optimization, for both 2D and 3D objectives. Furthermore, by coupling SynGA with a machine learning-based filter that focuses the building block set, we boost SynGA to state-of-the-art performance. For property optimization, this manifests as a model-based variant, SynGBO, which employs SynGA and block filtering in the inner loop of Bayesian optimization. Since SynGA is lightweight and enforces synthesizability by construction, we hope that SynGA can serve not only as a strong standalone baseline but also as a versatile module that can be incorporated into larger synthesis-aware workflows in the future.
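To make the idea of a genetic algorithm over synthesis routes concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: the block names, reaction names, scores, size cap (`MAX_LEAVES`), and fitness function are all hypothetical, and the toy skips the check that reactants actually match a reaction template, which a real system would need. It only illustrates how subtree crossover and leaf mutation keep every individual inside the space spanned by a fixed set of building blocks and reactions.

```python
import random

# Hypothetical toy data. A real system would use purchasable building blocks
# and reaction templates; these names and scores are illustrative only.
BLOCKS = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 0.5}
REACTIONS = ["rxn1", "rxn2"]  # every toy reaction takes two reactants
MAX_LEAVES = 6                # assumed cap on blocks per route


def random_tree(rng, depth=2):
    """A leaf is a block name; an internal node is (reaction, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(sorted(BLOCKS))
    return (rng.choice(REACTIONS),
            random_tree(rng, depth - 1),
            random_tree(rng, depth - 1))


def leaves(tree):
    """List the building blocks used by a route."""
    return [tree] if isinstance(tree, str) else leaves(tree[1]) + leaves(tree[2])


def is_valid(tree):
    """Valid means: only known blocks at leaves, only known reactions inside."""
    if isinstance(tree, str):
        return tree in BLOCKS
    rxn, left, right = tree
    return rxn in REACTIONS and is_valid(left) and is_valid(right)


def subtrees(tree, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield path, tree
    if not isinstance(tree, str):
        yield from subtrees(tree[1], path + (1,))
        yield from subtrees(tree[2], path + (2,))


def replace(tree, path, new):
    """Return a copy of tree with the node at path swapped for new."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(parts)


def crossover(rng, a, b):
    """Graft a random subtree of b onto a random position in a."""
    path, _ = rng.choice(list(subtrees(a)))
    _, donor = rng.choice(list(subtrees(b)))
    child = replace(a, path, donor)
    return child if len(leaves(child)) <= MAX_LEAVES else a


def mutate(rng, tree):
    """Swap one randomly chosen leaf for a random building block."""
    leaf_paths = [p for p, t in subtrees(tree) if isinstance(t, str)]
    return replace(tree, rng.choice(leaf_paths), rng.choice(sorted(BLOCKS)))


def fitness(tree):
    """Toy objective: reward block scores, penalize each reaction step."""
    used = leaves(tree)
    return sum(BLOCKS[b] for b in used) - 0.5 * (len(used) - 1)


def run_ga(seed=0, pop_size=20, generations=30):
    rng = random.Random(seed)
    pop = [random_tree(rng) for _ in range(pop_size)]
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            child = crossover(rng, rng.choice(pop), rng.choice(pop))
            if rng.random() < 0.5:
                child = mutate(rng, child)
            children.append(child)
        # Elitist selection over parents and children, with duplicates removed.
        merged = list(dict.fromkeys(pop + children))
        pop = sorted(merged, key=fitness, reverse=True)[:pop_size]
    return pop
```

Because both operators only rearrange known blocks and reactions, every route in the final population passes `is_valid` without any post-hoc filtering. Swapping in a different `fitness` function is the only change needed to target a different toy objective, which mirrors the abstract's point about modifying the fitness function per task.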

Paper Structure

This paper contains 38 sections, 6 equations, 4 figures, 18 tables, 2 algorithms.

Figures (4)

  • Figure 1: A graphical overview of SynGA, which operates over synthesis trees built from building blocks (squares) and reaction templates (circles). Example blocks and a reaction are drawn above using SmilesDrawer.
  • Figure 2: The mean score for the top-$k$ molecules plotted over the number of oracle calls consumed, and averaged over tasks. We plot the mean over 5 seeds.
  • Figure 3: An example ligand proposed by SynGA for each receptor.
  • Figure 4: An example ligand proposed by SynGBO for each receptor.
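
The quantity plotted in Figure 2, the mean score of the top-$k$ molecules as a function of oracle calls, can be computed as in the following sketch. The function names are hypothetical, and two conventions here are assumptions rather than details taken from the paper: before $k$ molecules have been scored, the mean is taken over the molecules seen so far, and curves from different seeds are averaged pointwise.

```python
import heapq


def topk_mean_curve(scores, k):
    """Entry i is the mean of the top-k scores among the first i+1 oracle calls."""
    heap = []    # min-heap holding the current top-k scores
    total = 0.0  # running sum of the scores in the heap
    curve = []
    for s in scores:
        if len(heap) < k:
            heapq.heappush(heap, s)
            total += s
        elif s > heap[0]:
            # Replace the smallest kept score and update the sum.
            total += s - heapq.heapreplace(heap, s)
        curve.append(total / len(heap))
    return curve


def mean_over_seeds(curves):
    """Pointwise mean of equally long curves, one curve per seed."""
    return [sum(vals) / len(vals) for vals in zip(*curves)]
```

For example, `topk_mean_curve([0.1, 0.5, 0.3, 0.9], k=2)` gives approximately `[0.1, 0.3, 0.4, 0.7]`.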