SynthFormer: Equivariant Pharmacophore-based Generation of Synthesizable Molecules for Ligand-Based Drug Design

Zygimantas Jocys, Zhanxing Zhu, Henriette M. G. Willems, Katayoun Farrahi

TL;DR

SynthFormer tackles the dual challenge of designing ligands that are both active and synthetically feasible. It introduces a 3D equivariant pharmacophore encoder and a synthesis-aware Transformer-based decoder that constructs molecules as synthetic trees from building blocks. The approach yields molecules with strong docking performance, 100% synthesizability, and improved chemical diversity, enabling hit expansion and property optimization. By uniting 3D pharmacophore-informed design with realistic synthesis pathways, SynthFormer offers a practical workflow for rapid, synthesis-grounded drug design.

Abstract

Drug discovery is a complex, resource-intensive process requiring significant time and cost to bring new medicines to patients. Many generative models aim to accelerate drug discovery, but few produce synthetically accessible molecules. Conversely, synthesis-focused models do not leverage the 3D information crucial for effective drug design. We introduce SynthFormer, a novel machine learning model that generates fully synthesizable molecules, structured as synthetic trees, by introducing both 3D information and pharmacophores as input. SynthFormer features a 3D equivariant graph neural network to encode pharmacophores, followed by a Transformer-based synthesis-aware decoding mechanism for constructing synthetic trees as a sequence of tokens. It is a first-of-its-kind approach that could provide capabilities for designing active molecules based on pharmacophores, exploring the local synthesizable chemical space around hit molecules and optimizing their properties. We demonstrate its effectiveness through various challenging tasks, including designing active compounds for a range of proteins, performing hit expansion and optimizing molecular properties.
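
To make the "3D equivariant graph neural network" concrete, the following is a minimal sketch of one EGNN-style message-passing layer over a fully connected pharmacophore graph, written in NumPy. It is not the authors' implementation: the weights are random, each learned MLP is replaced by a single linear map followed by tanh, and the names `init_params`, `egnn_layer`, `d`, and `k` are hypothetical. The point it illustrates is that node features built only from pairwise distances stay unchanged under rotation and translation of the input, while the updated coordinates rotate and translate with it.

```python
import numpy as np

def init_params(d, k, rng):
    """Random weights for one layer (hypothetical sizes: d node features, k message features)."""
    return {
        "W_e": rng.normal(scale=0.1, size=(2 * d + 1, k)),  # edge/message map
        "W_x": rng.normal(scale=0.1, size=(k, 1)),          # coordinate weight map
        "W_h": rng.normal(scale=0.1, size=(d + k, d)),      # node update map
    }

def egnn_layer(h, x, params):
    """One EGNN-style layer on a fully connected graph.

    h: (n, d) invariant node features, e.g. pharmacophore type embeddings.
    x: (n, 3) pharmacophore coordinates. Requires n >= 2.
    """
    n, d = h.shape
    diff = x[:, None, :] - x[None, :, :]              # (n, n, 3): x_i - x_j
    dist2 = (diff ** 2).sum(axis=-1, keepdims=True)   # (n, n, 1): squared distances
    h_i = np.broadcast_to(h[:, None, :], (n, n, d))
    h_j = np.broadcast_to(h[None, :, :], (n, n, d))
    edge_in = np.concatenate([h_i, h_j, dist2], axis=-1)
    no_self = 1.0 - np.eye(n)[:, :, None]             # zero out self-edges
    m = np.tanh(edge_in @ params["W_e"]) * no_self    # messages m_ij, (n, n, k)
    w = np.tanh(m @ params["W_x"])                    # scalar weight per edge
    # Coordinates move along relative vectors only, which keeps them equivariant.
    x_new = x + (diff * w).sum(axis=1) / (n - 1)
    # Node features see only invariant quantities, so they stay invariant.
    node_in = np.concatenate([h, m.sum(axis=1)], axis=-1)
    h_new = h + np.tanh(node_in @ params["W_h"])
    return h_new, x_new

# Toy input: 5 pharmacophore points with 4-dimensional features (made-up values).
rng = np.random.default_rng(0)
h = rng.normal(size=(5, 4))
x = rng.normal(size=(5, 3))
params = init_params(d=4, k=8, rng=rng)
h_out, x_out = egnn_layer(h, x, params)
```

Applying the same layer to a rotated and shifted copy of `x` should give the same `h_out` and a correspondingly rotated and shifted `x_out`, which is the property that lets the encoder ignore the arbitrary pose of the input pharmacophore.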

Paper Structure (23 sections, 12 equations, 4 figures, 3 tables)


Figures (4)

  • Figure 1: The generation process begins by representing the pharmacophores as a fully connected graph, which is passed through an EGNN to obtain pharmacophore embeddings. These embeddings are then passed to a transformer decoder. The decoder first receives a start token and predicts building block $B_1$, followed by reaction $R_0$. Next, it takes the fingerprint of $B_1$ as input and predicts building block $B_2$, followed by reaction $R_1$, which is applied to generate product $P_1$. $P_1$ is then used to predict building block $B_3$ and reaction $R_2$, producing product $P_2$. This process repeats until the end token is generated.
  • Figure 2: The building block encoding of the query molecule serves as the reference. The three closest molecules, identified using cosine similarity, preserve significant structural similarity to the query.
  • Figure 3: The first molecule is the reference structure derived from PDB entry 3COY. The subsequent molecules were generated by the SynthFormer model, illustrating its capability to design novel molecular structures based on a known protein-ligand complex. These results highlight SynthFormer's potential for generating diverse and plausible molecular candidates.
  • Figure 4: The first molecule is the reference structure derived from PDB entry 3COY. The subsequent molecules are expanded hits generated by the SynthFormer model, demonstrating structural diversity and novel chemical scaffolds.
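
The nearest-neighbour lookup described in the Figure 2 caption can be sketched as follows. This is an illustrative example only: the vectors are made-up toy encodings rather than SynthFormer's learned building block encodings, and the names `cosine_similarity`, `nearest_building_blocks`, and the `bb_*` keys are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def nearest_building_blocks(query, library, k=3):
    """Return the k library entries most similar to the query, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

# Toy encodings (assumed values, for illustration only).
library = {
    "bb_a": [1.0, 0.0, 1.0, 0.0],
    "bb_b": [1.0, 0.0, 0.0, 0.0],
    "bb_c": [0.0, 1.0, 0.0, 1.0],
    "bb_d": [1.0, 1.0, 1.0, 0.0],
}
query = [1.0, 0.0, 1.0, 0.0]
top3 = nearest_building_blocks(query, library, k=3)
```

Because cosine similarity ignores vector length, the ranking depends only on the direction of each encoding, so building blocks that activate the same features as the query rank highest regardless of scale.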