TacoGFN: Target-conditioned GFlowNet for Structure-based Drug Design

Tony Shen, Seonghwan Seo, Grayson Lee, Mohit Pandey, Jason R Smith, Artem Cherkasov, Woo Youn Kim, Martin Ester

TL;DR

This work reframes structure-based drug design as learning a reward distribution conditioned on protein pocket structure. It introduces TacoGFN, a pocket-conditioned GFlowNet that generates 2D fragment-based ligands with rewards from predicted affinity, drug-likeness, and synthesizability. By modeling $\pi(L \mid P, \beta) \propto R(L \mid P)^{\beta}$ and conditioning on pocket structure via a GVP-GNN encoder, TacoGFN achieves state-of-the-art results on CrossDocked2020 in the generative setting, with a 56.0% success rate and a median Vina Dock score of $-8.44$ kcal/mol. Fine-tuning (TacoGFN+FT) improves the median Vina Dock score to $-10.93$ kcal/mol and the success rate to 88.8%, outperforming all optimization-based baselines. A pharmacophore-based docking score predictor enables fast, generalizable affinity evaluation, and large-scale ablations show benefits from pocket conditioning and larger docking datasets. The approach reduces generation time by multiple orders of magnitude while producing drug-like, synthesizable molecules tailored to unseen pockets.
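To make the role of the exponent $\beta$ in $\pi(L \mid P, \beta) \propto R(L \mid P)^{\beta}$ concrete, here is a minimal sketch that normalizes tempered rewards over a small, explicitly enumerated candidate set. The ligand names, the reward values, and the function `tempered_distribution` are hypothetical and chosen only for illustration. A GFlowNet such as TacoGFN does not enumerate candidates like this; it learns a sequential construction policy whose terminal distribution approximates the target.

```python
import math


def tempered_distribution(rewards, beta):
    """Return p(L) proportional to R(L)**beta over a finite candidate set.

    `rewards` maps a candidate name to a strictly positive reward.
    """
    if any(r <= 0 for r in rewards.values()):
        raise ValueError("rewards must be strictly positive")
    # Work in log space and subtract the maximum for numerical stability.
    logits = {name: beta * math.log(r) for name, r in rewards.items()}
    max_logit = max(logits.values())
    weights = {name: math.exp(v - max_logit) for name, v in logits.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Hypothetical rewards for three candidate ligands in a single pocket.
rewards = {"ligand_a": 0.2, "ligand_b": 0.5, "ligand_c": 0.9}

uniform = tempered_distribution(rewards, beta=0.0)  # ignores the reward
flat = tempered_distribution(rewards, beta=1.0)     # proportional to reward
sharp = tempered_distribution(rewards, beta=8.0)    # concentrates on the best
```

With $\beta = 0$ every candidate is equally likely, with $\beta = 1$ probabilities are directly proportional to reward, and larger $\beta$ shifts mass toward the highest-reward candidate while keeping nonzero probability on the others, which is what allows diverse high-scoring samples.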

Abstract

Searching the vast chemical space for drug-like molecules that bind with a protein pocket is a challenging task in drug discovery. Recently, structure-based generative models have been introduced which promise to be more efficient by learning to generate molecules for any given protein structure. However, since they learn the distribution of a limited protein-ligand complex dataset, structure-based methods do not yet outperform optimization-based methods that generate binding molecules for just one pocket. To overcome limitations on data while leveraging learning across protein targets, we choose to model the reward distribution conditioned on pocket structure, instead of the training data distribution. We design TacoGFN, a novel GFlowNet-based approach for structure-based drug design, which can generate molecules conditioned on any protein pocket structure with probabilities proportional to its affinity and property rewards. In the generative setting for CrossDocked2020 benchmark, TacoGFN attains a state-of-the-art success rate of $56.0\%$ and $-8.44$ kcal/mol in median Vina Dock score while improving the generation time by multiple orders of magnitude. Fine-tuning TacoGFN further improves the median Vina Dock score to $-10.93$ kcal/mol and the success rate to $88.8\%$, outperforming all optimization-based methods.

Paper Structure (39 sections, 13 equations, 5 figures, 12 tables)

This paper contains 39 sections, 13 equations, 5 figures, 12 tables.

Figures (5)

  • Figure 1: Overview of the sampling and training process of TacoGFN.
  • Figure 2: Model architecture of the docking score predictor. Each pharmacophore point is represented as a sphere and corresponds to a desired ligand characteristic for a binding interaction.
  • Figure 3: Our method is focused on de novo hit discovery: finding novel and diverse high-scoring hits for a protein target. It does not optimize from a given reference seed compound. The goal of reward-based sampling is to sample diverse high-scoring molecules. To provide a fair overview of the model's performance, we selected protein pockets 4iwq and 4q8b, which are at the 25th and 75th percentiles, respectively, based on their docking scores with their native ligands. We compare the molecules generated by TacoGFN against the native ligand in terms of QED, SA, Novelty, and Docking score (Vina).
  • Figure 4: The average of the top-10 Vina Dock scores of molecules generated for individual CrossDocked test pockets (targets) by DecompDiff and TacoGFN. Targets are sorted by the average of the top-10 docking scores of TacoGFN-generated molecules. A lower docking score means a higher estimated binding affinity. Color denotes the average QED value of molecules in the top-10 set. A higher QED indicates that the molecule is more drug-like.
  • Figure 5: The average of the top-10 Vina Dock scores of molecules generated for individual CrossDocked test pockets (targets) by DecompDiff and TacoGFN. Color denotes the average molecular weight of molecules in the top-10 set. The molecular mass of an orally active drug should be less than 500 daltons [LIPINSKI19973]. Heavy molecules with high docking scores are more likely to be false positives [Pan2002]. Overall, TacoGFN consistently achieves ideal molecular weight and strong Vina Dock scores.