
Generative Active Learning for the Search of Small-molecule Protein Binders

Maksym Korablyov, Cheng-Hao Liu, Moksh Jain, Almer M. van der Sloot, Eric Jolicoeur, Edward Ruediger, Andrei Cristian Nica, Emmanuel Bengio, Kostiantyn Lapchevskyi, Daniel St-Cyr, Doris Alexandra Schuetz, Victor Ion Butoi, Jarrid Rector-Brooks, Simon Blackburn, Leo Feng, Hadi Nekoei, SaiKrishna Gottipati, Priyesh Vijayan, Prateek Gupta, Ladislav Rampášek, Sasikanth Avancha, Pierre-Luc Bacon, William L. Hamilton, Brooks Paige, Sanchit Misra, Stanislaw Kamil Jastrzebski, Bharat Kaul, Doina Precup, José Miguel Hernández-Lobato, Marwin Segler, Michael Bronstein, Anne Marinier, Mike Tyers, Yoshua Bengio

TL;DR

The paper tackles the challenge of de novo small-molecule design within an astronomically large search space (up to $10^{60}$ molecules) by introducing LambdaZero, a generative active-learning framework that combines a fast $E(n)$-invariant graph neural surrogate with a fragment-based policy constrained by synthesizability and drug-likeness. It uses an outer docking loop to iteratively refine candidates, achieving an exponential speed-up (approximately $10^{4}$ docking calls versus $10^{11}$ virtual screenings) while discovering novel, synthesizable scaffolds for soluble Epoxide Hydrolase 2 (sEH) and validating a lead inhibitor with sub-micromolar activity in vitro. The approach yields both in silico and experimental success: a sizable fraction of designed compounds inhibited sEH in vitro, including a potent lead, and the scaffolds were not present in known inhibitors. This work demonstrates a scalable, practically validated pathway for rapid discovery of high-affinity small-molecule binders and can be extended to other targets and higher-fidelity pharmacological pipelines.
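To get a feel for why brute-force screening scales so poorly, the following rough sketch assumes that docking scores are standard normal (a simplification made here for illustration; the paper fits a generalized Gaussian to the tail). Under that assumption, the expected number of random molecules that must be screened before one exceeds a threshold of `z` standard deviations above the mean is the inverse of the tail probability. The function names `tail_probability` and `expected_screens` are hypothetical.

```python
import math

def tail_probability(z: float) -> float:
    """P(X > z) for a standard normal X (assumed score distribution)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def expected_screens(z: float) -> float:
    """Expected number of i.i.d. draws until one exceeds z (geometric mean, 1/p)."""
    return 1.0 / tail_probability(z)

# The cost of random screening grows faster than exponentially in z.
for z in (2.5, 4.0, 5.5, 7.0):
    print(f"z = {z:.1f}: ~{expected_screens(z):.1e} molecules screened")
```

Each additional standard deviation multiplies the screening cost by a growing factor, which is the sense in which a guided search that improves scores with a roughly constant number of oracle calls per round can be exponentially cheaper.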

Abstract

Despite substantial progress in machine learning for scientific discovery in recent years, truly de novo design of small molecules which exhibit a property of interest remains a significant challenge. We introduce LambdaZero, a generative active learning approach to search for synthesizable molecules. Powered by deep reinforcement learning, LambdaZero learns to search over the vast space of molecules to discover candidates with a desired property. We apply LambdaZero with molecular docking to design novel small molecules that inhibit the enzyme soluble Epoxide Hydrolase 2 (sEH), while enforcing constraints on synthesizability and drug-likeliness. LambdaZero provides an exponential speedup in terms of the number of calls to the expensive molecular docking oracle, and LambdaZero de novo designed molecules reach docking scores that would otherwise require the virtual screening of a hundred billion molecules. Importantly, LambdaZero discovers novel scaffolds of synthesizable, drug-like inhibitors for sEH. In in vitro experimental validation, a series of ligands from a generated quinazoline-based scaffold were synthesized, and the lead inhibitor N-(4,6-di(pyrrolidin-1-yl)quinazolin-2-yl)-N-methylbenzamide (UM0152893) displayed sub-micromolar enzyme inhibition of sEH.
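The outer loop described above (fit a cheap surrogate, let a generative policy propose a batch, send only that batch to the expensive oracle, repeat) can be sketched in a few lines. This is a toy illustration under loud assumptions, not the authors' implementation: a "molecule" is a tuple of fragment ids, `oracle` is an arbitrary stand-in for docking, a lookup table replaces the graph neural surrogate, and a mutate-and-rank heuristic replaces the reinforcement learning policy. All names are hypothetical.

```python
import random

N_FRAGMENTS, LENGTH = 8, 5  # toy "molecule": a tuple of 5 fragment ids

def oracle(mol):
    """Stand-in for the expensive docking call (toy score, higher is better)."""
    return sum((f * (i + 1)) % 7 for i, f in enumerate(mol))

def fit_surrogate(data):
    """Cheap additive surrogate: mean oracle score seen per (position, fragment)."""
    sums, counts = {}, {}
    for mol, score in data.items():
        for key in enumerate(mol):
            sums[key] = sums.get(key, 0.0) + score
            counts[key] = counts.get(key, 0) + 1
    mean = sum(data.values()) / len(data)
    table = {k: sums[k] / counts[k] for k in sums}
    return lambda mol: sum(table.get(key, mean) for key in enumerate(mol))

def propose(surrogate, data, rng, n_candidates=200, batch_size=10):
    """Stand-in for the generative policy: mutate good molecules, rank by surrogate."""
    parents = sorted(data, key=data.get, reverse=True)[:10]
    candidates = set()
    for _ in range(n_candidates):
        mol = list(rng.choice(parents))
        mol[rng.randrange(LENGTH)] = rng.randrange(N_FRAGMENTS)
        if tuple(mol) not in data:
            candidates.add(tuple(mol))
    return sorted(candidates, key=surrogate, reverse=True)[:batch_size]

def run(rounds=5, seed=0):
    rng = random.Random(seed)
    data = {}
    while len(data) < 20:  # initial random library
        mol = tuple(rng.randrange(N_FRAGMENTS) for _ in range(LENGTH))
        data[mol] = oracle(mol)
    history = [max(data.values())]
    for _ in range(rounds):  # outer active-learning loop
        surrogate = fit_surrogate(data)
        for mol in propose(surrogate, data, rng):
            data[mol] = oracle(mol)  # only the selected batch reaches the oracle
        history.append(max(data.values()))
    return data, history

data, history = run()
print("best score per round:", history)
print("oracle calls:", len(data))
```

The point of the structure is that the surrogate is queried for every candidate while the oracle is called only on the selected batch, so the oracle budget stays at the initial library plus `rounds * batch_size` evaluations.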

Paper Structure (26 sections, 1 equation, 9 figures, 3 tables, 1 algorithm)

Figures (9)

  • Figure 1: Schematic of LambdaZero illustrating the overall approach. A fast surrogate model is learned and used to guide a generative policy that designs de novo molecules under constraints on synthesizability and drug-likeness. Batches of candidates generated by the policy are evaluated with the molecular docking oracle. This outer loop is executed for a few rounds, enriching the library. Candidates are then selected for synthesis and in vitro validation.
  • Figure 2: LambdaZero searches exponentially faster than virtual screening. (a) The tail of the distribution ($x > \mu + 2.5\sigma$) of 5.8 million docking scores of drug-like molecules from Zinc20, with a generalized Gaussian distribution fit and its 95% confidence interval. The inset shows the remaining distribution with the mean and $\pm 1, 2, 3\sigma$. (b) The number of oracle calls against the highest normalized docking score reached, for LambdaZero and for virtual screening in the Zinc dataset.
  • Figure 3: LambdaZero designs lead to synthesizable sEH protein inhibitors. (a) Synthesized molecular library based on the scaffold discovered by LambdaZero, with the strongest inhibitors highlighted. (b) Docking pose of UM0152608 (yellow), compared to the native sEH ligand in PDB 4jnc (green). Selected sEH amino acid residues in contact with the ligands are labeled. (c) Concentration-response curves of the top two compounds and calculated IC50 values. Data are plotted as mean ± standard deviation of three replicates.
  • Figure 4: The distribution of pairwise Tanimoto molecular similarity between LambdaZero-generated molecules and known sEH inhibitors from ChEMBL, and between LambdaZero-generated molecules and the 50,000 molecules with the highest docking scores from virtual screening in Zinc20.
  • Figure 5: Synthesis of quinazoline-based scaffold (3).
  • ...and 4 more figures
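
The pairwise Tanimoto similarity reported in Figure 4 is the size of the intersection of two fingerprints divided by the size of their union. A minimal sketch follows, assuming each fingerprint is represented as a Python set of "on" bit indices. The example fingerprints are made up for illustration, and the helper names are hypothetical; in practice a cheminformatics toolkit would compute the fingerprints from molecular structures.

```python
from itertools import product

def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)

def pairwise_similarities(generated, reference):
    """All similarities between a generated set and a reference set of fingerprints."""
    return [tanimoto(g, r) for g, r in product(generated, reference)]

# Made-up fingerprints, purely illustrative.
generated = [{1, 2, 3}, {4, 5, 6, 7}]
reference = [{2, 3, 4}, {7, 8}]

sims = pairwise_similarities(generated, reference)
print("max similarity to reference set:", max(sims))
```

Low maximum similarity between a generated molecule and every reference molecule is one simple way to argue that a scaffold is novel relative to that reference set.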