Generative Active Learning for the Search of Small-molecule Protein Binders
Maksym Korablyov, Cheng-Hao Liu, Moksh Jain, Almer M. van der Sloot, Eric Jolicoeur, Edward Ruediger, Andrei Cristian Nica, Emmanuel Bengio, Kostiantyn Lapchevskyi, Daniel St-Cyr, Doris Alexandra Schuetz, Victor Ion Butoi, Jarrid Rector-Brooks, Simon Blackburn, Leo Feng, Hadi Nekoei, SaiKrishna Gottipati, Priyesh Vijayan, Prateek Gupta, Ladislav Rampášek, Sasikanth Avancha, Pierre-Luc Bacon, William L. Hamilton, Brooks Paige, Sanchit Misra, Stanislaw Kamil Jastrzebski, Bharat Kaul, Doina Precup, José Miguel Hernández-Lobato, Marwin Segler, Michael Bronstein, Anne Marinier, Mike Tyers, Yoshua Bengio
TL;DR
The paper tackles the challenge of de novo small-molecule design within an astronomically large search space (up to $10^{60}$ molecules) by introducing LambdaZero, a generative active-learning framework that combines a fast $E(n)$-invariant graph neural surrogate with a fragment-based policy constrained by synthesizability and drug-likeness. It uses an outer docking loop to iteratively refine candidates, achieving an exponential speed-up (approximately $10^{4}$ docking calls versus $10^{11}$ virtual screenings) while discovering novel, synthesizable scaffolds for soluble Epoxide Hydrolase 2 (sEH) and validating a lead inhibitor with sub-micromolar activity in vitro. The approach yields both in silico and experimental success: a sizable fraction of designed compounds inhibited sEH in vitro, including a potent lead, and the scaffolds were not present in known inhibitors. This work demonstrates a scalable, practically validated pathway for rapid discovery of high-affinity small-molecule binders and can be extended to other targets and higher-fidelity pharmacological pipelines.
Abstract
Despite substantial progress in machine learning for scientific discovery in recent years, truly de novo design of small molecules that exhibit a property of interest remains a significant challenge. We introduce LambdaZero, a generative active learning approach to search for synthesizable molecules. Powered by deep reinforcement learning, LambdaZero learns to search over the vast space of molecules to discover candidates with a desired property. We apply LambdaZero with molecular docking to design novel small molecules that inhibit the enzyme soluble Epoxide Hydrolase 2 (sEH), while enforcing constraints on synthesizability and drug-likeness. LambdaZero provides an exponential speedup in terms of the number of calls to the expensive molecular docking oracle, and LambdaZero de novo designed molecules reach docking scores that would otherwise require the virtual screening of a hundred billion molecules. Importantly, LambdaZero discovers novel scaffolds of synthesizable, drug-like inhibitors for sEH. In in vitro experimental validation, a series of ligands from a generated quinazoline-based scaffold were synthesized, and the lead inhibitor N-(4,6-di(pyrrolidin-1-yl)quinazolin-2-yl)-N-methylbenzamide (UM0152893) displayed sub-micromolar enzyme inhibition of sEH.
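To make the shape of a generative active learning loop concrete, the following is a toy sketch in Python, not LambdaZero itself. Everything in it is an illustrative assumption: "molecules" are tuples of letter fragments, `oracle` is a made-up additive score standing in for the expensive docking call, `fit_surrogate` is a per-fragment average rather than a graph neural network, and `propose` ranks randomly sampled candidates by the surrogate rather than using a reinforcement learning policy. It only shows how a cheap model trained on past oracle calls can decide which few candidates are worth the next expensive evaluation.

```python
import random

# Hypothetical toy setup: fragments are letters, a "molecule" is a tuple of them.
FRAGMENTS = list("ABCDEFGH")
HIDDEN = {f: i for i, f in enumerate(FRAGMENTS)}  # contribution unknown to the learner
LENGTH = 4


def oracle(mol):
    """Stand-in for the expensive scorer (docking in the paper); higher is better."""
    return sum(HIDDEN[f] for f in mol)


def fit_surrogate(data):
    """Cheap surrogate: average per-fragment share of the observed scores."""
    totals, counts = {}, {}
    for mol, score in data.items():
        for f in mol:
            totals[f] = totals.get(f, 0.0) + score / len(mol)
            counts[f] = counts.get(f, 0) + 1
    est = {f: totals[f] / counts[f] for f in totals}
    default = sum(est.values()) / len(est)  # used for fragments never observed
    return lambda mol: sum(est.get(f, default) for f in mol)


def propose(surrogate, rng, n_samples, batch_size, seen):
    """Generator stand-in: sample candidates, keep the surrogate's top unseen picks."""
    pool = {tuple(rng.choice(FRAGMENTS) for _ in range(LENGTH))
            for _ in range(n_samples)}
    candidates = [m for m in sorted(pool) if m not in seen]  # sorted for determinism
    return sorted(candidates, key=surrogate, reverse=True)[:batch_size]


def active_learning(rounds=5, batch_size=8, n_samples=200, seed=0):
    rng = random.Random(seed)
    data = {}  # molecule -> oracle score; its size is the number of oracle calls
    while len(data) < batch_size:  # initial random batch
        mol = tuple(rng.choice(FRAGMENTS) for _ in range(LENGTH))
        data[mol] = oracle(mol)
    best_per_round = [max(data.values())]
    for _ in range(rounds):
        surrogate = fit_surrogate(data)  # retrain on everything scored so far
        for mol in propose(surrogate, rng, n_samples, batch_size, data):
            data[mol] = oracle(mol)  # spend oracle calls only on top picks
        best_per_round.append(max(data.values()))
    return data, best_per_round


if __name__ == "__main__":
    data, best = active_learning()
    print("oracle calls:", len(data))
    print("best score after each round:", best)
```

In this sketch the oracle budget is at most `batch_size * (rounds + 1)` calls, while the surrogate is evaluated on many more sampled candidates per round; that asymmetry between cheap and expensive evaluations is the point the abstract makes about reducing docking calls.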
