Autonomous Drug Design with Multi-Armed Bandits

Hampus Gummesson Svensson, Esben Jannik Bjerrum, Christian Tyrchan, Ola Engkvist, Morteza Haghir Chehreghani

TL;DR

This work formulates autonomous drug design as a stochastic multi-armed bandit problem with multiple plays, volatile arms, and similarity information to optimize a DMTA (design-make-test-analyze) cycle. It extends the contextual Zooming algorithm to handle multiple plays and arm volatility, introducing weighted and unweighted variants and an Oracle-based selection mechanism, while using a digital twin to simulate end-to-end DMTA cycles. Through simulations, the authors compare these approaches against random, greedy, and decaying-epsilon-greedy strategies, showing that Zooming-based methods can effectively explore the chemical space while exploiting high-activity regions, with trade-offs in novelty over time. The results suggest that integrating such MAB-based strategies into autonomous drug-design pipelines can improve efficiency and discovery in synthetic chemistry.

Abstract

Recent developments in artificial intelligence and automation support a new drug design paradigm: autonomous drug design. Under this paradigm, generative models can provide suggestions on thousands of molecules with specific properties, and automated laboratories can potentially make, test and analyze molecules with minimal human supervision. However, since still only a limited number of molecules can be synthesized and tested, an obvious challenge is how to efficiently select among provided suggestions in a closed-loop system. We formulate this task as a stochastic multi-armed bandit problem with multiple plays, volatile arms and similarity information. To solve this task, we adapt previous work on multi-armed bandits to this setting, and compare our solution with random sampling, greedy selection and decaying-epsilon-greedy selection strategies. According to our simulation results, our approach has the potential to perform better exploration and exploitation of the chemical space for autonomous drug design.
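As a rough illustration of the decaying-epsilon-greedy baseline mentioned above, the following minimal sketch selects several molecules per cycle (multiple plays) from a pool of candidates. It is not the authors' implementation: the function names, the decay schedule `eps_max / (1 + decay * cycle)`, and the `decay` value are assumptions made for illustration; only the value $\epsilon_{\text{max}} = 0.6$ is taken from the Figure 2 caption.

```python
import random


def epsilon_at(cycle, eps_max=0.6, decay=0.05):
    """Hypothetical schedule: epsilon shrinks from eps_max as cycles pass."""
    return eps_max / (1.0 + decay * cycle)


def select_batch(estimates, k, epsilon, rng):
    """Pick up to k distinct arms from the current pool.

    estimates maps an arm (e.g. a molecule identifier) to its estimated reward.
    Each slot explores with probability epsilon (uniform over remaining arms),
    otherwise it exploits by taking the best remaining arm.
    """
    remaining = list(estimates)
    chosen = []
    for _ in range(min(k, len(remaining))):
        if rng.random() < epsilon:
            pick = rng.choice(remaining)  # explore
        else:
            pick = max(remaining, key=lambda arm: estimates[arm])  # exploit
        chosen.append(pick)
        remaining.remove(pick)  # multiple plays: no arm is chosen twice
    return chosen


# Usage: the pool is rebuilt each cycle, since arms are volatile.
rng = random.Random(0)
pool = {"mol_a": 0.1, "mol_b": 0.9, "mol_c": 0.5}  # made-up estimates
batch = select_batch(pool, k=2, epsilon=epsilon_at(cycle=3), rng=rng)
```

Because the candidate pool is passed in fresh on every call, molecules that appear or disappear between cycles are handled without extra bookkeeping; a Zooming-style strategy would additionally use similarity between molecules, which this sketch does not attempt.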

Paper Structure

This paper contains 19 sections, 9 equations, 2 figures, 4 algorithms.

Figures (2)

  • Figure 1: A schematic illustration of the (autonomous) drug design process.
  • Figure 2: Normalized cumulative reward, novelty of selected actives, and the mean of these two quantities, averaged over 10 runs for each selection strategy. For the first two, approximate 95% confidence intervals of the averages over 10 runs are shown. A novelty of 1 corresponds to selecting actives that are entirely dissimilar to previously selected actives, while a novelty of 0 corresponds to a selection that is equal in similarity to the previously selected actives. $\epsilon_t$-greedy with $\epsilon_{\text{max}} = 0.6$ and unweighted Zooming show good performance with regard to both cumulative reward and novelty in the first 50 cycles, while weighted Zooming and greedy both perform well for at least the last 100 cycles.