
FREED++: Improving RL Agents for Fragment-Based Molecule Generation by Thorough Reproduction

Alexander Telepov, Artem Tsypin, Kuzma Khrabrov, Sergey Yakukhnov, Pavel Strashnov, Petr Zhilyaev, Egor Rumiantsev, Daniel Ezhov, Manvel Avetisian, Olga Popova, Artur Kadurin

TL;DR

This work assesses protein-conditioned fragment-based molecule generation via RL, focusing on reproducing and fixing the FREED framework. It identifies critical implementation bugs, proposes FFREED as a corrected baseline, and introduces FREED++ as a streamlined, faster variant with ablations. Across extensive experiments on multiple protein targets and fragment libraries, FREED++ consistently achieves superior docking-score performance compared with baselines, while offering better generalization and stability. The study emphasizes reproducibility, broader evaluation, and practical applicability to USP7 inhibitors, highlighting the importance of library design and robust RL components for drug discovery pipelines.

Abstract

A rational design of new therapeutic drugs aims to find a molecular structure with desired biological functionality, e.g., an ability to activate or suppress a specific protein via binding to it. Molecular docking is a common technique for evaluating protein-molecule interactions. Recently, Reinforcement Learning (RL) has emerged as a promising approach to generating molecules with the docking score (DS) as a reward. In this work, we reproduce, scrutinize and improve the recent RL model for molecule generation called FREED (arXiv:2110.01219). Extensive evaluation of the proposed method reveals several limitations and challenges despite the outstanding results reported for three target proteins. Our contributions include fixing numerous implementation bugs and simplifying the model while increasing its quality, significantly extending experiments, and conducting an accurate comparison with current state-of-the-art methods for protein-conditioned molecule generation. We show that the resulting fixed model is capable of producing molecules with superior docking scores compared to alternative approaches.

Paper Structure

This paper contains 60 sections, 25 equations, 10 figures, and 13 tables.

Figures (10)

  • Figure 1: An overview of RL-based sequential generation methods. The agent takes the current state (section \ref{sec:state}) and selects an action (section \ref{sec:action}). The action is usually a molecular fragment appended to the current state; individual atoms or atom bonds are considered trivial cases of fragments. The transition dynamics are straightforward: the new state is assembled from the previous state by attaching the new fragment to it. Note that some frameworks \cite{zhou2019optimization, jeon2020autonomous} also treat the removal of a fragment as an action. The specific task defines the reward; examples include cLogP, QED, and various binding affinity proxies. If $J$ is the optimization objective, the reward can be chosen as $r_{t+1} = J(s_{t+1}) - J(s_t)$ or as $r_{t+1} = \begin{cases} 0, & \text{if } s_{t+1} \text{ is non-terminal;} \\ J(s_{t+1}), & \text{if } s_{t+1} \text{ is terminal.} \end{cases}$
  • Figure 2: Overview of fragment-based molecule generation frameworks. At each step, a fragment is selected from a pre-defined fragment library $\mathcal{F}$ and attached to the current state $s$. In general, different encoding methods may be used to handle $s$ and the selected fragment $f$, but in this work, both are processed with the same GCN.
  • Figure 3: A schematic overview of the actor and the critic. The action selection process is depicted in Fig. \ref{fig:action_selection}. Step 1: the current state $s$ is embedded with a GCN $G_{\theta}$, and the resulting embedding is fused with the embeddings of its attachment points. Then, one of the attachment points is selected as $a_1$, and its embedding $\tilde{a}_1$ is used in the next step. Step 2: $\tilde{a}_1$ is passed through an MLP to get a distribution over the available fragments. One of the fragments is selected as $a_2$, and its embedding $\tilde{a}_2$ is used in the next step. Step 3: the selected fragment $a_2$ is processed by the same GCN $G_{\theta}$ to obtain the embeddings of its attachment points. After fusing these embeddings with $\tilde{a}_2$, one of the attachment points of the fragment is selected as $a_3$. Critic: the embeddings of all actions are concatenated with the state embedding and processed by the critic (Fig. \ref{fig:critic}).
  • Figure 4: Representative molecules generated by FREED++ with USP7 as the target protein. The docking score and the maximum Tanimoto similarity to a set of known inhibitors are shown below each molecule. Fragment libraries: BRICS-MOSES (A); BRICS-USP7 (B); CReM-ZINC (C).
  • Figure 5: Average docking score over a batch during MolDQN training. Shaded regions denote the 95% confidence interval.
  • ...and 5 more figures
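
The two reward choices named in the Figure 1 caption can be made concrete with a small sketch. Everything here is illustrative: the function names, the representation of a state as a list of fragment labels, and the toy objective standing in for $J$ are assumptions, not part of FREED or FREED++.

```python
from typing import Callable, List, Sequence

# Hypothetical state representation: the fragments attached so far.
State = Sequence[str]


def dense_rewards(states: List[State], J: Callable[[State], float]) -> List[float]:
    """r_{t+1} = J(s_{t+1}) - J(s_t) for every transition in the trajectory."""
    return [J(nxt) - J(cur) for cur, nxt in zip(states, states[1:])]


def sparse_rewards(states: List[State], J: Callable[[State], float]) -> List[float]:
    """r_{t+1} = 0 while s_{t+1} is non-terminal, J(s_{t+1}) at the terminal state."""
    n_transitions = len(states) - 1
    if n_transitions <= 0:
        return []
    return [0.0] * (n_transitions - 1) + [J(states[-1])]


def toy_objective(state: State) -> float:
    # Toy stand-in for J (for example a docking-based score): counts fragments.
    return float(len(state))


trajectory = [["A"], ["A", "B"], ["A", "B", "C"]]
dense = dense_rewards(trajectory, toy_objective)    # one reward per transition
sparse = sparse_rewards(trajectory, toy_objective)  # reward only at the end
```

Without discounting, the dense rewards telescope to $J(s_T) - J(s_0)$, so for trajectories that share an initial state the two choices differ only by the constant $J(s_0)$. The sparse form evaluates $J$ once per episode, which matters when $J$ is expensive to compute.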
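
The three-step action selection described in the Figure 3 caption amounts to sampling $(a_1, a_2, a_3)$ autoregressively, with each step depending on the earlier choices. The sketch below shows only that control flow. The three scoring callables are hypothetical stand-ins for the GCN and MLP heads, and the toy logits are made up for illustration.

```python
import math
import random
from typing import Callable, List, Tuple


def softmax(logits: List[float]) -> List[float]:
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(logits: List[float], rng: random.Random) -> Tuple[int, float]:
    """Sample an index from softmax(logits) and return it with its log-probability."""
    probs = softmax(logits)
    idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return idx, math.log(probs[idx])


def select_action(
    score_state_points: Callable[[], List[float]],
    score_fragments: Callable[[int], List[float]],
    score_fragment_points: Callable[[int, int], List[float]],
    rng: random.Random,
) -> Tuple[Tuple[int, int, int], float]:
    # Step 1: pick an attachment point a1 on the current state.
    a1, lp1 = sample(score_state_points(), rng)
    # Step 2: pick a fragment a2, conditioned on a1.
    a2, lp2 = sample(score_fragments(a1), rng)
    # Step 3: pick an attachment point a3 on the chosen fragment.
    a3, lp3 = sample(score_fragment_points(a1, a2), rng)
    # The joint log-probability is the sum of the three conditional terms.
    return (a1, a2, a3), lp1 + lp2 + lp3


# Toy setup (assumed): 2 attachment points on the state, 3 fragments,
# and fragment j has points_per_fragment[j] attachment points.
n_state_points, n_fragments = 2, 3
points_per_fragment = [1, 2, 3]

action, log_prob = select_action(
    lambda: [0.0] * n_state_points,
    lambda a1: [float(a1 + j) for j in range(n_fragments)],
    lambda a1, a2: [0.0] * points_per_fragment[a2],
    random.Random(0),
)
```

Factoring the action this way keeps each step a choice over a small set (attachment points or fragments) instead of one choice over every possible triple.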
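
The "maximum Tanimoto similarity to a set of known inhibitors" reported in Figure 4 can be illustrated as follows. This is a minimal sketch that assumes fingerprints are stored as sets of on-bit indices. The caption does not say which fingerprint was used, and the bit sets below are invented toy data.

```python
from typing import Iterable, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity |a & b| / |a | b| of two on-bit index sets."""
    union = len(a | b)
    # Assumed convention: two empty fingerprints have similarity 0.0.
    return len(a & b) / union if union else 0.0


def max_tanimoto(query: Set[int], references: Iterable[Set[int]]) -> float:
    """Similarity of the query to its nearest neighbor in the reference set."""
    return max((tanimoto(query, ref) for ref in references), default=0.0)


generated = {1, 4, 7, 9}                                # toy generated molecule
known_inhibitors = [{1, 4, 8}, {1, 4, 7, 9, 12}, {2, 3}]  # toy reference set
score = max_tanimoto(generated, known_inhibitors)
```

A low maximum similarity suggests that a generated molecule is structurally distinct from every known inhibitor in the reference set, while a value near 1 indicates a close analogue of at least one of them.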