Generative Model for Small Molecules with Latent Space RL Fine-Tuning to Protein Targets

Ulrich A. Mbou Sob; Qiulin Li; Miguel Arbesú; Oliver Bent; Andries P. Smit; Arnu Pretorius

TL;DR

De novo drug design navigates an enormous, sparse chemical space, estimated at up to $10^{60}$ molecules, in which generating syntactically valid and chemically plausible molecules is challenging. The authors propose a generative model that combines a variational auto-encoder latent space with an encoder–decoder transformer trained on the SAFER representation, achieving high validity and low fragmentation in generated molecules. They then fine-tune the model with reinforcement learning, using docking scores as rewards, to improve binding to five protein targets. The fine-tuned model obtains mean top-5% docking scores competitive with state-of-the-art methods and substantially increases hit rates. The approach enables flexible latent-space exploration and could be conditioned on protein structures to generalize to unseen targets. Future work focuses on scaling, latent-space analyses, and target-conditioned generation.

Abstract

A specific challenge with deep learning approaches for molecule generation is generating both syntactically valid and chemically plausible molecular string representations. To address this, we propose a novel generative latent-variable transformer model for small molecules that leverages a recently proposed molecular string representation called SAFE. We introduce a modification to SAFE to reduce the number of invalid fragmented molecules generated during training and use this to train our model. Our experiments show that our model can generate novel molecules with a validity rate > 90% and a fragmentation rate < 1% by sampling from a latent space. By fine-tuning the model using reinforcement learning to improve molecular docking, we significantly increase the number of hit candidates for five specific protein targets compared to the pre-trained model, nearly doubling this number for certain targets. Additionally, our top 5% mean docking scores are comparable to the current state-of-the-art (SOTA), and we marginally outperform SOTA on three of the five targets.

Paper Structure (17 sections, 4 equations, 6 figures, 5 tables, 1 algorithm)

Introduction
Related Work
Methods
Tokenization
Model Architecture
RL Fine-Tuning to Protein Targets
Experiments
Model Scaling
Model Fine-tuning
Discussion
Appendix
Conversion Algorithm
Datasets
Model's implementation and pre-training details
Evaluation metrics
...and 2 more sections

Figures (6)

Figure 1: Schematic representation of our model's architecture. A sequence of $N$ tokens is passed as input to our encoder which is a transformer model. The output encoded embeddings of shape $N\times E$ are either passed directly to the mean and logvar layers (path 1) or they are first passed to the perceiver resampler layer which maps the encoded embeddings to a reduced dimension of shape $L_S\times L_E$ (path 2). The mean and logvar layers are linear layers that are applied independently to each sequence dimension. The final reparametrised embeddings are then passed to the decoder transformer model to be used as encoder embeddings in the decoder's cross-attention layers.
Figure 2: Visualisation of generated molecules. Left: Original molecules from the dataset. Middle: Molecules generated with the pre-trained model. Right: Molecules generated after RL fine-tuning. These results were obtained after fine-tuning MoGeL-64 for molecular docking to the target JAK2. The labels indicate each molecule's measured QED, SA and docking score.
Figure 3: Comparing SAFE vs SAFER. We compute the average scores of 10k molecules sampled using two models with embedding dimension 128 trained using the SAFE and SAFER representations. Left: Greedy decoding (temperature = 0). Right: Stochastic decoding (temperature = 1). Both representations are capable of generating molecules with high QED ($> 0.5$), but the SAFER representation significantly outperforms the SAFE representation on our combined validation metric (see Eq. \ref{sl_metric}) due to the lower fraction of fragmented molecules that are generated using the SAFER representation.
Figure 4: Schematic representation of the RL fine-tuning pipeline. The original molecule is passed to the pre-trained model, which maps it to a region in our latent distribution using the mean and logvar layers. A latent vector is sampled from this region and passed to the decoder to generate a new molecule. The new molecule and the protein target are passed to the docking tool, which performs molecular docking and produces a docking score for the new molecule. A reward is then assigned to the molecule based on the comparison between the new docking score and the original docking score. We use this reward to compute the loss and update the model's parameters.
Figure 5: (a) Mean docking scores on the validation set during the fine-tuning of the model MoGeL-128 to the protein target JAK2. (b) Percentage of molecules whose docking scores increase (blue curve) or decrease (orange curve) by more than 2.
...and 1 more figures
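
The mean/logvar step described in the Figure 1 caption can be illustrated with a minimal sketch of the standard VAE reparametrisation, $z = \mu + \exp(0.5 \cdot \text{logvar}) \cdot \epsilon$, applied independently to each sequence position. This is not the authors' implementation: the function names, the toy dimensions, the random weights, and the use of plain Python lists (chosen only to keep the example dependency-free) are all assumptions, and the perceiver resampler of path 2 is not modelled.

```python
import math
import random


def linear(x, weight, bias):
    """Apply y = x @ weight + bias to one embedding vector x of length E.

    weight is a list of E rows, each of length len(bias).
    """
    return [
        sum(xi * w_row[j] for xi, w_row in zip(x, weight)) + bias[j]
        for j in range(len(bias))
    ]


def reparametrise(encoded, w_mean, b_mean, w_logvar, b_logvar, rng):
    """Map encoder outputs (N x E) to sampled latents (N x latent width).

    The same mean and logvar layers are applied to every sequence
    position independently, then z = mean + exp(0.5 * logvar) * eps
    with eps drawn from a standard normal distribution.
    """
    latents = []
    for token_embedding in encoded:
        mean = linear(token_embedding, w_mean, b_mean)
        logvar = linear(token_embedding, w_logvar, b_logvar)
        z = [
            m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, logvar)
        ]
        latents.append(z)
    return latents


# Toy sizes (assumptions, not from the paper): N = 3 tokens, E = 4, latent width = 2.
rng = random.Random(0)
N, E, LATENT = 3, 4, 2
encoded = [[rng.uniform(-1, 1) for _ in range(E)] for _ in range(N)]
w_mean = [[rng.uniform(-1, 1) for _ in range(LATENT)] for _ in range(E)]
w_logvar = [[rng.uniform(-1, 1) for _ in range(LATENT)] for _ in range(E)]
b_mean = [0.0] * LATENT
b_logvar = [0.0] * LATENT

latents = reparametrise(encoded, w_mean, b_mean, w_logvar, b_logvar, rng)
print(len(latents), len(latents[0]))
```

In the pipeline the caption describes, the resulting per-position latents would be handed to the decoder's cross-attention layers in place of ordinary encoder embeddings. Sampling through `mean` and `logvar` rather than drawing `z` directly is what keeps the sampling step differentiable with respect to the encoder parameters.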
