Generative Model for Small Molecules with Latent Space RL Fine-Tuning to Protein Targets
Ulrich A. Mbou Sob, Qiulin Li, Miguel Arbesú, Oliver Bent, Andries P. Smit, Arnu Pretorius
TL;DR
De novo drug design must navigate an enormous, sparse chemical space, estimated to contain up to $10^{60}$ molecules, in which generating syntactically valid and chemically plausible molecules is challenging. The authors propose a generative model that combines a variational auto-encoder latent space with an encoder–decoder transformer trained on SAFER, their modification of the SAFE molecular string representation, achieving high validity (> 90%) and low fragmentation (< 1%) in generated molecules. They further fine-tune the model with reinforcement learning, using docking scores as rewards, to improve binding to five protein targets. The resulting mean top-5% docking scores are comparable to state-of-the-art methods, and the number of hit candidates rises substantially relative to the pre-trained model, nearly doubling for some targets. The approach enables flexible latent-space exploration and could be conditioned on protein structures to generalize to unseen targets; future work focuses on scaling, latent-space analyses, and target-conditioned generation.
Abstract
A specific challenge with deep learning approaches for molecule generation is generating both syntactically valid and chemically plausible molecular string representations. To address this, we propose a novel generative latent-variable transformer model for small molecules that leverages a recently proposed molecular string representation called SAFE. We introduce a modification to SAFE to reduce the number of invalid fragmented molecules generated during training and use this to train our model. Our experiments show that our model can generate novel molecules with a validity rate > 90% and a fragmentation rate < 1% by sampling from a latent space. By fine-tuning the model using reinforcement learning to improve molecular docking, we significantly increase the number of hit candidates for five specific protein targets compared to the pre-trained model, nearly doubling this number for certain targets. Additionally, our top 5% mean docking scores are comparable to the current state-of-the-art (SOTA), and we marginally outperform SOTA on three of the five targets.
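As an illustration of two of the metrics quoted above, the following minimal sketch computes a fragmentation rate and a mean top-5% docking score. It is not the authors' evaluation code. It assumes that generated molecules are available as SMILES strings, that a molecule counts as fragmented when its string contains the `.` separator for disconnected components, and that lower (more negative) docking scores indicate better predicted binding. The function names and sample values are hypothetical.

```python
import math


def fragmentation_rate(smiles_list):
    """Fraction of strings containing '.', the SMILES separator
    for disconnected components (assumed definition of 'fragmented')."""
    if not smiles_list:
        return 0.0
    fragmented = sum(1 for s in smiles_list if "." in s)
    return fragmented / len(smiles_list)


def top_fraction_mean(scores, fraction=0.05):
    """Mean of the best `fraction` of docking scores.

    Assumes lower scores are better, so the 'top' scores are the
    smallest values. At least one score is always included.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    k = max(1, math.ceil(fraction * len(scores)))
    best = sorted(scores)[:k]
    return sum(best) / k


# Hypothetical generated molecules and docking scores, for illustration only.
samples = ["CCO", "c1ccccc1", "CC.O", "CC(=O)O"]
scores = [-7.2, -9.8, -5.1, -8.4, -6.0, -10.3, -4.9, -7.7]

print("fragmentation rate:", fragmentation_rate(samples))
print("mean top-5% score:", top_fraction_mean(scores))
```

With only a handful of scores, the top 5% reduces to the single best value; the metric becomes informative when computed over the thousands of molecules typically sampled per target.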
