Improving Targeted Molecule Generation through Language Model Fine-Tuning Via Reinforcement Learning
Salma J. Ahmed, Emad A. Mohammed
TL;DR
The paper addresses the high cost and long timelines of drug development by proposing a de novo design framework in which a Transformer-based MolT5 model is fine-tuned with Proximal Policy Optimization (PPO) reinforcement learning to generate protein-targeted molecules. A Drug-Target Interaction (DTI) reward and a chemical validity reward together steer generation toward target specificity and syntactically valid molecules. Training proceeds in two stages: Stage I pretraining on BindingDB, followed by Stage II RL fine-tuning. Quantitative results show improved drug-likeness (QED), reduced molecular weight (MW), and improved logP values, along with high novelty (only 0.041% of generated molecules are not novel), validated through MOSES benchmarking and similarity analyses against ChEMBL inhibitors. The approach offers a scalable, Transformer-based route to targeted drug discovery, with implications for accelerating early-stage lead generation and filtering candidates via learned policy optimization.
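To make the two-part reward concrete, the following minimal Python sketch combines a DTI score and a validity signal into one scalar. It is only an illustration: the weighted-sum form, the weights `w_dti` and `w_valid`, the assumption that the DTI score is normalized to [0, 1], and the binary validity reward are all hypothetical choices, not the formula used in the paper.

```python
def composite_reward(dti_score: float, is_valid: bool,
                     w_dti: float = 0.5, w_valid: float = 0.5) -> float:
    """Combine a DTI score and a validity flag into one scalar reward.

    Assumptions (not taken from the paper): dti_score is normalized to
    [0, 1], validity is rewarded as 1.0 or 0.0, and the two terms are
    mixed with a weighted sum.
    """
    if not 0.0 <= dti_score <= 1.0:
        raise ValueError("dti_score is assumed to lie in [0, 1]")
    validity_reward = 1.0 if is_valid else 0.0
    return w_dti * dti_score + w_valid * validity_reward


if __name__ == "__main__":
    # A valid molecule with a strong predicted interaction scores higher
    # than an invalid string with the same predicted interaction.
    print(composite_reward(0.8, True))
    print(composite_reward(0.8, False))
```

In a PPO loop, a scalar of this kind would be computed per generated sequence and passed to the policy update; how the two terms are actually weighted is a design choice the summary above does not specify.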
Abstract
Developing new drugs is laborious, costly, and time-consuming. In this paper, we introduce a de novo drug design strategy that harnesses language models to design drugs targeted at specific proteins. Using a Reinforcement Learning (RL) framework based on Proximal Policy Optimization (PPO), we fine-tune the model to learn a policy for generating drugs tailored to protein targets. The proposed method uses a composite reward function that combines drug-target interaction and molecular validity. After RL fine-tuning, the method yields notable improvements in molecular validity, interaction efficacy, and key chemical properties, achieving 65.37 for Quantitative Estimation of Drug-likeness (QED), 321.55 for Molecular Weight (MW), and 4.47 for the Octanol-Water Partition Coefficient (logP). Furthermore, only 0.041% of the generated drugs are not novel.
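As a rough illustration of how a non-novelty figure such as 0.041% can be computed, the sketch below measures the share of unique generated molecules that also appear in the training set. It assumes that both lists already hold canonical SMILES strings, so plain string comparison suffices; the function name and the example molecules are hypothetical, and the paper's exact protocol may differ.

```python
from typing import Iterable


def non_novel_fraction(generated: Iterable[str], training: Iterable[str]) -> float:
    """Fraction of unique generated molecules also found in the training set.

    Assumes every string is already a canonical SMILES, so two strings are
    equal exactly when they denote the same molecule.
    """
    training_set = set(training)
    unique_generated = set(generated)
    if not unique_generated:
        raise ValueError("no generated molecules to evaluate")
    memorized = unique_generated & training_set
    return len(memorized) / len(unique_generated)


if __name__ == "__main__":
    generated = ["CCO", "CCN", "c1ccccc1", "CCO"]  # toy examples
    training = ["CCO", "CCC"]
    fraction = non_novel_fraction(generated, training)
    print(f"non-novel: {fraction:.1%}, novel: {1 - fraction:.1%}")
```

Duplicates in the generated list are collapsed before counting, so the result reflects distinct molecules rather than raw samples.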
