Improving Targeted Molecule Generation through Language Model Fine-Tuning Via Reinforcement Learning

Salma J. Ahmed, Emad A. Mohammed

TL;DR

The paper addresses the high cost and long timelines of drug development by proposing a de-novo design framework in which MolT5, a Transformer-based model, is fine-tuned with Proximal Policy Optimization (PPO) reinforcement learning to generate protein-targeted molecules. A composite reward combining Drug-Target Interaction (DTI) and chemical validity steers generation toward target specificity and syntactically valid molecules. Training has two stages: Stage I fine-tuning on BindingDB for protein-conditioned molecule generation, followed by Stage II RL fine-tuning. After RL fine-tuning, the authors report improved drug-likeness (QED), reduced molecular weight (MW), and optimized logP values, with only 0.041% of generated molecules lacking novelty. The results are assessed with MOSES benchmarking and similarity analyses against ChEMBL inhibitors. The approach suggests a scalable, Transformer-based route to targeted drug discovery that could accelerate early-stage lead generation.

Abstract

Developing new drugs is laborious and costly, demanding extensive time investment. In this paper, we introduce a de-novo drug design strategy that harnesses the capabilities of language models to devise targeted drugs for specific proteins. Employing a Reinforcement Learning (RL) framework based on Proximal Policy Optimization (PPO), we refine the model to acquire a policy for generating drugs tailored to protein targets. The proposed method integrates a composite reward function that combines drug-target interaction and molecular validity. Following RL fine-tuning, the proposed method demonstrates promising outcomes, yielding notable improvements in molecular validity, interaction efficacy, and critical chemical properties, achieving 65.37 for Quantitative Estimation of Drug-likeness (QED), 321.55 for Molecular Weight (MW), and 4.47 for Octanol-Water Partition Coefficient (logP). Furthermore, only 0.041% of the generated drugs do not exhibit novelty.
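
The composite reward described in the abstract can be illustrated with a minimal sketch. The weighted-sum form, the weights `w_dti` and `w_valid`, the zero reward for invalid molecules, and the helper functions `toy_dti` and `toy_is_valid` are all assumptions made for illustration; the abstract does not specify how the two terms are combined, and a real pipeline would use a trained DTI predictor and a chemistry toolkit for the validity check.

```python
from typing import Callable


def composite_reward(
    smiles: str,
    protein: str,
    dti_score: Callable[[str, str], float],
    is_valid: Callable[[str], bool],
    w_dti: float = 0.5,           # hypothetical weight, not from the paper
    w_valid: float = 0.5,         # hypothetical weight, not from the paper
    invalid_reward: float = 0.0,  # hypothetical reward for invalid molecules
) -> float:
    """Combine a DTI score and a validity term into one scalar reward."""
    # An unparsable molecule cannot be scored for interaction,
    # so it receives the fixed invalid reward.
    if not is_valid(smiles):
        return invalid_reward
    # Valid molecule: the validity term contributes 1.0.
    return w_dti * dti_score(smiles, protein) + w_valid * 1.0


def toy_is_valid(smiles: str) -> bool:
    """Toy stand-in for a validity check: non-empty, balanced parentheses."""
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return bool(smiles) and depth == 0


def toy_dti(smiles: str, protein: str) -> float:
    """Toy stand-in for a DTI predictor: always returns a fixed score."""
    return 0.8


reward_ok = composite_reward("CC(=O)O", "MKTAYIAK", toy_dti, toy_is_valid)
reward_bad = composite_reward("CC(=O", "MKTAYIAK", toy_dti, toy_is_valid)
print(reward_ok, reward_bad)
```

In a PPO loop, a scalar of this kind would be assigned to each generated sequence and used to update the policy; dropping the DTI term (`w_dti = 0`) gives a validity-only reward.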

Paper Structure (20 sections, 2 equations, 14 figures, 2 tables)

Figures (14)

  • Figure 1: MolT5, a Transformer-based model designed for molecular data, is initially fine-tuned (Stage I) for molecule generation given an input protein. Subsequently, the fine-tuned model undergoes reinforcement learning fine-tuning (Stage II) to enable targeted compound generation, using drug-target interaction and validity as rewards.
  • Figure 2: The left-hand side depicts the methods used for reward calculation, focusing on drug-target interaction, while the right-hand side illustrates the evaluation of molecule validity.
  • Figure 3: The percentage of valid molecules before and after fine-tuning the model with reinforcement learning under two policies. Policy 1 combines Drug-Target Interaction (DTI) and molecule validity for rewards, while Policy 2 considers only molecule validity. This evaluation assesses the method's effectiveness and adaptability to different objectives and policies.
  • Figure 4: Distributions of the chemical properties of the generated molecules, including Octanol-Water Partition Coefficient (logP), Molecular Weight (MW), and Quantitative Estimation of Drug-likeness (QED), both before and after fine-tuning the model with Reinforcement Learning.
  • Figure 5: Illustrative instances of the molecules generated through inputting different proteins into the model, juxtaposed with samples of protein inhibitors and the Tanimoto similarity ($TS$) between the generated and inhibitor compounds.
  • ...and 9 more figures
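
The Tanimoto similarity ($TS$) mentioned in the Figure 5 caption is the ratio of shared fingerprint features to total distinct features, $TS(A, B) = |A \cap B| / (|A| + |B| - |A \cap B|)$. The sketch below computes it over sets of features. The `toy_fingerprint` helper, which uses character n-grams of a SMILES string, is a hypothetical stand-in and not the paper's method; a real analysis would typically use chemical fingerprints from a cheminformatics toolkit.

```python
from typing import AbstractSet, Set


def tanimoto(fp_a: AbstractSet[str], fp_b: AbstractSet[str]) -> float:
    """Tanimoto similarity between two feature sets, in [0, 1]."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def toy_fingerprint(smiles: str, n: int = 2) -> Set[str]:
    """Toy stand-in for a fingerprint: the set of character n-grams."""
    return {smiles[i:i + n] for i in range(len(smiles) - n + 1)}


generated = toy_fingerprint("CCO")  # {"CC", "CO"}
inhibitor = toy_fingerprint("CCN")  # {"CC", "CN"}
ts = tanimoto(generated, inhibitor)  # one shared feature out of three distinct
print(round(ts, 3))
```

A value of 1.0 means the two feature sets are identical, and 0.0 means they share nothing.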