DrugImproverGPT: A Large Language Model for Drug Optimization with Fine-Tuning via Structured Policy Optimization

Xuefeng Liu, Songhao Jiang, Siyu Chen, Zhuoran Yang, Yuxin Chen, Ian Foster, Rick Stevens

TL;DR

DrugImprover addresses the high cost and slow progress of drug discovery with an LLM-based drug optimization framework fine-tuned via Structured Policy Optimization (SPO). SPO uses multiple critics and a partial-molecule improvement mechanism to provide dense, objective-aligned rewards, enabling direct policy improvement on the generative model. The authors prove that SPO can find the optimal policy and that it densifies the reward signal, and they report better empirical performance than strong baselines on SARS-CoV-2 and cancer targets, using a 1-million-compound dataset and a docking surrogate. The approach preserves key properties of the input drug while improving multiple objectives, and the code and dataset are to be released publicly.

Abstract

Fine-tuning a Large Language Model (LLM) is crucial for generating results towards specific objectives. This research delves into the realm of drug optimization and introduces a novel reinforcement learning algorithm to fine-tune an LLM-based generative model for drug optimization, enhancing the original drug across target objectives while retaining its beneficial chemical properties. This work comprises two primary components: (1) DrugImprover: a framework tailored for improving robustness and efficiency in drug optimization. It includes an LLM designed for drug optimization and a novel Structured Policy Optimization (SPO) algorithm, which is theoretically grounded. This algorithm offers a unique perspective for fine-tuning the LLM-based generative model by aligning the improvement of the generated molecule with the input molecule under desired objectives. (2) A dataset of 1 million compounds, each with OEDOCK docking scores on 5 human proteins associated with cancer cells and 24 binding sites from the SARS-CoV-2 virus. We conduct a comprehensive evaluation of SPO and demonstrate its effectiveness in improving the original drug across target properties. Our code and dataset will be publicly available at: https://github.com/xuefeng-cs/DrugImproverGPT.

Paper Structure

This paper contains 40 sections, 2 theorems, 25 equations, 4 figures, 6 tables, 1 algorithm.

Key Result

Lemma 5.1

If BON finds a sequence that strictly improves over the current molecule in the sense of the paper's strict-improvement condition, then a policy $\pi^*$ maximizes $J(\pi)$ if and only if it maximizes the original reward $J_0(\pi)$.

Figures (4)

  • Figure 1: DrugImprover framework. It comprises two major components: (1) a large language model designed for drug optimization; (2) a Structured Policy Optimization (SPO) algorithm that fine-tunes the LLM-based generator for drug improvement across desired properties.
  • Figure 2: Tanimoto Similarity over five experimental runs.
  • Figure 3: The binding sites of proteins 3CLPro (PDB ID: 7BQY) (left) and RTCB (PDB ID: 4DWQ) (right). OpenEye software is used to identify atoms around the crystallized compound as binding sites.
  • Figure 4: Training corpus example and visualization
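
Figure 2 reports Tanimoto similarity, a standard measure of how much of the input molecule's structure a generated molecule retains. The summary here does not say which fingerprint the authors use, so the following is only a minimal sketch of the formula itself, assuming each molecule has already been converted to a set of "on" fingerprint bit indices. The function name and the example bit sets are hypothetical, not taken from the paper.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices.

    Computes |A intersect B| / |A union B|. The fingerprint that produces
    the bit sets is an assumption left to the caller.
    """
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        # Both fingerprints are empty; returning 0.0 is a convention chosen here.
        return 0.0
    return len(a & b) / union


# Hypothetical fingerprints for an original and a generated molecule.
original = {1, 4, 7, 9}
generated = {1, 4, 8, 9, 12}
print(tanimoto(original, generated))  # 3 shared bits / 6 distinct bits = 0.5
```

A value near 1 means the generated molecule shares most fingerprint features with the input, which is the sense in which an optimized drug can be said to preserve the original's properties.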

Theorems & Definitions (5)

  • Definition 4.1: Advantage preference
  • Lemma 5.1
  • Lemma 5.2
  • Proof of Lemma 5.1
  • Proof of Lemma 5.2