DrugImproverGPT: A Large Language Model for Drug Optimization with Fine-Tuning via Structured Policy Optimization
Xuefeng Liu, Songhao Jiang, Siyu Chen, Zhuoran Yang, Yuxin Chen, Ian Foster, Rick Stevens
TL;DR
DrugImprover addresses the high cost and slow pace of drug discovery with an LLM-based drug optimization framework fine-tuned via Structured Policy Optimization (SPO). SPO uses multiple critics and a partial-molecule improvement mechanism to provide dense, objective-aligned rewards, enabling direct policy improvement on the generative model. The authors prove theoretically that SPO can find the optimizer and that it densifies the reward signal, and they report empirically superior performance over strong baselines on SARS-CoV-2 and cancer targets, using a 1-million-compound dataset and a docking surrogate. The approach preserves key properties of the input drug while improving multiple objectives, offering a scalable and explainable path toward faster, more reliable drug optimization with real-world applicability and open data.
Abstract
Fine-tuning a Large Language Model (LLM) is crucial for generating results towards specific objectives. This research addresses drug optimization and introduces a novel reinforcement learning algorithm to fine-tune an LLM-based generative model for drug optimization, improving the original drug across target objectives while retaining its beneficial chemical properties. This work comprises two primary components: (1) DrugImprover: A framework tailored for improving robustness and efficiency in drug optimization. It includes an LLM designed for drug optimization and a novel, theoretically grounded Structured Policy Optimization (SPO) algorithm. This algorithm offers a unique perspective for fine-tuning the LLM-based generative model by aligning the improvement of the generated molecule over the input molecule with the desired objectives. (2) A dataset of 1 million compounds, each with OEDOCK docking scores on 5 human proteins associated with cancer cells and 24 binding sites from the SARS-CoV-2 virus. We conduct a comprehensive evaluation of SPO and demonstrate its effectiveness in improving the original drug across target properties. Our code and dataset will be publicly available at: https://github.com/xuefeng-cs/DrugImproverGPT.
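
To make the idea of rewarding improvement over the input molecule concrete, the following is a minimal Python sketch, not the authors' SPO algorithm. It assumes a reward defined as a weighted sum of per-critic score differences between a generated molecule and the original one. The function name `improvement_reward`, the toy critics, and the weights are all hypothetical placeholders; in practice the critics would be objectives such as a docking surrogate or a similarity measure.

```python
from typing import Callable, Dict

# A critic maps a molecule string (e.g. SMILES) to a score; higher is better.
Critic = Callable[[str], float]


def improvement_reward(
    original: str,
    generated: str,
    critics: Dict[str, Critic],
    weights: Dict[str, float],
) -> float:
    """Weighted sum of per-critic score differences (generated minus original).

    Illustrative assumption only: the paper's actual reward may differ.
    """
    return sum(
        weights[name] * (critic(generated) - critic(original))
        for name, critic in critics.items()
    )


# Hypothetical toy critics operating on raw strings. They stand in for real
# objectives and carry no chemical meaning.
def toy_length_score(smiles: str) -> float:
    # Prefers strings whose length is close to 10 characters.
    return -abs(len(smiles) - 10) / 10.0


def toy_nitrogen_score(smiles: str) -> float:
    # Counts occurrences of the character "N".
    return float(smiles.count("N"))


critics = {"length": toy_length_score, "nitrogen": toy_nitrogen_score}
weights = {"length": 0.5, "nitrogen": 0.5}

# Positive reward: the generated string scores higher than the original
# under both toy critics.
reward = improvement_reward("CCO", "CCON", critics, weights)
print(f"improvement reward: {reward:.2f}")
```

Under this formulation a generated molecule identical to the input receives zero reward, so the policy is credited only for measurable gains over the original drug.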
