DrugR: Optimizing Molecular Drugs through LLM-based Explicit Reasoning
Haoran Liu, Zheni Zeng, Yukun Yan, Yuxuan Chen, Yunduo Xiao
TL;DR
DrugR tackles the challenge of multi-objective molecular optimization by embedding explicit pharmacological reasoning within an LLM framework. It combines domain-focused continual pretraining, reverse data engineering for supervised fine-tuning, and a Pareto-aware self-balanced reinforcement learning scheme to optimize ADMET properties while preserving core molecular scaffolds, and it provides an interpretable rationale for each design step. Across extensive in silico evaluations, DrugR outperforms diverse baselines on overall optimization while maintaining binding affinity and structural similarity, and it shows promising, if still tentative, generalization to new drug classes. The work advances knowledge-driven drug discovery by offering interpretable design rationales and releasing code, data, and models to enable further research and pipeline integration.
Abstract
Molecule generation and optimization is a fundamental task in the chemical domain. The rapid development of intelligent tools, especially large language models (LLMs) with powerful knowledge reserves and interactive capabilities, has provided new paradigms for this task. Nevertheless, LLMs face two intrinsic challenges: the complex, implicit relationship between molecular structure and pharmacological properties, and the lack of corresponding labeled data. To bridge this gap, we propose DrugR, an LLM-based method that introduces explicit, step-by-step pharmacological reasoning into the optimization process. Our approach integrates domain-specific continual pretraining, supervised fine-tuning via reverse data engineering, and self-balanced multi-granular reinforcement learning. This framework enables DrugR to effectively improve key ADMET properties while preserving the original molecule's core efficacy. Experimental results demonstrate that DrugR achieves comprehensive enhancement across multiple properties without compromising structural similarity or target binding affinity. Importantly, its explicit reasoning process provides clear, interpretable rationales for each optimization step, yielding actionable design insights and advancing toward automated, knowledge-driven scientific discovery. Our code and model checkpoints are open-sourced to foster future research.
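To make the multi-objective framing concrete, the following is a minimal sketch of a Pareto-dominance check over per-molecule score vectors. It is an illustration only, not DrugR's reward function: the function names, the three score dimensions, and all numeric values are hypothetical, and every dimension is assumed to be scaled so that higher is better.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if score vector `a` Pareto-dominates `b`.

    Assumes higher is better on every dimension: `a` must be at least
    as good as `b` everywhere and strictly better somewhere.
    """
    if len(a) != len(b):
        raise ValueError("score vectors must have the same length")
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(candidates: Dict[str, Sequence[float]]) -> List[str]:
    """Return the names of candidates not dominated by any other candidate."""
    front = []
    for name, scores in candidates.items():
        dominated = any(
            dominates(other_scores, scores)
            for other_name, other_scores in candidates.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical scores: (solubility, safety, similarity to the original).
# These numbers are invented for illustration, not taken from the paper.
scores = {
    "original": (0.40, 0.50, 1.00),
    "cand_a": (0.60, 0.55, 0.85),
    "cand_b": (0.35, 0.45, 0.80),
    "cand_c": (0.70, 0.40, 0.75),
}

front = pareto_front(scores)
print(front)
```

In this toy example, `cand_b` is dropped because the original molecule is at least as good on every dimension, while `cand_a` and `cand_c` remain on the front because each trades some similarity for gains elsewhere. A Pareto-aware training signal would need to go beyond this binary check, for instance by weighting how far each property moves.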
