MolEditRL: Structure-Preserving Molecular Editing via Discrete Diffusion and Reinforcement Learning

Yuanxin Zhuang, Dazhong Shen, Ying Sun

TL;DR

MolEditRL tackles the problem of editing molecular structures to optimize properties while preserving scaffold integrity. It combines a structure-aware, discrete graph diffusion model with an editing-focused reinforcement learning fine-tuning stage, guided by property rewards and KL regularization. The framework is trained on MolEdit-Instruct, a large 3-million-example dataset spanning 10 properties, and achieves state-of-the-art editing accuracy and distributional fidelity with substantially fewer parameters. These advances offer a practical, scalable approach for precise, instruction-driven molecular edits that maintain structural realism and could accelerate lead optimization in drug discovery.

Abstract

Molecular editing aims to modify a given molecule to optimize desired chemical properties while preserving structural similarity. However, current approaches typically rely on string-based or continuous representations, which fail to adequately capture the discrete, graph-structured nature of molecules, resulting in limited structural fidelity and poor controllability. In this paper, we propose MolEditRL, a molecular editing framework that explicitly integrates structural constraints with precise property optimization. Specifically, MolEditRL consists of two stages: (1) a discrete graph diffusion model pretrained to reconstruct target molecules conditioned on source structures and natural language instructions; (2) an editing-aware reinforcement learning fine-tuning stage that further enhances property alignment and structural preservation by explicitly optimizing editing decisions under graph constraints. For comprehensive evaluation, we construct MolEdit-Instruct, the largest and most property-rich molecular editing dataset, comprising 3 million diverse examples spanning single- and multi-property tasks across 10 chemical attributes. Experimental results demonstrate that MolEditRL significantly outperforms state-of-the-art methods in both property optimization accuracy and structural fidelity, achieving a 74% improvement in editing success rate while using 98% fewer parameters.
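
The TL;DR describes the fine-tuning stage as guided by property rewards and KL regularization. The general shape of such an objective can be illustrated with a minimal sketch, assuming a discrete distribution over candidate edit actions. The function names, the weight `beta`, and the toy distributions are hypothetical and are not taken from the paper, whose exact objective is not given in this summary.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as equal-length lists.

    Terms with p_i == 0 contribute nothing, by the usual convention.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def regularized_objective(reward, policy_probs, reference_probs, beta=0.1):
    """Property reward minus a KL penalty (hypothetical form, for illustration).

    policy_probs:    fine-tuned model's distribution over edit actions
    reference_probs: pretrained model's distribution over the same actions
    beta:            assumed penalty weight; larger values keep the policy
                     closer to the pretrained model
    """
    return reward - beta * kl_divergence(policy_probs, reference_probs)


# Toy example: the same reward, with and without drift from the reference.
unchanged = regularized_objective(1.0, [0.5, 0.5], [0.5, 0.5])
drifted = regularized_objective(1.0, [0.9, 0.1], [0.5, 0.5])
```

In this sketch a policy identical to the reference keeps the full reward, while one that drifts from it is penalized in proportion to `beta`. That is the usual motivation for a KL term: the fine-tuned model can improve the property reward without abandoning what it learned in pretraining.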

Paper Structure

This paper contains 28 sections, 14 equations, 13 figures, 7 tables.

Figures (13)

  • Figure 1: Performance, FCD, and parameter size comparison.
  • Figure 2: Overview of MolEditRL.
  • Figure 3: Performance by number of edited properties.
  • Figure 4: Impact of step size, fine-tuning strategy, and KL regularization.
  • Figure 5: Performance comparison under structural constraints.
  • ...and 8 more figures