Fragment-Masked Diffusion for Molecular Optimization
Kun Li, Xiantao Cai, Jia Wu, Shirui Pan, Huiting Xu, Bo Du, Wenbin Hu
TL;DR
This work introduces Fragment-Masked Diffusion for Molecular Optimization (FMOP), a method for phenotypic drug discovery (PDD) that optimizes molecules by conditioning a regression-free diffusion process on numeric cell-line responses, specifically IC$_{50}$ values. FMOP decomposes each molecule into a Murcko-based scaffold and optimizable fragments, then applies fragment masks and guidance signals to generate variants that preserve the scaffold while improving predicted efficacy. Across the 985 cell lines of the GDSCv2 dataset, it achieves a reported in-silico optimization success rate of $95.4\%$ and an average efficacy increase of $7.5\%$. The framework combines contrastive learning between drug and phenotypic-response encoders, a two-network score-guided diffusion model, and a rule-based post-processing step that enforces chemical validity; ablations indicate that the fragment masks, task guidance, and post-processing each contribute. With pretraining on QM9 and large-scale experiments on GDSCv2, FMOP yields scaffold-consistent molecular changes across the whole dataset and offers a scalable, target-agnostic route to improving phenotypic drug efficacy. A deployment on the Beishenglai platform and a case study on Z-LLNle-CHO illustrate its practical applicability.
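The fragment-masking idea can be illustrated with a small sketch. It assumes, purely for illustration, that the mask is a binary vector over atoms (1 for an optimizable fragment position, 0 for a scaffold position) and that the scaffold is re-imposed after every denoising step, in the manner of inpainting-style diffusion samplers. The functions `apply_fragment_mask`, `toy_denoise_step`, and `masked_sampling` are hypothetical names, and the denoiser is a random stand-in rather than FMOP's score networks.

```python
import random


def apply_fragment_mask(proposed, original, mask):
    """Take fragment entries (mask == 1) from `proposed` and
    scaffold entries (mask == 0) from `original`."""
    if not (len(proposed) == len(original) == len(mask)):
        raise ValueError("proposed, original, and mask must have equal length")
    return [p if m == 1 else o for p, o, m in zip(proposed, original, mask)]


def toy_denoise_step(x, rng):
    """Hypothetical stand-in for a learned denoiser: jitters every entry."""
    return [xi + rng.gauss(0.0, 0.1) for xi in x]


def masked_sampling(original, mask, steps=10, seed=0):
    """Run a toy sampling loop that only ever changes masked positions."""
    rng = random.Random(seed)
    # Initialise masked positions with noise; scaffold positions keep their values.
    noise = [rng.gauss(0.0, 1.0) for _ in original]
    x = apply_fragment_mask(noise, original, mask)
    for _ in range(steps):
        x = toy_denoise_step(x, rng)
        # Re-impose the scaffold so it survives the update unchanged.
        x = apply_fragment_mask(x, original, mask)
    return x


# Toy per-atom features (assumed values) and a mask marking the last two atoms.
atom_features = [6.0, 6.0, 7.0, 8.0, 6.0]
fragment_mask = [0, 0, 0, 1, 1]
result = masked_sampling(atom_features, fragment_mask)
```

Because the scaffold entries are overwritten with their original values after each step, only the masked fragment can drift away from the input molecule, which is the property the TL;DR describes as scaffold-preserving generation.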
Abstract
Molecular optimization is a crucial aspect of drug discovery, aimed at refining molecular structures to enhance drug efficacy and minimize side effects, ultimately accelerating the overall drug development process. Many molecular optimization methods have been proposed, significantly advancing drug discovery. These methods rely primarily on understanding specific drug target structures or their hypothesized roles in combating diseases. However, challenges such as the limited number of available targets and the difficulty of capturing clear structures hinder innovative drug development. In contrast, phenotypic drug discovery (PDD) does not depend on clear target structures and can identify hits with novel and unbiased polypharmacology signatures. As a result, PDD-based molecular optimization can reduce potential safety risks while optimizing phenotypic activity, thereby increasing the likelihood of clinical success. We therefore propose a fragment-masked molecular optimization method based on PDD (FMOP). FMOP employs a regression-free diffusion model to conditionally optimize the masked regions of a molecule, effectively generating new molecules with similar scaffolds. On the large-scale drug response dataset GDSCv2, we optimize the potential molecules across all 985 cell lines. Overall, the experiments demonstrate that the in-silico optimization success rate reaches 95.4\%, with an average efficacy increase of 7.5\%. Additionally, we conduct extensive ablation and visualization experiments, confirming that FMOP is an effective and robust molecular optimization method. The code is available at: https://anonymous.4open.science/r/FMOP-98C2.
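The two headline numbers are aggregate statistics over optimized molecules, and a short sketch shows how such figures could be computed. The abstract does not define them, so the definitions here are assumptions: an optimization counts as a success when the predicted IC$_{50}$ of the optimized molecule is lower than that of the original, and the efficacy increase is the relative reduction in IC$_{50}$, averaged over all pairs. The sketch also assumes positive IC$_{50}$ values on a linear scale; `optimization_metrics` is a hypothetical helper name, and the data are toy values.

```python
def optimization_metrics(ic50_before, ic50_after):
    """Return (success_rate, mean_relative_gain) for paired IC50 predictions.

    Assumed definitions (not taken from the paper):
      success       -> optimized IC50 strictly lower than the original
      relative gain -> (before - after) / before, averaged over all pairs
    """
    if len(ic50_before) != len(ic50_after) or not ic50_before:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(b <= 0 for b in ic50_before):
        raise ValueError("original IC50 values must be positive")

    pairs = list(zip(ic50_before, ic50_after))
    successes = sum(1 for before, after in pairs if after < before)
    success_rate = successes / len(pairs)
    mean_gain = sum((before - after) / before for before, after in pairs) / len(pairs)
    return success_rate, mean_gain


# Toy predictions for four molecules (illustrative values only).
before = [10.0, 4.0, 8.0, 2.0]
after = [9.0, 3.0, 8.8, 1.5]
rate, gain = optimization_metrics(before, after)
```

In this toy input three of the four molecules improve, so the success rate is 0.75, while the mean relative gain also accounts for the one molecule that got worse.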
