Fragment-Masked Diffusion for Molecular Optimization

Kun Li, Xiantao Cai, Jia Wu, Shirui Pan, Huiting Xu, Bo Du, Wenbin Hu

TL;DR

This work introduces Fragment-Masked Diffusion for Molecular Optimization (FMOP), a first-of-its-kind method tailored to phenotypic drug discovery (PDD) that optimizes molecules by conditioning a regression-free diffusion process on numeric cell-line responses, specifically IC$_{50}$. By decomposing molecules into Murcko-based scaffolds and optimizable fragments, FMOP applies fragment masks and guide signals to generate scaffold-preserving yet efficacy-enhancing variants across 985 cell lines in the GDSCv2 dataset, achieving a reported in-silico optimization success rate of $95.4\%$ and an average efficacy increase of $7.5\%$. The framework integrates contrastive learning between drug and phenotypic response encoders, a two-network score-guided diffusion model, and a rule-based post-processing step to ensure chemical validity, with extensive ablations confirming the importance of fragment masks, task guidance, and post-processing. Across QM9 pretraining and large-scale GDSCv2 experiments, FMOP demonstrates robust, dataset-wide optimization and scaffold-consistent molecular changes, offering a scalable, target-agnostic path toward improved phenotypic drug efficacy. Practical deployment on the Beishenglai platform and a case study on Z-LLNle-CHO illustrate real-world applicability and potential for accelerating phenotypic drug discovery workflows.

Abstract

Molecular optimization is a crucial aspect of drug discovery, aimed at refining molecular structures to enhance drug efficacy and minimize side effects, ultimately accelerating the overall drug development process. Many molecular optimization methods have been proposed, significantly advancing drug discovery. These methods primarily rely on understanding specific drug target structures or their hypothesized roles in combating diseases. However, challenges such as the limited number of available targets and the difficulty of capturing clear structures hinder innovative drug development. In contrast, phenotypic drug discovery (PDD) does not depend on clear target structures and can identify hits with novel and unbiased polypharmacology signatures. As a result, PDD-based molecular optimization can reduce potential safety risks while optimizing phenotypic activity, thereby increasing the likelihood of clinical success. Therefore, we propose a fragment-masked molecular optimization method based on PDD (FMOP). FMOP employs a regression-free diffusion model to conditionally optimize the masked regions of a molecule, effectively generating new molecules with similar scaffolds. On the large-scale drug response dataset GDSCv2, we optimize the potential molecules across all 985 cell lines. The overall experiments demonstrate that the in-silico optimization success rate reaches 95.4\%, with an average efficacy increase of 7.5\%. Additionally, we conduct extensive ablation and visualization experiments, confirming that FMOP is an effective and robust molecular optimization method. The code is available at: https://anonymous.4open.science/r/FMOP-98C2.
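
To make the idea of "masked regions" concrete, the following is a minimal sketch of how a node mask and an adjacency-matrix mask could be derived from a set of scaffold atom indices, and how they could be used to keep the scaffold fixed while the remaining entries are replaced. This is not the authors' implementation: the function names `fragment_masks` and `apply_masks`, the convention that 1 marks an editable entry, and the rule that an atom pair is editable when at least one endpoint lies outside the scaffold are all assumptions made for illustration. In practice the scaffold indices would come from a cheminformatics toolkit; here they are passed in directly.

```python
import numpy as np


def fragment_masks(num_atoms, scaffold_atoms):
    """Build illustrative node and adjacency masks (hypothetical helper).

    Assumed convention: 1 = entry may be regenerated, 0 = entry is kept.
    """
    node_mask = np.ones(num_atoms, dtype=np.int8)
    node_mask[list(scaffold_atoms)] = 0  # scaffold atoms stay fixed
    # Assumption: a pair (i, j) is editable if at least one endpoint is editable.
    adj_mask = np.maximum.outer(node_mask, node_mask)
    np.fill_diagonal(adj_mask, 0)  # no self-bonds
    return node_mask, adj_mask


def apply_masks(x_orig, a_orig, x_new, a_new, node_mask, adj_mask):
    """Take proposed values only where the masks allow edits (hypothetical helper)."""
    x = np.where(node_mask[:, None] == 1, x_new, x_orig)
    a = np.where(adj_mask == 1, a_new, a_orig)
    return x, a


# Toy example: 7 heavy atoms, atoms 0-5 form the scaffold, atom 6 is a substituent.
node_mask, adj_mask = fragment_masks(7, range(6))
x_orig = np.zeros((7, 3))   # placeholder node features
a_orig = np.zeros((7, 7))   # placeholder adjacency matrix
x_new = np.ones((7, 3))     # stand-in for a generated proposal
a_new = np.ones((7, 7))
x_mix, a_mix = apply_masks(x_orig, a_orig, x_new, a_new, node_mask, adj_mask)
```

In this toy case only the features of atom 6 and the bond entries touching atom 6 are taken from the proposal; every scaffold atom and every scaffold-to-scaffold bond entry keeps its original value.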

Paper Structure (14 sections, 11 equations, 9 figures, 4 tables)

Figures (9)

  • Figure 1: PDD molecular optimization task. The diagram on the right compares the IC50 distributions of original and optimized molecules obtained by our method.
  • Figure 2: Our method's framework. The input conditions consist of one molecule to be optimized, $\mathbf{G}$, and the target conditions $\mathbf{c}$. Specifically, the target conditions include an IC50 value $y$ and one cell line $c$. In addition, the molecule to be optimized is processed through the scaffold $\mathcal{S}_{\mathbf{f} }$ to identify the regions that require optimization, generating the node mask $\mathcal{M}^X$ and the adjacency matrix mask $\mathcal{M}^A$.
  • Figure 3: Visualization results for the IC50 distribution of molecules generated by fragment-based methods.
  • Figure 4: Visualization results for the IC50 distribution of molecules generated by graph- and diffusion-based methods.
  • Figure 5: Visual comparison of our optimization method with generative methods. This illustrates the unique molecular structures generated by our method and compares them with various baselines across four distinct cell lines. Our method consistently produces diverse and effective molecules tailored to each cell line, avoiding convergence to the same local optimum.
  • ...and 4 more figures