Regressor-free Molecule Generation to Support Drug Response Prediction

Kun Li, Xiuwen Gong, Shirui Pan, Jia Wu, Bo Du, Wenbin Hu

TL;DR

This work addresses drug response prediction (DRP) by introducing regressor-free guidance for diffusion-based molecule generation. It replaces classifier-based conditioning with a regression controller that maps target IC50 values to text through a common-sense numerical knowledge graph (CN-KG), which keeps the numerical representations ordered. A dual-branch diffusion model, DBControl, provides robust score estimation when task-specific data are limited, so sampling stays within a narrow space centered on the target. Experiments on real DRP data show better generation quality (lower FCD) and closer alignment with target IC50 values than strong baselines, suggesting practical gains for de novo drug design. The framework supports more efficient screening by producing molecules that are more likely to achieve the desired DRP outcome, though it requires substantial computational resources and has not yet been validated in the wet lab.

Abstract

Drug response prediction (DRP) is a crucial phase in drug discovery, and the most important metric for its evaluation is the IC50 score. DRP results depend heavily on the quality of the generated molecules. Existing molecule generation methods typically employ classifier-based guidance, which enables sampling within an IC50 classification range. However, these methods cannot ensure that the sampling space is effective, and they generate many ineffective molecules. Through experimental and theoretical study, we hypothesize that conditional generation based on the target IC50 score yields a more effective sampling space. We therefore introduce regressor-free guidance for molecule generation to ensure sampling within a more effective space and to support DRP. Regressor-free guidance combines a diffusion model's score estimation with the gradient of a regression controller model based on numerical labels. To map regression labels between drugs and cell lines effectively, we design a common-sense numerical knowledge graph that constrains the order of text representations. Experimental results on a real-world dataset for the DRP task demonstrate our method's effectiveness in drug discovery. The code is available at: https://anonymous.4open.science/r/RMCD-DBD1.
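As a rough illustration of how a guided diffusion sampler can blend an unconditional estimate with a condition-aware one, the sketch below uses the linear interpolation familiar from classifier-free guidance. This is an assumption made for illustration and not the authors' exact formulation: the function name `guided_noise`, the toy vectors, and the convention that `w = 0.0` recovers the unguided prediction are all hypothetical choices.

```python
import numpy as np

def guided_noise(eps_uncond, eps_cond, w):
    """Blend unconditional and conditional noise predictions.

    Assumed convention (not taken from the paper):
      w = 0.0 -> unconditional prediction (no guidance)
      w = 1.0 -> conditional prediction
      w > 1.0 -> extrapolates beyond the conditional prediction
    """
    eps_uncond = np.asarray(eps_uncond, dtype=float)
    eps_cond = np.asarray(eps_cond, dtype=float)
    return eps_uncond + w * (eps_cond - eps_uncond)

# Toy noise predictions standing in for the outputs of a noise-prediction network.
eps_u = np.array([0.0, 1.0, -1.0])
eps_c = np.array([1.0, 1.0, 1.0])

for w in (0.0, 1.0, 2.0):
    print(w, guided_noise(eps_u, eps_c, w))
```

In this sketch, raising `w` pushes each sampling step further toward the conditional estimate, which is one common way to trade sample diversity for adherence to the target condition.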

Paper Structure (28 sections, 6 theorems, 16 equations, 7 figures, 9 tables, 2 algorithms)

Key Result

Proposition 1

For any $\bm{C}_{\mathrm{aim}} \in \left( 0,1 \right)$, the inequality $\left\| S_{cls} \right\| \ge \left\| S_{reg} \right\|$ holds.

Figures (7)

  • Figure 1: Sampling space comparison for target conditions in classifier- vs. regressor-based guidance molecule generation.
  • Figure 2: (a) illustrates the training process of the regression controller model, which serves as a conditional encoder guiding diffusion. (b) depicts the regressor-free guidance diffusion process, which uses the text encoder of the trained regression controller model to encode the target conditions. The DBControl model is a score-based noise prediction model trained on a mixture of the conditional GDSCv2 and unconditional QM9 datasets.
  • Figure 3: UMAP visualization of molecule generation results with our method compared to four mainstream methods using the target pair (NCI-H187, $\mathrm{IC}_{50}$=0.35).
  • Figure 4: Visualization of regressor-free guidance strength trends. The x-axis represents the conditions' intensity, where $w = 0.0$ refers to non-guided models, while the y-axis represents the corresponding metric values.
  • Figure 5: Schematic of the DBControl.
  • ...and 2 more figures
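The Figure 2 caption mentions training on a mixture of conditional and unconditional data. One plausible way to assemble such a batch is sketched below, assuming that unconditional samples receive an all-zero placeholder condition and a boolean flag marking them as unconditioned. The function `build_mixed_batch`, the array shapes, and the zero-vector placeholder are hypothetical and are not taken from the paper or its code.

```python
import numpy as np

def build_mixed_batch(cond_x, cond_emb, uncond_x):
    """Stack conditional and unconditional samples into one training batch.

    cond_x   : (n_c, d) features of samples that carry a condition
    cond_emb : (n_c, k) condition embeddings for those samples
    uncond_x : (n_u, d) features of samples without a condition

    Unconditional samples get an all-zero placeholder embedding
    (an assumption of this sketch) and has_cond = False.
    """
    cond_x = np.asarray(cond_x, dtype=float)
    cond_emb = np.asarray(cond_emb, dtype=float)
    uncond_x = np.asarray(uncond_x, dtype=float)

    null_emb = np.zeros((len(uncond_x), cond_emb.shape[1]))
    x = np.concatenate([cond_x, uncond_x], axis=0)
    emb = np.concatenate([cond_emb, null_emb], axis=0)
    has_cond = np.concatenate([
        np.ones(len(cond_x), dtype=bool),
        np.zeros(len(uncond_x), dtype=bool),
    ])
    return x, emb, has_cond

# Toy data: 2 conditional samples and 3 unconditional samples, 4 features each.
x, emb, has_cond = build_mixed_batch(
    cond_x=np.ones((2, 4)),
    cond_emb=np.full((2, 3), 0.5),
    uncond_x=np.zeros((3, 4)),
)
print(x.shape, emb.shape, has_cond)
```

A noise-prediction model trained on batches like this sees both conditioned and unconditioned inputs, which is what lets a sampler query either mode at generation time.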

Theorems & Definitions (6)

  • Proposition 1: Main proposition
  • Proposition 2: Uniqueness of $\bm{C}_{\mathrm{aim} }$ Representation
  • Proposition 3: Equal interval representation of $\Theta$