ControlMol: Adding Substructure Control To Molecule Diffusion Models

Qi Zhengyang, Liu Zijing, Zhang Jiying, Cao He, Li Yu

TL;DR

A two-stage training approach, consisting of condition learning and condition optimization, that adds substructure control to pretrained molecule diffusion models and outperforms previous techniques by generating more valid and diverse molecules for computer-aided drug design.

Abstract

Because the design space of molecules is vast, generating molecules conditioned on a specific substructure relevant to a particular function or therapeutic target is a crucial task in computer-aided drug design. Existing works mainly focus on specific tasks such as linker design or scaffold hopping; each task requires training a model from scratch, so the parameters of many well-pretrained de novo molecule generation models go unused. To this end, we propose a two-stage training approach consisting of condition learning and condition optimization. In the condition learning stage, we adopt the idea of ControlNet and introduce several targeted adjustments so that an unconditional generative model learns substructure-conditioned generation. In the condition optimization stage, we use human preference learning to further improve the stability and robustness of substructure control. In our experiments, although trained only on randomly partitioned substructure data, the proposed method outperforms previous techniques by generating more valid and diverse molecules. Our method is easy to implement and can be quickly applied to various pretrained molecule generation models.
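The condition learning stage builds on ControlNet. The abstract does not spell out the adjusted architecture, so the following is only a minimal sketch of the generic ControlNet mechanism applied to per-atom features, not ControlMol's actual network. A frozen pretrained block processes the noisy features $Z_t$, a trainable copy of that block processes the conditioned input (here simply $Z_t + c$, loosely following the Figure 1 caption), and the copy's output is added back through a zero-initialized projection. All names (`init_params`, `block`, `controlled_block`) and the single tanh layer are hypothetical stand-ins for a real denoiser layer.

```python
import numpy as np


def init_params(dim, seed=0):
    """Hypothetical parameters for one ControlNet-style block on per-atom features."""
    rng = np.random.default_rng(seed)
    w_frozen = rng.normal(size=(dim, dim)) / np.sqrt(dim)
    return {
        "frozen": w_frozen,            # pretrained weights, kept fixed
        "copy": w_frozen.copy(),       # trainable copy, initialized from the pretrained weights
        "zero": np.zeros((dim, dim)),  # zero-initialized projection joining the two branches
    }


def block(h, w):
    """Stand-in for one layer of a pretrained denoiser (a single tanh layer here)."""
    return np.tanh(h @ w)


def controlled_block(params, z_t, c):
    """Frozen branch sees the noisy features z_t; the trainable copy sees z_t + c.

    z_t, c: arrays of shape (n_atoms, dim). As an assumption, c is zero for atoms
    outside the conditioning substructure.
    """
    base = block(z_t, params["frozen"])
    control = block(z_t + c, params["copy"])
    return base + control @ params["zero"]
```

Because the projection starts at zero, the conditioned block initially reproduces the pretrained unconditional output exactly, so training starts from the pretrained behavior rather than from scratch. The projection still receives a nonzero gradient (proportional to the control branch's output), so it can move away from zero during training. In the original ControlNet the condition also passes through a zero-initialized layer before being added; that step is omitted here for brevity.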

Paper Structure

This paper contains 21 sections, 7 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: The overall architecture of ControlMol. Node colors represent different node features. After adding $c$, the atoms in $C_t$ that correspond to the substructure are closer to the original substructure, in both position and node features, than those in $Z_t$; this implicitly provides conditional information to the model.
  • Figure 2: Samples conditioned on "c1ccccc1" (benzene) and "C1CCC1" (cyclobutane). We use RDKit to generate a conformer from each SMILES string and sample conditioned on that conformer. The number accompanying each sample is the RMSD between the sample and the conditioning conformer (the structure on the left).
  • Figure 3: After the Condition Optimization Stage, ControlMol achieves stable controlled generation across diverse atom types and structures.
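Figure 2 reports the RMSD between each sample and the conditioning conformer. The caption does not say how the two structures are superimposed or how atoms are matched, so this sketch makes two assumptions: the atom correspondence is already known (for a generated molecule, the rows of `p` would be the atoms matched to the substructure), and RMSD is measured after the optimal rigid-body superposition found by the Kabsch algorithm. `kabsch_rmsd` is a hypothetical helper on plain NumPy coordinate arrays; with RDKit molecules, `rdMolAlign.GetBestRMS` computes a comparable aligned RMSD that also considers symmetry-equivalent atom mappings.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between two (n_atoms, 3) coordinate arrays after optimal rigid superposition.

    Assumes row i of `p` and row i of `q` describe the same atom.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (n_atoms, 3)")

    # Remove translation by centering both structures at the origin.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)

    # SVD of the 3x3 covariance matrix yields the optimal rotation.
    u, _, vt = np.linalg.svd(p_c.T @ q_c)
    v = vt.T

    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(v @ u.T))
    rotation = v @ np.diag([1.0, 1.0, d]) @ u.T

    # Rotate p onto q (row vectors, hence the transpose) and average squared deviations.
    p_aligned = p_c @ rotation.T
    return float(np.sqrt(np.mean(np.sum((p_aligned - q_c) ** 2, axis=1))))
```

The determinant check ensures the superposition is always a proper rotation, never a mirror image; without it, a chiral structure could be matched to its enantiomer and report an artificially low RMSD.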