SubGDiff: A Subgraph Diffusion Model to Improve Molecular Representation Learning

Jiying Zhang, Zijing Liu, Yu Wang, Yu Li

TL;DR

SubGDiff is a novel diffusion model that incorporates molecular subgraph information into the diffusion process. It adopts three key techniques: i) subgraph prediction, ii) expectation state, and iii) k-step same subgraph diffusion, to enhance the denoising network's perception of molecular substructure.

Abstract

Molecular representation learning has shown great success in advancing AI-based drug discovery. Many recent works build on the fact that the 3D geometric structure of a molecule provides essential information about its physical and chemical characteristics. Recently, denoising diffusion probabilistic models have achieved impressive performance in 3D molecular representation learning. However, most existing molecular diffusion models treat each atom as an independent entity, overlooking the dependency among atoms within molecular substructures. This paper introduces an approach that enhances molecular representation learning by incorporating substructural information into the diffusion process. We propose a novel diffusion model, termed SubGDiff, that involves molecular subgraph information in diffusion. Specifically, SubGDiff adopts three key techniques: i) subgraph prediction, ii) expectation state, and iii) k-step same subgraph diffusion, to enhance the perception of molecular substructure in the denoising network. Experiments on a wide range of downstream tasks demonstrate the superior performance of our approach. The code is available at https://github.com/youjibiying/SubGDiff.

Paper Structure (47 sections, 1 theorem, 78 equations, 9 figures, 13 tables, 4 algorithms)

Key Result

Lemma 2.0

Assume the forward and reverse processes of the diffusion model are both Markov chains. Given the forward Gaussian distributions $q(R^t|R^{t-1},R^0) = \mathcal{N}(R^t; \mu_1 R^{t-1},\sigma_1^2 \textbf{I})$ and $q(R^{t-1}|R^0) = \mathcal{N}(R^{t-1}; \mu_2R^0,\sigma_2^2 \textbf{I})$, and $\epsilon_0 \sim \mathcal{N}(\mathbf{0},\textbf{I})$, parameterizing $p_\theta(R^{t-1}|R^t)$ in the reverse process as $\mathcal{N}\big(R^{t-1}; \frac{1}{\mu_1}\cdots\big)$ …
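Under the two Gaussians stated in the lemma, standard Gaussian conditioning gives $q(R^t|R^0) = \mathcal{N}(R^t; \mu_1\mu_2 R^0, (\mu_1^2\sigma_2^2+\sigma_1^2)\textbf{I})$, and the mean of the posterior $q(R^{t-1}|R^t,R^0)$ can be rewritten in terms of the noise as $\frac{1}{\mu_1}\big(R^t - \frac{\sigma_1^2}{\sqrt{\mu_1^2\sigma_2^2+\sigma_1^2}}\epsilon_0\big)$, the form an $\epsilon$-prediction network plugs into. The sketch below checks this identity numerically. It is our own derivation of a standard DDPM-style result, not the paper's verbatim conclusion; the function names and the numeric values of $\mu_1,\sigma_1,\mu_2,\sigma_2$ are arbitrary choices for illustration.

```python
import numpy as np

def posterior_mean(r_t, r_0, mu1, sigma1, mu2, sigma2):
    """Mean of q(R^{t-1} | R^t, R^0) obtained by conditioning the two lemma Gaussians."""
    var_t = mu1**2 * sigma2**2 + sigma1**2  # variance of q(R^t | R^0)
    return (mu1 * sigma2**2 * r_t + sigma1**2 * mu2 * r_0) / var_t

def eps_parameterized_mean(r_t, eps, mu1, sigma1, sigma2):
    """The same mean written with the noise eps_0 in place of R^0."""
    scale = np.sqrt(mu1**2 * sigma2**2 + sigma1**2)
    return (r_t - sigma1**2 / scale * eps) / mu1

rng = np.random.default_rng(0)
mu1, sigma1, mu2, sigma2 = 0.95, 0.3, 0.8, 0.6   # arbitrary illustrative values
r_0 = rng.standard_normal((5, 3))                 # 5 atoms with 3D coordinates
eps = rng.standard_normal((5, 3))                 # eps_0 ~ N(0, I)
# Sample R^t directly from q(R^t | R^0) using the same eps_0
r_t = mu1 * mu2 * r_0 + np.sqrt(mu1**2 * sigma2**2 + sigma1**2) * eps

print(np.allclose(posterior_mean(r_t, r_0, mu1, sigma1, mu2, sigma2),
                  eps_parameterized_mean(r_t, eps, mu1, sigma1, sigma2)))  # prints True
```

Because $R^0$ is unknown at sampling time, replacing $\epsilon_0$ with a learned $\epsilon_\theta(R^t, t)$ in the second function is what makes the reverse mean computable from $R^t$ alone.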

Figures (9)

  • Figure 1: Equilibrium probability of the six conformers (c1–c6) of ibuprofen (C13H18O2) under four different conditions. The 3D substructure is a significant characteristic of a molecule. (Adapted with permission from marinova2018dynamics. Copyright 2018 American Chemical Society.)
  • Figure 2: Comparison of the forward processes of DDPM ho2020denoising and subgraph diffusion. At each step, DDPM adds noise to all atomic coordinates, while subgraph diffusion selects a subset of the atoms to diffuse.
  • Figure 3: The Markov chain of SubGDiff is a lazy Markov chain.
  • Figure 4: The forward process of SubGDiff. States $0$ to $km$ use the expectation state, and the mask variables are the same within each interval $[ki,ki+k]$, $i=0,1,\dots,m-1$. States $km+1$ to $t$ apply $k$-step same subgraph diffusion.
  • Figure 5: An example of $k$-step same subgraph diffusion, where the mask vectors are the same as $\mathbf{s}_{km+1}$ from step $km$ to $(m+1)k$, $m\in \mathbb{N}^+$.
  • ...and 4 more figures
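The masked forward process described in Figures 2, 3 and 5 can be sketched as follows. This is a minimal illustration under assumptions of ours, not the SubGDiff implementation: each selected atom gets a DDPM-style update $\sqrt{1-\beta_t}R^{t-1}+\sqrt{\beta_t}\epsilon$ (the paper's exact schedule and mask distribution are not given here), unselected atoms keep their coordinates (which is what makes each atom's chain lazy), and candidate subgraphs are supplied as a hypothetical list of binary atom masks `subgraphs`. The helpers `masked_forward_step` and `k_step_same_subgraph` are illustrative names only.

```python
import numpy as np

def masked_forward_step(r_prev, mask, beta, rng):
    """One forward step of subgraph diffusion (sketch).

    Atoms with mask == 1 receive an assumed DDPM-style update; atoms with
    mask == 0 keep their coordinates, so each atom's chain is lazy.
    """
    noise = rng.standard_normal(r_prev.shape)
    m = mask[:, None].astype(r_prev.dtype)  # broadcast the per-atom mask over x, y, z
    diffused = np.sqrt(1.0 - beta) * r_prev + np.sqrt(beta) * noise
    return m * diffused + (1.0 - m) * r_prev

def k_step_same_subgraph(r_0, subgraphs, betas, k, rng):
    """Run len(betas) forward steps, drawing a new subgraph mask only every k steps."""
    r = r_0
    masks = []
    for t, beta in enumerate(betas):
        if t % k == 0:  # resample the subgraph at the start of each k-step block
            mask = subgraphs[rng.integers(len(subgraphs))]
        masks.append(mask)
        r = masked_forward_step(r, mask, beta, rng)
    return r, masks

rng = np.random.default_rng(0)
r_0 = rng.standard_normal((6, 3))  # 6 atoms with 3D coordinates
subgraphs = [np.array([1, 1, 1, 0, 0, 0]), np.array([0, 0, 1, 1, 1, 1])]  # hypothetical masks
r_T, masks = k_step_same_subgraph(r_0, subgraphs, np.linspace(1e-4, 0.02, 10), k=5, rng=rng)
```

Reusing one mask for a block of $k$ steps gives the denoising network several consecutive steps in which the same substructure moves together, which is the intuition behind $k$-step same subgraph diffusion; the expectation-state part of Figure 4 is not modeled here.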

Theorems & Definitions (1)

  • Lemma 2.0