Protein Conformation Generation via Force-Guided SE(3) Diffusion Models

Yan Wang, Lihao Wang, Yuning Shen, Yiqun Wang, Huizhuo Yuan, Yue Wu, Quanquan Gu

TL;DR

ConfDiff is a force-guided SE(3) diffusion model that generates protein conformations with rich diversity and high fidelity by combining a force-guided network with a mixture of data-based score models.

Abstract

The conformational landscape of proteins is crucial to understanding their functionality in complex biological processes. Traditional physics-based computational methods, such as molecular dynamics (MD) simulations, suffer from rare-event sampling and long equilibration times, which hinders their application to general protein systems. Recently, deep generative modeling techniques, especially diffusion models, have been employed to generate novel protein conformations. However, existing score-based diffusion methods cannot properly incorporate important physical prior knowledge to guide the generation process, causing large deviations in the sampled protein conformations from the equilibrium distribution. In this paper, to overcome these limitations, we propose a force-guided SE(3) diffusion model, ConfDiff, for protein conformation generation. By incorporating a force-guided network with a mixture of data-based score models, ConfDiff can generate protein conformations with rich diversity while preserving high fidelity. Experiments on a variety of protein conformation prediction tasks, including 12 fast-folding proteins and the Bovine Pancreatic Trypsin Inhibitor (BPTI), demonstrate that our method surpasses state-of-the-art methods.
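The abstract describes sampling with a mixture of a sequence-conditional and an unconditional score model plus force guidance, and Figure 2 varies a sequence-condition level ($\gamma$) and a force-guidance level ($\eta$). The sketch below shows one plausible way to combine these terms at a single reverse-diffusion step. It assumes a linear interpolation of the two scores weighted by $\gamma$ and an additive force term scaled by $\eta$; the combination rule and the names `guided_score`, `s_cond`, `s_uncond`, and `force_t` are illustrative assumptions, not the paper's implementation. Scores are treated as plain NumPy vectors, ignoring the rotation-plus-translation structure of SE(3).

```python
import numpy as np


def guided_score(s_cond: np.ndarray, s_uncond: np.ndarray, force_t: np.ndarray,
                 gamma: float, eta: float) -> np.ndarray:
    """Combine score estimates for one reverse-diffusion step (illustrative sketch).

    s_cond   -- score of the sequence-conditional model, grad log q_t(x_t | seq)
    s_uncond -- score of the unconditional model, grad log q_t(x_t)
    force_t  -- hypothetical predicted intermediate force, assumed ~ -k * grad E_t(x_t)
    gamma    -- weight on the sequence condition (1.0 = purely conditional)
    eta      -- force-guidance strength (0.0 = no force guidance)
    """
    mixed = gamma * s_cond + (1.0 - gamma) * s_uncond  # data-based score mixture
    return mixed + eta * force_t                        # additive physics-based guidance


# Toy check with isotropic unit-variance Gaussians, whose score is -(x - mu).
x = np.array([1.0, -2.0])
s_cond = -(x - np.array([0.5, 0.0]))   # conditional "model" centred at (0.5, 0)
s_uncond = -x                          # unconditional "model" centred at the origin
force = np.array([0.1, 0.0])           # made-up force vector for illustration

print(guided_score(s_cond, s_uncond, force, gamma=0.5, eta=1.0))
```

For $0 \le \gamma \le 1$ the mixed term equals the gradient of $\log\big[q_t(\mathbf{x}_t\mid\text{seq})^{\gamma}\, q_t(\mathbf{x}_t)^{1-\gamma}\big]$, so lowering $\gamma$ flattens the sequence-conditional density, consistent with the more diverse samples reported in Figure 2. With $\eta = 1$ and an exact force, the added term matches the gradient of the extra factor $e^{-k\mathcal{E}_t(\mathbf{x}_t)}$ in Proposition 1.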

Paper Structure (36 sections, 2 theorems, 34 equations, 10 figures, 10 tables, 2 algorithms)

Key Result

Proposition 1

Suppose $p_0(\mathbf{x}_0) = q_0(\mathbf{x}_0)\frac{e^{-k\mathcal{E}_0 (\mathbf{x}_0)}}{Z}$, and for $t\in (0, 1]$, $p_t(\mathbf{x}_t|\mathbf{x}_0):=q_t(\mathbf{x}_t|\mathbf{x}_0)$. Then, the marginal distribution satisfies $p_t(\mathbf{x}_t)\propto q_t(\mathbf{x}_t) e^{-k \mathcal{E}_t(\mathbf{x}_t)}$, where $\mathcal{E}_t$ denotes the intermediate energy at time $t$.
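Under the assumptions of Proposition 1, a one-line marginalization gives the intuition (our own derivation, shown for illustration): using $q_0(\mathbf{x}_0)\,q_t(\mathbf{x}_t|\mathbf{x}_0) = q_t(\mathbf{x}_t)\,q_t(\mathbf{x}_0|\mathbf{x}_t)$, we get $p_t(\mathbf{x}_t)=\int p_0(\mathbf{x}_0)\,q_t(\mathbf{x}_t|\mathbf{x}_0)\,d\mathbf{x}_0=\frac{q_t(\mathbf{x}_t)}{Z}\,\mathbb{E}_{q_t(\mathbf{x}_0|\mathbf{x}_t)}\big[e^{-k\mathcal{E}_0(\mathbf{x}_0)}\big]$, so the proportionality holds with $\mathcal{E}_t(\mathbf{x}_t) = -\frac{1}{k}\log\mathbb{E}_{q_t(\mathbf{x}_0|\mathbf{x}_t)}\big[e^{-k\mathcal{E}_0(\mathbf{x}_0)}\big]$. The NumPy sketch below checks this numerically on a made-up discrete state space, where the integral becomes a sum and the forward kernel is a random row-stochastic matrix; the function name and all values are hypothetical.

```python
import numpy as np


def tilted_marginals(q0: np.ndarray, Q: np.ndarray, E0: np.ndarray, k: float):
    """Return p_t and the unnormalized q_t * exp(-k E_t) on a discrete toy model.

    q0 -- data distribution q_0 over K clean states, shape (K,)
    Q  -- forward kernel, Q[i, j] = q_t(x_t = j | x_0 = i); each row sums to 1
    E0 -- energy E_0 of each clean state, shape (K,)
    k  -- scale of the energy term
    """
    w = np.exp(-k * E0)                        # Boltzmann-like weights e^{-k E_0}
    p0 = q0 * w / np.sum(q0 * w)               # p_0 = q_0 e^{-k E_0} / Z
    p_t = p0 @ Q                               # marginal of the tilted process
    q_t = q0 @ Q                               # marginal of the data process
    post = q0[:, None] * Q / q_t[None, :]      # posterior q_t(x_0 | x_t); columns sum to 1
    E_t = -np.log(w @ post) / k                # intermediate energy from the derivation
    return p_t, q_t * np.exp(-k * E_t)


rng = np.random.default_rng(0)
K, M = 4, 6                                    # numbers of clean and noisy states
q0 = rng.random(K)
q0 /= q0.sum()
Q = rng.random((K, M))
Q /= Q.sum(axis=1, keepdims=True)
E0 = rng.normal(size=K)

p_t, unnorm = tilted_marginals(q0, Q, E0, k=2.0)
print(p_t / unnorm)                            # a constant vector, equal to 1 / Z
```

The ratio is the same for every noisy state, which is exactly the statement $p_t \propto q_t\, e^{-k\mathcal{E}_t}$ with proportionality constant $1/Z$.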

Figures (10)

  • Figure 1: Protein conformation generation with multiple guidance strategies. Upper: With a mixture of sequence-conditional and unconditional score models, ConfDiff (Section \ref{baseline}) samples diverse conformations of reasonable quality. Lower: With the force guidance of Section \ref{force-guided}, the model generates lower-energy structures that better comply with the Boltzmann distribution.
  • Figure 2: Energy (left) and diversity (right) of sampled conformations for WW-domain under various levels of force guidance ($\eta$) and sequence conditioning ($\gamma$). Models with weaker sequence conditioning generate more diverse samples, and force guidance improves conformation stability without drastically decreasing diversity.
  • Figure 3: Sample distribution over the first two TIC components for WW-domain. Ref (N=1000) shows 1000 random samples from the reference MD simulation. The illustration in the lower left shows the experimental structure in the folded state (in color) and 5 random samples from the reference (in grey).
  • Figure 4: Metastable state prediction for BPTI. A) Precision of predicting Cluster 3 versus sample size. B) Visual comparison of the best samples from three models for Cluster 1 and Cluster 3. Reference structures are shown in color and sampled structures in grey; the RMSD to the reference is labeled. For Cluster 1, Str2Str shows lower accuracy in the upper loop region (red arrow); for Cluster 3, the more challenging case, EigenFold does not correctly predict the $\beta$-sheet (orange arrow), and Str2Str shows more global structural misalignment (red arrows) that may contribute to its higher RMSD.
  • Figure S1: A single layer of ConfDiff. This architecture, adapted from \cite{alphafolddiffusionse3}, can be applied to the unconditional model, the conditional model, the intermediate energy prediction network $f_{\phi}(\mathbf{x}_t, t)$, the intermediate force prediction network $g_{\nu}(\mathbf{x}_t, t)$, and the force-field prediction network $\widetilde{g}_{\nu}(\mathbf{x}_t, t)$ used in Eq. \ref{eq:interpolation_net}. The invariant point attention, edge transition, and frame update modules are similar to the corresponding structures in AlphaFold \cite{alphafold}.
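Figures 2 and 4 report sample diversity and RMSD against reference structures. This summary does not define either metric, so the sketch below assumes common choices: RMSD after optimal rigid superposition (the Kabsch algorithm) on matched coordinates such as C-alpha atoms, and diversity as the mean RMSD over all pairs of samples. The function names `kabsch_rmsd` and `mean_pairwise_rmsd` are illustrative, and the coordinates are random placeholders rather than protein data.

```python
import numpy as np
from itertools import combinations


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between matched (N, 3) coordinate sets after optimal superposition."""
    P = P - P.mean(axis=0)                     # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                                # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # -1 if the optimum would be a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T    # proper rotation taking P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


def mean_pairwise_rmsd(samples: list) -> float:
    """Diversity proxy: average Kabsch RMSD over all unordered sample pairs."""
    pairs = combinations(range(len(samples)), 2)
    return float(np.mean([kabsch_rmsd(samples[i], samples[j]) for i, j in pairs]))


rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))                   # made-up "C-alpha" coordinates
R_true, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(R_true) < 0:                  # turn a reflection into a proper rotation
    R_true[:, 0] *= -1
Y = X @ R_true.T + np.array([1.0, -2.0, 0.5])  # rigidly moved copy of X

print(kabsch_rmsd(X, Y))                       # ~0: the rigid motion is factored out
print(mean_pairwise_rmsd([X, Y, X + rng.normal(scale=0.5, size=X.shape)]))
```

The sign correction `d` restricts the fit to proper rotations, so a mirror-image structure is not scored as identical to its reference.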
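Figure 3 projects samples onto the first two time-lagged independent components (TICs) fitted to the reference MD simulation. A bare-bones TICA fit is sketched below with NumPy: it whitens the instantaneous covariance and diagonalizes the symmetrized time-lagged covariance, keeping the components with the largest autocorrelation at the chosen lag. The input features (for example, pairwise C-alpha distances), the lag time, and the name `fit_tica` are assumptions for illustration; libraries such as deeptime also provide TICA implementations.

```python
import numpy as np


def fit_tica(X: np.ndarray, lag: int, n_components: int = 2):
    """Fit a basic TICA model on a trajectory of features X with shape (T, d).

    Returns (mean, proj); new frames Y project as (Y - mean) @ proj.
    """
    mean = X.mean(axis=0)
    A, B = X[:-lag] - mean, X[lag:] - mean          # frame pairs separated by `lag`
    n = len(A)
    C0 = (A.T @ A + B.T @ B) / (2 * n)              # instantaneous covariance
    Ct = (A.T @ B + B.T @ A) / (2 * n)              # symmetrized time-lagged covariance
    s, V = np.linalg.eigh(C0)
    keep = s > 1e-10 * s.max()                      # drop (near-)singular directions
    W = V[:, keep] / np.sqrt(s[keep])               # whitening: W.T @ C0 @ W = I
    lam, U = np.linalg.eigh(W.T @ Ct @ W)           # eigenvalues in ascending order
    order = np.argsort(lam)[::-1][:n_components]    # slowest (most autocorrelated) first
    return mean, W @ U[:, order]


# Toy trajectory: one slow AR(1) coordinate hidden among two fast noise coordinates.
rng = np.random.default_rng(2)
T = 5000
slow = np.zeros(T)
for i in range(1, T):
    slow[i] = 0.99 * slow[i - 1] + rng.normal()
X = np.column_stack([rng.normal(size=T), slow, rng.normal(size=T)])

mean, proj = fit_tica(X, lag=10, n_components=1)
tic1 = (X - mean) @ proj
print(abs(np.corrcoef(tic1[:, 0], slow)[0, 1]))    # close to 1: TIC 1 tracks the slow coordinate
```

Fitting the projection on the reference trajectory and then applying `(samples - mean) @ proj` to generated conformations puts reference and samples in the same TIC coordinates, which is what a side-by-side density plot like Figure 3 requires.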
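Figure S1 mentions a frame update module similar to AlphaFold's. In AlphaFold2's structure module, each residue carries a rigid frame (rotation $R$, translation $t$); a linear layer predicts three numbers $(b, c, d)$ that define the unit quaternion $(1, b, c, d)/\sqrt{1 + b^2 + c^2 + d^2}$ together with a translation, and this update is composed on the right of the current frame. The sketch below reproduces only that geometric step in NumPy, with the network output replaced by fixed made-up numbers; whether ConfDiff uses exactly this parameterization is an assumption, and the function names are hypothetical.

```python
import numpy as np


def quat_update_to_rot(b: float, c: float, d: float) -> np.ndarray:
    """Rotation matrix from the vector part of a quaternion whose scalar part is fixed to 1."""
    a, b, c, d = np.array([1.0, b, c, d]) / np.sqrt(1.0 + b * b + c * c + d * d)
    return np.array([
        [a*a + b*b - c*c - d*d, 2*(b*c - a*d),         2*(b*d + a*c)],
        [2*(b*c + a*d),         a*a - b*b + c*c - d*d, 2*(c*d - a*b)],
        [2*(b*d - a*c),         2*(c*d + a*b),         a*a - b*b - c*c + d*d],
    ])


def compose_frames(R, t, R_upd, t_upd):
    """Return T o T_upd: the update is expressed in the residue's local frame."""
    return R @ R_upd, R @ t_upd + t


R0, t0 = np.eye(3), np.zeros(3)                         # identity frame for one residue
R_upd = quat_update_to_rot(0.3, -0.2, 0.5)              # made-up network outputs
t_upd = np.array([0.1, 0.0, -0.2])
R1, t1 = compose_frames(R0, t0, R_upd, t_upd)

print(np.allclose(R1.T @ R1, np.eye(3)), np.isclose(np.linalg.det(R1), 1.0))
```

Fixing the scalar part to 1 before normalizing means zero network output yields the identity update, so a freshly initialized layer leaves the frames unchanged.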

Theorems & Definitions (2)

  • Proposition 1
  • Proposition 2