
Str2Str: A Score-based Framework for Zero-shot Protein Conformation Sampling

Jiarui Lu, Bozitao Zhong, Zuobai Zhang, Jian Tang

TL;DR

Inspired by simulated annealing, the authors propose Str2Str, a novel structure-to-structure translation framework that performs zero-shot conformation sampling with roto-translation equivariance and can be orders of magnitude faster than long MD simulations.

Abstract

The dynamic nature of proteins is crucial for determining their biological functions and properties, and Monte Carlo (MC) and molecular dynamics (MD) simulations are the predominant tools for studying it. Using empirically derived force fields, MC and MD simulations explore the conformational space by numerically evolving the system via a Markov chain or Newtonian mechanics, respectively. However, high energy barriers in the force field make transitions between states rare events, which hampers exploration by both methods and leaves the ensemble inadequately sampled unless the simulations are run exhaustively. Existing learning-based approaches sample directly, yet they rely heavily on target-specific simulation data for training, which incurs high data acquisition cost and generalizes poorly. Inspired by simulated annealing, we propose Str2Str, a novel structure-to-structure translation framework capable of zero-shot conformation sampling with roto-translation equivariance. Our method leverages an amortized denoising score matching objective trained on general crystal structures and relies on no simulation data during either training or inference. Experimental results on several benchmark protein systems demonstrate that Str2Str outperforms previous state-of-the-art generative structure prediction models and can be orders of magnitude faster than long MD simulations. Our open-source implementation is available at https://github.com/lujiarui/Str2Str
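The denoising score matching objective mentioned in the abstract can be illustrated with a minimal sketch. For simplicity it assumes that only residue coordinates in $\mathbb{R}^3$ are perturbed with isotropic Gaussian noise (Str2Str itself works with backbone frames, which also carry rotations) and that the loss is averaged over a fixed list of noise levels, which is one way to amortize training across noise scales. The function names and the $\sigma^2$ weighting are illustrative choices, not the authors' implementation.

```python
import numpy as np

def perturb(x0, sigma, rng):
    """Gaussian perturbation x_t = x0 + sigma * eps of an (N, 3) coordinate array."""
    eps = rng.standard_normal(x0.shape)
    return x0 + sigma * eps

def dsm_target(x_t, x0, sigma):
    """Score of the perturbation kernel N(x_t; x0, sigma^2 I) with respect to x_t."""
    return -(x_t - x0) / sigma**2

def dsm_loss(score_fn, x0, sigmas, rng):
    """sigma^2-weighted denoising score matching loss, averaged over noise levels.

    score_fn(x_t, sigma) stands in for the model being trained (any callable).
    """
    losses = []
    for sigma in sigmas:
        x_t = perturb(x0, sigma, rng)
        residual = score_fn(x_t, sigma) - dsm_target(x_t, x0, sigma)
        # Sum squared error over xyz, average over residues, weight by sigma^2
        losses.append(sigma**2 * np.mean(np.sum(residual**2, axis=-1)))
    return float(np.mean(losses))
```

An oracle that knows the clean structure reaches zero loss, while a model that always predicts a zero score pays roughly the squared norm of the noise; a trained network must instead learn the score of the noised distribution over all training structures.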

Paper Structure

This paper contains 63 sections, 1 theorem, 19 equations, 14 figures, 11 tables, 3 algorithms.

Key Result

Proposition 1

Let ${\mathbf{x}} \sim p_X({\mathbf{x}}|{\mathbf{x}}_0)$ be the conformation sampled from the process defined in Section m1. If the frame score functions $\nabla_{{\mathbf{T}}_t} \log p_t( {\mathbf{T}}_t)$ are equivariant to global roto-translations, then ${\mathbf{x}}_0 \to {\mathbf{x}}$ assumes ro…
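The intuition behind Proposition 1 (equivariance of Str2Str) can be checked numerically: an update built only from roto-translation-equivariant maps commutes with global rotations and translations of the input. The toy check below assumes plain 3D points rather than residue frames, and `toy_score` is a hypothetical hand-crafted score built from relative vectors, not the learned DenoisingIPA network. Adding isotropic Gaussian noise keeps the property in distribution, because such noise is itself rotation-invariant.

```python
import numpy as np

def random_rotation(rng):
    """Sample a proper rotation matrix (det = +1) via QR decomposition."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q = q * np.sign(np.diag(r))  # fix column signs of the QR factor
    if np.linalg.det(q) < 0:     # turn a reflection into a proper rotation
        q[:, 0] = -q[:, 0]
    return q

def toy_score(x):
    """Hypothetical score: pull towards the centroid plus pairwise repulsion.

    Built only from relative vectors, so it is translation-invariant and
    rotation-equivariant: toy_score(x @ R.T + t) == toy_score(x) @ R.T.
    """
    centered = x - x.mean(axis=0)
    diff = x[:, None, :] - x[None, :, :]                # (N, N, 3) relative vectors
    dist2 = np.sum(diff**2, axis=-1) + np.eye(len(x))   # avoid 0/0 on the diagonal
    repulsion = np.sum(diff / dist2[..., None] ** 2, axis=1)
    return -centered + 0.1 * repulsion

def denoise_step(x, step_size=0.05):
    """One deterministic update x <- x + step_size * score(x)."""
    return x + step_size * toy_score(x)

rng = np.random.default_rng(1)
x = rng.standard_normal((10, 3)) * 3.0   # toy 10-point structure, rows are points
R, t = random_rotation(rng), rng.standard_normal(3)
print(np.allclose(denoise_step(x @ R.T + t), denoise_step(x) @ R.T + t))
```

Because rows are points, a global roto-translation is written `x @ R.T + t`; the printed check compares "transform then step" against "step then transform".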

Figures (14)

  • Figure 1: Illustration of traditional sampling methods and the proposed Str2Str. In (d), the energy landscape is drawn transparent to indicate that, unlike the traditional methods, Str2Str does not rely on the energy landscape and is instead guided by the learned score functions.
  • Figure 2: Illustration of the forward-backward process. Given an input structure (e.g., Trp-cage, PDB entry 2JOF), replicas are fed to the forward (perturb) diffusion, which perturbs each replica independently up to the dynamic-transition time $T_\delta$; the reverse (anneal) process then yields the sampled conformations. The sequence-to-structure task can be handled well by any existing folding module, such as ESMFold.
  • Figure 3: Illustration of the $l$-th layer of DenoisingIPA, where $\parallel$ denotes tensor concatenation and $+$ denotes tensor addition. The multi-head attention is the Transformer self-attention (Vaswani et al., 2017). The initial single representations $\{{\bm{s}}_i\}^0$ are constructed from the positional encoding of residues and the time encoding of the denoising step. The single representations $\{{\bm{s}}_i\}^l$ and backbone frames $\{{\textnormal{T}}_i\}^l$ are updated similarly to the structure module of Jumper et al. (2021), while the pair representations $\{{\bm{z}}_{ij}\}^l$ are updated according to the paper's pair-update equation.
  • Figure 4: Contact map of Trp-cage (visualized in Figure 2) for each model, compared with the MD reference.
  • Figure 5: TICA plots of BPTI conformations sampled by each model, with MD references. Kinetic clusters are colored red. Each subfigure shows a total of 1,000 samples projected onto the 2D TICA space. For idpGAN, most points fall outside the target region.
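The forward-backward (perturb-anneal) process of Figure 2 can be sketched on a toy problem. The sketch assumes a translation-only variance-exploding SDE with a geometric noise schedule, Euler-Maruyama integration of the reverse-time SDE, and an analytic Gaussian score standing in for the learned network; `SIGMA_MIN`, `SIGMA_MAX`, `gaussian_score` and the other names are hypothetical, and `t_delta` plays the role of the dynamic-transition time $T_\delta$.

```python
import numpy as np

SIGMA_MIN, SIGMA_MAX = 0.01, 10.0  # hypothetical noise-schedule bounds

def sigma(t):
    """Geometric variance-exploding schedule for t in [0, 1]."""
    return SIGMA_MIN * (SIGMA_MAX / SIGMA_MIN) ** t

def g2(t):
    """Squared diffusion coefficient g(t)^2 = d sigma(t)^2 / dt."""
    return 2.0 * sigma(t) ** 2 * np.log(SIGMA_MAX / SIGMA_MIN)

def gaussian_score(mu, s0):
    """Exact score of the noised marginal when data ~ N(mu, s0^2 I).

    A hypothetical stand-in for a learned score network.
    """
    def score(x, t):
        var = s0**2 + sigma(t) ** 2 - sigma(0.0) ** 2
        return -(x - mu) / var
    return score

def forward_perturb(x0, n_replicas, t_delta, rng):
    """Forward (perturb): copy the input into replicas and noise each one
    independently with the VE kernel from time 0 to t_delta."""
    replicas = np.repeat(x0[None], n_replicas, axis=0)
    std = np.sqrt(sigma(t_delta) ** 2 - sigma(0.0) ** 2)
    return replicas + std * rng.standard_normal(replicas.shape)

def reverse_anneal(x, score_fn, t_delta, n_steps, rng):
    """Reverse (anneal): Euler-Maruyama on dx = -g^2 score dt + g dW,
    integrated backwards from t_delta to 0."""
    dt = t_delta / n_steps
    for k in range(n_steps):
        t = t_delta - k * dt
        x = x + g2(t) * score_fn(x, t) * dt
        x = x + np.sqrt(g2(t) * dt) * rng.standard_normal(x.shape)
    return x
```

In this toy, `t_delta = 1.0` makes the replicas forget the input and relax to the target distribution, while a small `t_delta` keeps them close to the input structure, so the choice of $T_\delta$ trades diversity against fidelity to the starting conformation:

```python
rng = np.random.default_rng(0)
x0 = np.full((5, 3), 3.0)  # hypothetical 5-point input, away from the mode at 0
score = gaussian_score(mu=0.0, s0=1.0)
samples = reverse_anneal(forward_perturb(x0, 2000, 0.3, rng), score, 0.3, 500, rng)
```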

Theorems & Definitions (1)

  • Proposition 1: Equivariance of Str2Str