Fusing Neural and Physical: Augment Protein Conformation Sampling with Tractable Simulations

Jiarui Lu, Zuobai Zhang, Bozitao Zhong, Chence Shi, Jian Tang

TL;DR

The paper tackles the costly problem of sampling protein conformations by marrying a zero-shot diffusion sampler (Str2Str) with tractable, parallel short MD simulations. By seeding from the pre-trained sampler, running MD in parallel for each seed, and then fine-tuning the sampler to the target protein (Str2Str-NE and Str2Str-FT), the authors achieve improved ensemble quality within a practical computational budget. The approach demonstrates state-of-the-art performance across multiple metrics on fast-folding proteins, highlighting the value of integrating physics-based refinement with neural samplers. This work provides a scalable framework for energy-aware conformational sampling, bridging rapid neural proposals with local MD equilibration to produce more Boltzmann-like ensembles within affordable compute.

Abstract

Protein dynamics are ubiquitous and important for the biological functions and properties of proteins, and studying them usually involves time-consuming molecular dynamics (MD) simulations in silico. Recently, generative models have been leveraged as surrogate samplers that obtain conformation ensembles orders of magnitude faster and without requiring any simulation data (a "zero-shot" inference). However, because such generative models are agnostic of the underlying energy landscape, their accuracy may still be limited. In this work, we explore a few-shot setting for such a pre-trained generative sampler that incorporates MD simulations in a tractable manner. Specifically, given a target protein of interest, we first acquire some seeding conformations from the pre-trained sampler and then run a number of physical simulations in parallel, starting from these seeding samples. We then fine-tune the generative model on the resulting simulation trajectories to obtain a target-specific sampler. Experimental results demonstrate the superior performance of this few-shot conformation sampler at a tractable computational cost.
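
To make the three-stage loop in the abstract concrete (seed, simulate in parallel, fine-tune), here is a minimal toy sketch in Python. Every component is an assumption for illustration, not the authors' implementation: a 1D double-well potential stands in for the force field, a broad Gaussian stands in for the pre-trained Str2Str sampler, vectorized overdamped Langevin dynamics stands in for parallel MD, and refitting a two-component Gaussian mixture by EM stands in for fine-tuning the diffusion model. Names such as `pretrained_sampler`, `short_md`, and `fit_gmm_1d` are hypothetical.

```python
import numpy as np

def energy(x):
    # Hypothetical 1D double-well potential U(x) = (x^2 - 1)^2 (stand-in for a force field).
    return (x ** 2 - 1.0) ** 2

def energy_grad(x):
    return 4.0 * x * (x ** 2 - 1.0)

def pretrained_sampler(n, rng):
    # Stand-in for the zero-shot sampler: broad proposals that ignore the energy landscape.
    return rng.normal(0.0, 1.5, size=n)

def short_md(seeds, rng, n_steps=2000, dt=1e-3, kT=0.3):
    # Overdamped Langevin dynamics as a stand-in for MD. All seeds advance together
    # (vectorized), mimicking independent simulations run in parallel.
    x = np.asarray(seeds, dtype=float).copy()
    frames = []
    for step in range(n_steps):
        x = x - energy_grad(x) * dt + np.sqrt(2.0 * kT * dt) * rng.normal(size=x.shape)
        if step >= n_steps // 2:  # discard the first half as equilibration
            frames.append(x.copy())
    return np.stack(frames)  # shape (n_frames, n_seeds)

def fit_gmm_1d(data, n_iter=100):
    # Stand-in for fine-tuning: refit a 2-component Gaussian mixture to the frames by EM.
    mu = np.percentile(data, [25, 75])
    var = np.full(2, data.var())
    w = np.full(2, 0.5)
    for _ in range(n_iter):
        diff = data[:, None] - mu                       # (n, 2)
        log_p = np.log(w) - 0.5 * np.log(2 * np.pi * var) - 0.5 * diff ** 2 / var
        log_p -= log_p.max(axis=1, keepdims=True)       # numerical stability
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)         # E-step: responsibilities
        nk = resp.sum(axis=0)                           # M-step below
        w = nk / nk.sum()
        mu = (resp * data[:, None]).sum(axis=0) / nk
        var = (resp * (data[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var

def sample_gmm(params, n, rng):
    w, mu, var = params
    k = rng.choice(len(w), size=n, p=w)                 # pick a component per sample
    return rng.normal(mu[k], np.sqrt(var[k]))

rng = np.random.default_rng(0)
seeds = pretrained_sampler(64, rng)        # 1) seeding conformations
frames = short_md(seeds, rng)              # 2) parallel short simulations
params = fit_gmm_1d(frames.ravel())        # 3) "fine-tune" on the trajectories
refined = sample_gmm(params, 5000, rng)    # target-specific sampler
print(f"mean energy: zero-shot {energy(seeds).mean():.2f}, "
      f"fine-tuned {energy(refined).mean():.2f}")
```

In this toy, each short run only equilibrates its seed locally, so the refitted mixture can only cover basins that the seeds actually reached; the seeds' coverage and the simulation length jointly bound what the refined sampler can represent.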

Paper Structure (29 sections, 3 equations, 3 figures, 3 tables, 1 algorithm)

Figures (3)

  • Figure 1: Illustrative diagram of fine-tuning the pre-trained diffusion sampler Str2Str (Lu et al., 2024). First, initial conformation samples of the target protein are generated by a pre-trained network $\mathcal{S}_\theta$ parameterized by $\theta$; then an MD simulation is run for each sample, in parallel. The resulting production trajectories are used to fine-tune the network into a target-specific sampler $\mathcal{S}_{\theta^*}$.
  • Figure 2: Illustration of the conformation sampling scenario of Str2Str. In (a), sampling is performed in two steps to obtain independent samples of the target protein; the energy landscape (i.e., the force-field information) is unknown to the sampler and is shown in gray. In (b), a hypothetical zoomed-in neighborhood of one sample is shown. Because the conformation landscape is complex, samples generated directly by Str2Str are likely not at a local potential-energy minimum. Short MD simulations can be run from them to obtain locally equilibrated samples, which can then be used to fine-tune the pre-trained sampler.
  • Figure 3: Runtime profile of the different samplers for each fast-folding target used in this study.
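
Figure 2(a) describes sampling in two steps. One reading of such a structure-to-structure scheme, assumed here rather than taken from the paper, is perturb-then-denoise: add noise to the input conformation up to a moderate level, then integrate a reverse-time diffusion SDE back to zero noise. The minimal Python sketch below illustrates that reading on a hypothetical 1D landscape with two states at x = -1 and x = +1. It uses the exact score of a Gaussian mixture under variance-exploding noising (x_t = x_0 + t·ε) in place of a trained score network; `perturb_then_denoise`, `MODES`, and all parameter values are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy data distribution: two conformational states at x = -1 and x = +1.
MODES = np.array([-1.0, 1.0])
WEIGHTS = np.array([0.5, 0.5])
S2 = 0.2 ** 2  # variance of each state

def score(x, t):
    # Exact score of the mixture after noising x_t = x_0 + t * eps, so each
    # component becomes N(mode, S2 + t^2). Stands in for a trained score network.
    var = S2 + t ** 2
    diff = x[:, None] - MODES                          # (n, 2)
    log_w = np.log(WEIGHTS) - 0.5 * diff ** 2 / var    # shared variance: normalizer cancels
    log_w -= log_w.max(axis=1, keepdims=True)
    resp = np.exp(log_w)
    resp /= resp.sum(axis=1, keepdims=True)            # posterior over components
    return -(resp * diff).sum(axis=1) / var

def perturb_then_denoise(x_in, n, t_max, rng, n_steps=500):
    # Step 1: forward-perturb the input conformation to noise level t_max.
    x = x_in + t_max * rng.normal(size=n)
    # Step 2: Euler-Maruyama on the reverse-time SDE with g(t)^2 = d(t^2)/dt = 2t,
    # stepping from t = t_max down to t = 0.
    ts = np.linspace(t_max, 0.0, n_steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        dt = t_cur - t_next                            # positive step size
        g2 = 2.0 * t_cur
        x = x + g2 * score(x, t_cur) * dt + np.sqrt(g2 * dt) * rng.normal(size=n)
    return x

rng = np.random.default_rng(0)
samples = perturb_then_denoise(x_in=1.0, n=2000, t_max=1.0, rng=rng)
print(f"fraction in the x = -1 state: {np.mean(samples < 0):.2f}")
```

Even though every sample starts from the same input at x = +1, part of the output lands in the state at x = -1, because the forward noise partly erases the memory of the starting point before denoising. In the paper's setting, where the score is learned rather than exact, such samples need not sit at a potential-energy minimum, which is the gap that the short MD runs in Figure 2(b) address.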