UniSim: A Unified Simulator for Time-Coarsened Dynamics of Biomolecules

Ziyang Yu, Wenbing Huang, Yang Liu

TL;DR

MD simulations face a trade-off between accuracy and efficiency. UniSim addresses this with three components: a unified cross-domain atomic representation learned via multi-head pretraining, a stochastic interpolant-based vector field that propagates dynamics over a long timestep $\tau$, and a force guidance kernel for environment-specific adaptation. The approach transfers across small molecules, peptides, and proteins, achieving favorable distributional alignment with MD trajectories and improved validity on diverse datasets. This work advances efficient, physics-informed long-timescale biomolecular simulation with cross-domain generalization, enabling more scalable exploration of conformational landscapes.

Abstract

Molecular Dynamics (MD) simulations are essential for understanding the atomic-level behavior of molecular systems, giving insights into their transitions and interactions. However, classical MD techniques are limited by the trade-off between accuracy and efficiency, while recent deep learning-based improvements have mostly focused on single-domain molecules, lacking transferability to unfamiliar molecular systems. Therefore, we propose Unified Simulator (UniSim), which leverages cross-domain knowledge to enhance the understanding of atomic interactions. First, we employ a multi-head pretraining approach to learn a unified atomic representation model from a large and diverse set of molecular data. Then, based on the stochastic interpolant framework, we learn the state transition patterns over long timesteps from MD trajectories, and introduce a force guidance module for rapidly adapting to different chemical environments. Our experiments demonstrate that UniSim achieves highly competitive performance across small molecules, peptides, and proteins.
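
For orientation, the general stochastic interpolant construction connects a pair of states by a noisy path. The form below is the generic one from the generative-modeling literature, not necessarily the specific choice made in this paper. The symbols $I$, $\gamma$, and $z$ are assumed here for illustration, and reading the endpoints $x_0$ and $x_1$ as the current state and the state one timestep $\tau$ later is an interpretation of the abstract:

$$
x_t = I(t, x_0, x_1) + \gamma(t)\, z, \qquad t \in [0, 1], \qquad z \sim \mathcal{N}(0, \mathrm{Id}),
$$

$$
I(0, x_0, x_1) = x_0, \qquad I(1, x_0, x_1) = x_1, \qquad \gamma(0) = \gamma(1) = 0 .
$$

Under these boundary conditions the path starts exactly at $x_0$ and ends exactly at $x_1$, so a model trained along it can learn a transport from one endpoint distribution to the other.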

Paper Structure

This paper contains 39 sections, 1 theorem, 26 equations, 8 figures, 9 tables.

Key Result

Proposition 3.1

Assume that marginals $q_t$ and $p_t$ are generated by $b(t,\Vec{{\mathbf{X}}}),\eta_z(t,\Vec{{\mathbf{X}}})$ and $b'(t,\Vec{{\mathbf{X}}}),\eta_z'(t,\Vec{{\mathbf{X}}})$ based on eq:interp_sde, respectively. Given the probability measure $\nu$ of data pairs satisfying $\nu(\Vec{{\mathbf{X}}}_0,\Vec

Figures (8)

  • Figure 1: UniSim enables time-coarsened dynamics simulations of small molecules, peptides, and proteins over a long timestep $\tau$.
  • Figure 2: Illustration of the overall workflow of UniSim. a. The unified atomic representation model $\varphi$ is pretrained on multi-domain 3D molecules, where data from different chemical environments are fed to the corresponding output head. b. Based on the stochastic interpolant framework, vector field models $v,\eta_z$ are trained on MD trajectories to learn the push forward from $\Vec{{\mathbf{X}}}_t$ to $\Vec{{\mathbf{X}}}_{t+\tau}$ with timestep $\tau$. c. To adapt to different chemical environments, additional networks $\Psi,\psi$ are trained to fit the intermediate forcefield $\nabla\varepsilon_t$, with other parameters frozen. d. Given an initial state, inference is performed by iteratively solving an SDE with the diffusion time $t$ from 0 to 1.
  • Figure 3: Visualization of comprehensive metrics on peptides 1i7u_C (upper) and 1ar8_0 (lower). The left column shows the joint distribution of pairwise distances. The middle column shows the residue contact map, where data in the lower and upper triangles are obtained from UniSim and MD, respectively. The right column displays TIC-2D plots for the slowest two components, where contours indicate the kernel density estimated on MD trajectories and the generated conformations are shown as scatter points.
  • Figure 4: TIC and TIC-2D plots of UniSim (left) and UniSim/g (right) on a. Ac-Ala3-NHMe and b. DHA. The first row displays the free energy projection on TIC 1, and the second row demonstrates TIC plots for the slowest two components.
  • Figure 5: TIC-2D plots of the first 200 generated conformations for a. 3bn0_A and b. 4b6i_D. Contours indicate the kernel density estimated on MD trajectories, the generated conformations are shown as scatter points, and the blue dashed arrows represent the order in which the conformations are generated.
  • ...and 3 more figures
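
The inference loop described in the Figure 2d caption (solve an SDE over diffusion time $t$ from 0 to 1, then repeat from the new state) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Euler-Maruyama integrator, the function names `solve_sde` and `rollout`, and the `drift` and `sigma` callables are all hypothetical stand-ins for the trained networks and the solver used in the paper.

```python
import numpy as np


def solve_sde(x_start, drift, sigma, n_steps, rng):
    """Euler-Maruyama integration of dX = drift(t, X) dt + sigma(t) dW for t in [0, 1].

    `drift` and `sigma` are hypothetical placeholders for learned models.
    """
    x = np.array(x_start, dtype=float)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        noise = rng.standard_normal(x.shape)
        x = x + drift(t, x) * dt + sigma(t) * np.sqrt(dt) * noise
    return x


def rollout(x0, drift, sigma, n_frames, n_steps=100, seed=0):
    """Chain SDE solves; each solve is assumed to advance the state by one coarse step tau."""
    rng = np.random.default_rng(seed)
    frames = [np.array(x0, dtype=float)]
    for _ in range(n_frames):
        frames.append(solve_sde(frames[-1], drift, sigma, n_steps, rng))
    return np.stack(frames)  # shape: (n_frames + 1, *x0.shape)


if __name__ == "__main__":
    # Toy system: 5 "atoms" in 3D with a linear restoring drift (assumption, not a force field).
    x0 = np.ones((5, 3))
    traj = rollout(x0, drift=lambda t, x: -x, sigma=lambda t: 0.1, n_frames=4)
    print(traj.shape)
```

Each call to `solve_sde` produces one new frame, so the cost of a trajectory is the number of frames times the number of solver steps, rather than the number of fine-grained MD integration steps that would cover the same physical time.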

Theorems & Definitions (1)

  • Proposition 3.1