Align Your Structures: Generating Trajectories with Structure Pretraining for Molecular Dynamics

Aniketh Iyengar, Jiaqi Han, Pengwei Sun, Mingjian Jiang, Jianwen Xie, Stefano Ermon

Abstract

Generating molecular dynamics (MD) trajectories with deep generative models has attracted increasing attention, yet remains challenging because MD data are scarce and high-dimensional MD distributions are difficult to model. To overcome these challenges, we propose a framework that leverages structure pretraining for MD trajectory generation. Specifically, we first train a diffusion-based structure generation model on a large-scale conformer dataset, and then add an interpolator module, trained on MD trajectory data, that enforces temporal consistency among the generated structures. Our approach harnesses abundant structural data to mitigate the scarcity of MD trajectory data and decomposes the MD modeling task into two manageable subproblems: structure generation and temporal alignment. We evaluate our method on the QM9 and DRUGS small-molecule datasets across unconditional generation, forward simulation, and interpolation tasks, and further extend our framework and analysis to tetrapeptide and protein monomer systems. Experimental results show that our approach generates chemically realistic MD trajectories, with marked improvements in accuracy on geometric, dynamical, and energetic measurements.
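The two-stage decomposition described in the abstract can be illustrated with a small NumPy sketch. This is not the paper's implementation: `structure_denoiser`, `temporal_mixer`, `trajectory_denoiser`, the smoothing kernel, and the linear blend weight `alpha` are all hypothetical stand-ins chosen for illustration, and the paper's actual interpolation equation is not reproduced here. The sketch only shows the general shape of the idea, assuming a per-frame structure model whose output is then coupled across frames by a temporal module.

```python
import numpy as np


def structure_denoiser(x_t, t):
    """Hypothetical stand-in for a pretrained per-frame structure model.

    x_t has shape (T, N, 3): T frames, N atoms, 3 coordinates.
    Each frame is processed independently (no information crosses frames).
    The timestep t is unused in this toy placeholder.
    """
    centroid = x_t.mean(axis=1, keepdims=True)
    return 0.1 * (x_t - centroid)


def temporal_mixer(h, kernel=(0.25, 0.5, 0.25)):
    """Hypothetical temporal module: smooths per-frame outputs along the frame axis."""
    # Repeat the first and last frames so the output keeps T frames.
    padded = np.pad(h, ((1, 1), (0, 0), (0, 0)), mode="edge")
    return kernel[0] * padded[:-2] + kernel[1] * padded[1:-1] + kernel[2] * padded[2:]


def trajectory_denoiser(x_t, t, alpha=0.5):
    """Blend the per-frame prediction with its temporally mixed version.

    The linear blend with weight alpha is an assumption made for this sketch.
    """
    eps_frame = structure_denoiser(x_t, t)
    eps_temporal = temporal_mixer(eps_frame)
    return alpha * eps_frame + (1.0 - alpha) * eps_temporal


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy_traj = rng.normal(size=(8, 5, 3))  # 8 frames, 5 atoms
    eps = trajectory_denoiser(noisy_traj, t=10, alpha=0.5)
    print(eps.shape)
```

With `alpha=1.0` the sketch reduces to the per-frame structure model alone, which mirrors the idea that the structure model can be pretrained on conformer data before any temporal coupling is learned from MD trajectories.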

Paper Structure

This paper contains 18 sections, 1 theorem, 5 equations, 6 figures, 2 tables.

Key Result

Theorem 4.1

Suppose $\bm\epsilon^\mathrm{cf}_\theta$ perfectly models $p^\mathrm{cf}({\mathbf{x}})$ and $\bm\epsilon^\mathrm{md}_{\theta,\phi}$ perfectly models $p^\mathrm{md}({\mathbf{x}}^{[T]})$, then the interpolation in Eq. eq:interpolator implicitly induces the distribution $\tilde{p}^\mathrm{md}({\mathbf{

Figures (6)

  • Figure 1: The overall two-stage framework of EGInterpolator. Structure pretraining: We first pretrain a conformer model $\bm\epsilon_\theta$ on a large-scale conformer dataset. MD fine-tuning: The model is then combined with additional temporal interpolator ${\mathbf{s}}_\phi^\mathrm{tp}$ to approach the MD distribution $p^\mathrm{md}({\mathbf{x}}^{[T]})$.
  • Figure 2: Cascaded temporal interpolator block.
  • Figure 3: (A) Performance of BasicES, with numbers for the SOTA baselines taken from xu2022geodiff. (B) Example conformers from BasicES on both QM9 and DRUGS.
  • Figure 4: (A) Bond length and (B) torsion angle distributions from reference (red), our generations (green), and GeoTDM (blue). MSM occupancies from reference versus (C) our generations and (D) MD oracles. Autocorrelations of torsion angles for an example molecule from (E) reference, (F) our generations, and (G) GeoTDM. Gray dashed line marks the 1/e decorrelation threshold.
  • Figure 5: (A) Reference free energy surface along the top two TICA components. (B) Generated interpolation trajectory projected onto the reference surface (red = start, orange = end). Surface is colored by metastate assignment. (C) Key frames from intermediate metastates. (D) Statistics comparing JSD, valid path rate, average path probability, and valid path probability for generated trajectories and replicate MD oracles.
  • ...and 1 more figure

Theorems & Definitions (1)

  • Theorem 4.1