Generative Modeling of Molecular Dynamics Trajectories

Bowen Jing, Hannes Stärk, Tommi Jaakkola, Bonnie Berger

TL;DR

This work introduces generative modeling of molecular trajectories as a paradigm for learning flexible multi-task surrogate models of MD from data and illustrates how generative modeling can unlock value from MD data towards diverse downstream tasks that are not straightforward to address with existing methods or even MD itself.

Abstract

Molecular dynamics (MD) is a powerful technique for studying microscopic phenomena, but its computational cost has driven significant interest in the development of deep learning-based surrogate models. We introduce generative modeling of molecular trajectories as a paradigm for learning flexible multi-task surrogate models of MD from data. By conditioning on appropriately chosen frames of the trajectory, we show such generative models can be adapted to diverse tasks such as forward simulation, transition path sampling, and trajectory upsampling. By alternatively conditioning on part of the molecular system and inpainting the rest, we also demonstrate the first steps towards dynamics-conditioned molecular design. We validate the full set of these capabilities on tetrapeptide simulations and show that our model can produce reasonable ensembles of protein monomers. Altogether, our work illustrates how generative modeling can unlock value from MD data towards diverse downstream tasks that are not straightforward to address with existing methods or even MD itself. Code is available at https://github.com/bjing2016/mdgen.
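The abstract describes adapting one trajectory model to different tasks by changing which frames, or which parts of the molecular system, are given. As a minimal sketch, assuming the $T \times L$ token layout described for Figure 1, each task can be written as a boolean mask that marks the conditioning tokens. The function `conditioning_mask`, its task names, and the `stride` and `generate_residues` parameters are hypothetical illustrations, not the mdgen code.

```python
import numpy as np


def conditioning_mask(task, T, L, stride=10, generate_residues=None):
    """Return a boolean (T, L) array; True marks tokens supplied as conditioning.

    Hypothetical helper for illustration only: task names and parameters are
    assumptions, not the mdgen API.
    """
    mask = np.zeros((T, L), dtype=bool)
    if task == "forward":
        # Forward simulation: the first frame is given, later frames are generated.
        mask[0, :] = True
    elif task == "tps":
        # Transition path sampling: first and last frames are given.
        mask[[0, T - 1], :] = True
    elif task == "upsample":
        # Upsampling: every `stride`-th frame is given, frames in between are generated.
        mask[::stride, :] = True
    elif task == "inpaint":
        # Design-style inpainting: some residues are given over all frames,
        # the residues listed in `generate_residues` are generated.
        if generate_residues is None:
            raise ValueError("generate_residues is required for inpainting")
        mask[:, :] = True
        mask[:, list(generate_residues)] = False
    else:
        raise ValueError(f"unknown task: {task}")
    return mask


# Example: a 100-frame tetrapeptide trajectory with the inner two residues generated.
example = conditioning_mask("inpaint", T=100, L=4, generate_residues=[1, 2])
```

In this sketch only the mask changes between tasks while the token array and the generator stay the same.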

Paper Structure (32 sections, 11 equations, 13 figures, 4 tables, 3 algorithms)

Figures (13)

  • Figure 1: (Left) Tasks: generative modeling of MD trajectories addresses several tasks by conditioning on different parts of a trajectory. (Right) Method: we tokenize trajectories of $T$ frames and $L$ residues into a $T \times L$ array of SE(3)-invariant tokens encoding roto-translation offsets from key frames and torsion angles. Using stochastic interpolants, we generate arrays of such tokens from Gaussian noise.
  • Figure 2: Forward simulation evaluations on test peptides. (A) Torsion angle distributions for the six backbone torsion angles from MD trajectories (orange) and sampled trajectories (blue). (B, C) Free energy surfaces along the top two TICA components computed from backbone and sidechain torsion angles. (D) Markov state model occupancies computed from MD trajectories versus sampled trajectories, pooled across all test peptides ($n=1000$ states total). (E) Wall-clock decorrelation times of the first TICA component under MD versus our model rollouts. (F) Relaxation times of all torsion angles computed from MD versus sampled trajectories, pooled across all test peptides (508 backbone and 722 sidechain torsions in total). (G) Torsion angles in the tetrapeptide AAAA colored by the decorrelation time computed from MD (top) and from rollout trajectories (bottom).
  • Figure 3: Transition path sampling results. (Top) Intermediate states of one of the 1-nanosecond interpolated trajectories between two metastable states for the test peptide IPGD. (Bottom Left) The corresponding trajectory on the 2D free energy surface of the top two TICA components (more examples in the appendix). (Bottom Right) Statistics averaged over 100 test peptides, with 1000 paths for each. Shown are the JSD, the fraction of drawn paths that are valid transition paths, and the average path likelihood of our discretized transitions under the reference MSM, compared to discrete transitions drawn from the reference MSM or from alternative MSMs built from replica simulations of varying lengths.
  • Figure 3: Sequence recovery for the inner two residues when conditioning on the partial trajectory (MDGen), the two terminal frames (DynMPNN), or a single frame (S-MPNN).
  • Figure 4: Recovery of fast dynamics via trajectory upsampling for peptide GTLM. (Left) Autocorrelations of each torsion angle from the original 100 fs-timestep trajectory (solid), the subsampled 10 ns-timestep trajectory ($\bullet$), and the reconstructed 100 fs-timestep trajectory (dotted), all of length 100 ns. (Right) Dynamical content as a function of timescale for the upsampled vs. ground-truth trajectories, stacked for all torsion angles (same color scheme). The subsampled trajectory contains only the shaded region, and our model recovers the unshaded region. Further examples are in the appendix.
  • ...and 8 more figures
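The Figure 1 caption describes tokens that encode roto-translation offsets from key frames. A minimal sketch of why such offsets are SE(3)-invariant, assuming each residue is represented by a rotation matrix $R$ and a translation $x$ and that the first frame serves as the key frame: expressing frame $t$ in the key frame's coordinates gives $R_{\text{key}}^\top R_t$ and $R_{\text{key}}^\top (x_t - x_{\text{key}})$, and a global motion $(Q, b)$ applied to both cancels out. The helper names are hypothetical, and the torsion-angle part of the tokens is omitted.

```python
import numpy as np


def random_rotation(rng):
    """Sample a proper rotation matrix (det = +1) via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))  # fix column signs so the sample is uniform
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]  # turn a reflection into a rotation
    return q


def frame_offsets(R, x, R_key, x_key):
    """Express residue frames relative to key frames.

    R: (T, L, 3, 3) rotations, x: (T, L, 3) translations,
    R_key: (L, 3, 3), x_key: (L, 3) key-frame rotations and translations.
    Returns R_key^T R_t with shape (T, L, 3, 3) and R_key^T (x_t - x_key) with shape (T, L, 3).
    """
    R_rel = np.einsum("lji,tljk->tlik", R_key, R)
    x_rel = np.einsum("lji,tlj->tli", R_key, x - x_key[None])
    return R_rel, x_rel


# Invariance check on random frames: apply a global rotation Q and translation b.
rng = np.random.default_rng(0)
T, L = 5, 3
R = np.array([[random_rotation(rng) for _ in range(L)] for _ in range(T)])
x = rng.normal(size=(T, L, 3))
Q, b = random_rotation(rng), rng.normal(size=3)
R_moved = np.einsum("ij,tljk->tlik", Q, R)
x_moved = x @ Q.T + b
offsets = frame_offsets(R, x, R[0], x[0])
offsets_moved = frame_offsets(R_moved, x_moved, R_moved[0], x_moved[0])
```

Because $Q$ and $b$ cancel in both offsets, `offsets` and `offsets_moved` agree to floating-point precision.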
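The Figure 1 caption also states that token arrays are generated from Gaussian noise using stochastic interpolants. The sketch assumes the simplest linear interpolant $x_t = (1-t)x_0 + t x_1$ with $x_0 \sim \mathcal{N}(0, I)$. Its time derivative $x_1 - x_0$ would be the regression target for a velocity network, and sampling integrates $dx/dt = v(x, t)$ from $t=0$ to $t=1$ with forward Euler. To keep the example checkable, a closed-form velocity for a Gaussian target stands in for a trained network. The interpolant schedule, the sampler, and all helper names are assumptions, not necessarily the paper's choices.

```python
import numpy as np


def interpolate(x0, x1, t):
    """Linear interpolant x_t = (1 - t) x0 + t x1 and its time derivative x1 - x0,
    which serves as the regression target for a learned velocity network."""
    return (1.0 - t) * x0 + t * x1, x1 - x0


def euler_sample(velocity, x0, n_steps=500):
    """Integrate dx/dt = velocity(x, t) with forward Euler from t = 0 (noise) to t = 1."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * velocity(x, i * dt)
    return x


def gaussian_velocity(m, s):
    """Exact E[x1 - x0 | x_t = x] for x0 ~ N(0, I) and x1 ~ N(m, s^2 I), independent.

    Stands in for a trained network so that the sampler can be checked analytically.
    """
    def v(x, t):
        var_t = (1.0 - t) ** 2 + (t * s) ** 2  # Var[x_t]
        cov_t = t * s ** 2 - (1.0 - t)         # Cov[x1 - x0, x_t]
        return m + cov_t / var_t * (x - t * m)
    return v


rng = np.random.default_rng(0)
noise = rng.normal(size=(20000, 3))
samples = euler_sample(gaussian_velocity(2.0, 0.5), noise)
```

If the sampler is correct, `samples` should have mean close to 2.0 and standard deviation close to 0.5, matching the target Gaussian.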
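The free energy surfaces in Figure 2 (B, C) and Figure 3 are drawn along the top two TICA components. The sketch below implements standard time-lagged independent component analysis, which solves the generalized eigenproblem $C(\tau) v = \lambda C(0) v$ for mean-free features. It does not reproduce the paper's exact featurization or lag time. Torsion angles would typically enter as their sines and cosines. The function name `tica` is illustrative.

```python
import numpy as np


def tica(X, lag, n_components=2):
    """TICA on features X of shape (n_frames, n_features).

    Uses symmetrized instantaneous and time-lagged covariances, whitens with
    C(0)^{-1/2}, then diagonalizes the whitened lagged covariance. Returns
    (projections, eigenvalues) sorted by decreasing eigenvalue (slowest first).
    """
    X = X - X.mean(axis=0)
    A, B = X[:-lag], X[lag:]
    C0 = (A.T @ A + B.T @ B) / (2 * len(A))
    Ct = (A.T @ B + B.T @ A) / (2 * len(A))
    w, U = np.linalg.eigh(C0)
    W = U / np.sqrt(w)  # columns scaled so that W.T @ C0 @ W = I
    evals, V = np.linalg.eigh(W.T @ Ct @ W)
    order = np.argsort(evals)[::-1][:n_components]
    return X @ (W @ V[:, order]), evals[order]


# Toy data: a slow AR(1) process mixed with a larger-variance fast noise signal.
rng = np.random.default_rng(0)
n = 50000
noise = rng.normal(size=n)
slow = np.zeros(n)
for i in range(1, n):
    slow[i] = 0.99 * slow[i - 1] + noise[i]
fast = 5.0 * rng.normal(size=n)
X = np.column_stack([slow, fast]) @ np.array([[1.0, 0.5], [0.3, 1.0]])
proj, ev = tica(X, lag=1)
```

TICA ranks directions by lag-$\tau$ autocorrelation rather than variance, so the leading eigenvalue here recovers the slow process (about 0.99) even after the two signals are mixed.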
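Figure 2 (D) compares Markov state model occupancies, and Figure 3 scores discretized paths by their likelihood under a reference MSM. The following is a minimal sketch that assumes frames have already been clustered into integer state labels. The transition matrix is estimated by row-normalizing lagged transition counts, and occupancies come from its stationary distribution. A path's log-likelihood is the sum of $\log P_{s_t s_{t+1}}$, and the Jensen-Shannon divergence compares two discrete distributions. The pseudocount, function names, and JSD inputs are assumptions; the paper's clustering and lag time are not specified here.

```python
import numpy as np


def estimate_msm(dtraj, n_states, lag=1, prior=1e-6):
    """Row-stochastic transition matrix from transition counts at the given lag.

    A tiny pseudocount keeps unvisited rows well defined (an assumption of this sketch).
    """
    dtraj = np.asarray(dtraj)
    counts = np.full((n_states, n_states), prior)
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1.0)
    return counts / counts.sum(axis=1, keepdims=True)


def stationary_distribution(P):
    """Left eigenvector of P with eigenvalue 1, normalized to sum to one."""
    evals, evecs = np.linalg.eig(P.T)
    pi = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    return pi / pi.sum()


def path_log_likelihood(path, P):
    """Sum of log P[s_t, s_{t+1}] along a discrete path of state indices."""
    path = np.asarray(path)
    return float(np.sum(np.log(P[path[:-1], path[1:]])))


def jensen_shannon(p, q, eps=1e-12):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    m = 0.5 * (p + q)

    def kl(a, b):
        return np.sum(a * np.log((a + eps) / (b + eps)))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Simulate a two-state chain and re-estimate its transition matrix.
rng = np.random.default_rng(0)
P_true = np.array([[0.9, 0.1], [0.2, 0.8]])
u = rng.random(100000)
states = [0]
for r in u:
    states.append(0 if r < P_true[states[-1], 0] else 1)
P_hat = estimate_msm(states, n_states=2)
```

For this chain the stationary distribution is $(2/3, 1/3)$, and `P_hat` should approach `P_true` as the trajectory grows.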
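Figure 2 (F, G) and Figure 4 rely on torsion-angle autocorrelations and relaxation or decorrelation times. Torsions are periodic, so one common approach, assumed here, is to correlate the unit vectors $(\cos\theta, \sin\theta)$. The relaxation time is then read off as the first lag at which the normalized autocorrelation drops below $1/e$. Both the estimator and the threshold are illustrative choices, not necessarily those used in the paper, and the function names are hypothetical.

```python
import numpy as np


def torsion_autocorrelation(theta, max_lag):
    """Normalized autocorrelation of a torsion-angle series (radians) for lags 0..max_lag.

    Each angle is mapped to the unit vector (cos, sin) so that periodicity is handled;
    the result equals 1 at lag 0.
    """
    u = np.stack([np.cos(theta), np.sin(theta)], axis=-1)
    mean = u.mean(axis=0)
    var = 1.0 - mean @ mean  # E[|u|^2] = 1 for unit vectors
    acf = np.empty(max_lag + 1)
    for lag in range(max_lag + 1):
        products = np.sum(u[: len(u) - lag] * u[lag:], axis=-1)
        acf[lag] = (products.mean() - mean @ mean) / var
    return acf


def relaxation_time(acf, dt=1.0):
    """First lag (in units of dt) where the autocorrelation falls below 1/e.

    Simple threshold estimator; returns nan if the curve never decays that far.
    """
    below = np.nonzero(acf < np.exp(-1.0))[0]
    return below[0] * dt if below.size else np.nan


rng = np.random.default_rng(0)
fast = rng.uniform(-np.pi, np.pi, size=20000)              # uncorrelated angles
slow = np.cumsum(rng.normal(0.0, 0.1, size=50000))         # slowly diffusing angle
acf_fast = torsion_autocorrelation(fast, max_lag=10)
acf_slow = torsion_autocorrelation(slow, max_lag=2000)
```

With this estimator the uncorrelated series relaxes within one step, while the diffusing angle takes on the order of hundreds of steps. That separation in timescales is what the upsampling comparison in Figure 4 probes.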