Beyond Ensembles: Simulating All-Atom Protein Dynamics in a Learned Latent Space

Aditya Sengar, Jiying Zhang, Pierre Vandergheynst, Patrick Barth

TL;DR

The paper tackles the challenge of simulating long-timescale protein dynamics by shifting from brute-force MD to a representation-first approach that uses a fixed LD-FPG encoder–decoder to map all-atom configurations to a latent space. It introduces the Graph Latent Dynamics Propagator (GLDP), which swaps three latent-space propagators (autoregressive neural networks, Koopman-based linear operators, and score-guided Langevin dynamics) within a unified framework, and benchmarks them on systems ranging from small peptides to complex GPCRs. The results show a clear trade-off: the autoregressive NN yields the most robust long rollouts and the best backbone fidelity, the Langevin propagator best captures fine-grained side-chain thermodynamics, and the Koopman operator provides a simple, interpretable baseline whose dynamics are more rigid. Notably, GLDP recovers the activation free-energy surface of the GPCR A2AR. These findings show how propagator choice shapes thermodynamic fidelity and kinetics in latent-space protein dynamics and point toward hybrid strategies for reliable, system-specific surrogates.

Abstract

Simulating the long-timescale dynamics of biomolecules is a central challenge in computational science. While enhanced sampling methods can accelerate these simulations, they rely on pre-defined collective variables that are often difficult to identify, restricting their ability to model complex switching mechanisms between metastable states. A recent generative model, LD-FPG, demonstrated that this problem could be bypassed by learning to sample the static equilibrium ensemble as all-atom deformations from a reference structure, establishing a powerful method for all-atom ensemble generation. However, while this approach successfully captures a system's probable conformations, it does not model the temporal evolution between them. We introduce the Graph Latent Dynamics Propagator (GLDP), a modular component for simulating dynamics within the learned latent space of LD-FPG. We then compare three classes of propagators: (i) score-guided Langevin dynamics, (ii) Koopman-based linear operators, and (iii) autoregressive neural networks. Within a unified encoder-propagator-decoder framework, we evaluate long-horizon stability, backbone and side-chain ensemble fidelity, and temporal kinetics via TICA. Benchmarks on systems ranging from small peptides to mixed-topology proteins and large GPCRs reveal that autoregressive neural networks deliver the most robust long rollouts and coherent physical timescales; score-guided Langevin best recovers side-chain thermodynamics when the score is well learned; and Koopman provides an interpretable, lightweight baseline that tends to damp fluctuations. These results clarify the trade-offs among propagators and offer practical guidance for latent-space simulators of all-atom protein dynamics.
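
To make the three propagator classes concrete, the following is a minimal NumPy sketch of a latent rollout, not the authors' implementation. It assumes a latent trajectory stored as an array of shape (T, d), a plain least-squares fit for the Koopman matrix, and an unadjusted overdamped Langevin update at unit temperature; the function names (`fit_koopman`, `langevin_step`, `rollout`) and the toy score function are hypothetical. An autoregressive network $f_\theta$ would be passed to `rollout` as any callable mapping $z_t$ to $z_{t+1}$.

```python
import numpy as np

def fit_koopman(Z):
    """Least-squares estimate of A in z_{t+1} ≈ A z_t.

    Z is a latent trajectory of shape (T, d). This is a plain
    linear regression, an assumption made for illustration.
    """
    X, Y = Z[:-1], Z[1:]
    # Solve X @ M ≈ Y; with row vectors z_{t+1} = z_t @ M, so A = M.T
    M, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return M.T

def koopman_step(z, A):
    """Linear propagator: z_{t+1} = A z_t."""
    return A @ z

def langevin_step(z, score_fn, step_size, rng):
    """One unadjusted Langevin step guided by a score estimate.

    z_{t+1} = z_t + eps * s(z_t) + sqrt(2 * eps) * noise,
    where score_fn stands in for a learned score (hypothetical here).
    """
    noise = rng.standard_normal(z.shape)
    return z + step_size * score_fn(z) + np.sqrt(2.0 * step_size) * noise

def rollout(z0, step_fn, n_steps):
    """Apply step_fn repeatedly; returns an array of shape (n_steps + 1, d)."""
    traj = [np.asarray(z0, dtype=float)]
    for _ in range(n_steps):
        traj.append(step_fn(traj[-1]))
    return np.stack(traj)

# Toy usage: recover a known linear map, then run a Langevin rollout
# with the score of a standard Gaussian, s(z) = -z.
A_true = np.array([[0.9, 0.1], [-0.1, 0.9]])
Z = rollout(np.array([1.0, 0.0]), lambda z: koopman_step(z, A_true), 50)
A_fit = fit_koopman(Z)

rng = np.random.default_rng(0)
Z_langevin = rollout(np.zeros(2), lambda z: langevin_step(z, lambda u: -u, 0.01, rng), 100)
```

In the framework described above, the rolled-out latents would then be passed through the frozen decoder to obtain all-atom structures; that step is omitted here because it depends on the trained LD-FPG model.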

Paper Structure

This paper contains 71 sections, 17 equations, 8 figures, 15 tables, 3 algorithms.

Figures (8)

  • Figure 1: Framework overview. A pre-trained LD-FPG encoder (ChebNet; left) maps all-atom coordinates $X(t)$ to a pooled latent $z(t)$. Within the fixed LD-FPG latent, GLDP advances the state via one of three propagators (red box): (a) score-guided Langevin using the LD-FPG denoiser to estimate $s_\theta(z,\tau)=\nabla_z \log p_\tau(z)$ at a fixed low-noise level; (b) an autoregressive NN $z_{t+1}=f_\theta(z_t)$; and (c) a Koopman linear operator $z_{t+1}=A z_t$. The frozen LD-FPG decoder (right) maps the latent trajectory back to all-atom structures $\hat{X}(t+\Delta t)$.
  • Figure 2: Stability over long rollouts. RMSD and lDDT versus frame index for (a) alanine dipeptide and (b) A1AR. We define failure time as the first frame whose lDDT (relative to the initial frame) drops below 0.65. On A1AR, the autoregressive NN remains stable for the entire 10,000-frame horizon (no failure), while Langevin and Koopman fail earlier; on alanine dipeptide, Koopman and NN persist for thousands of frames whereas Langevin fails early.
  • Figure 3: Ensemble fidelity in dihedral space. Free-energy maps for backbone $(\phi,\psi)$ and, for A1AR, side-chain $(\chi_1,\chi_2)$. Alanine dipeptide (left) shows canonical basins recovered by all methods; the autoregressive NN aligns best with the reference basin shapes. For A1AR (middle/right), the NN most closely matches backbone structure, while the score-guided Langevin propagator recovers the sharpest rotamer bands for side-chains.
  • Figure 4: Functional free-energy surface for A2AR. (Top Left) Reference MD free-energy heatmap projected onto TM3–6 and TM3–7 distances, showing the transition valley. (Center & Bottom-Left) Generated ensembles for the autoregressive NN (cyan), Langevin (orange), and Koopman (purple) propagators, plotted as contours over the reference background (grey). (Right Column) 1D free-energy profiles projected along the TM3–7 (top) and TM3–6 (bottom) reaction coordinates.
  • Figure 5: Representative structural snapshots from latent rollouts. Top: Alanine dipeptide conformers sampled from a Koopman rollout show backbone dihedral changes across frames. Bottom: A2AR snapshots from a Langevin rollout; the dashed circle highlights the intracellular end of TM6, which moves outward over time, a hallmark of GPCR activation.
  • ...and 3 more figures
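
Two quantities in the captions above lend themselves to a short illustration: the failure time of Figure 2 (the first frame whose lDDT drops below 0.65) and the free-energy maps of Figures 3 and 4. The sketch below is a hedged NumPy example, not the paper's analysis code. It assumes per-frame lDDT values are already computed, and it estimates free energy with the standard histogram relation $F = -k_BT \ln p$; the function names, bin count, and use of reduced units ($k_BT = 1$) are illustrative assumptions.

```python
import numpy as np

def failure_time(lddt, threshold=0.65):
    """Index of the first frame with lDDT below threshold, or None if never."""
    below = np.flatnonzero(np.asarray(lddt) < threshold)
    return int(below[0]) if below.size else None

def free_energy_2d(x, y, bins=50, kT=1.0, hist_range=None):
    """Histogram-based free energy F = -kT * ln p(x, y), shifted so min(F) = 0.

    Empty bins get F = +inf. Bin count and kT = 1 (reduced units)
    are assumptions for illustration.
    """
    H, xedges, yedges = np.histogram2d(x, y, bins=bins, range=hist_range, density=True)
    with np.errstate(divide="ignore"):
        F = -kT * np.log(H)
    F -= F[np.isfinite(F)].min()
    return F, xedges, yedges

# Toy usage with synthetic data (not results from the paper).
t_fail = failure_time([0.90, 0.80, 0.60, 0.70])
t_none = failure_time([0.90, 0.85, 0.80])

rng = np.random.default_rng(0)
phi = rng.normal(-1.2, 0.3, size=5000)   # stand-in for a dihedral angle series
psi = rng.normal(2.5, 0.3, size=5000)
F, phi_edges, psi_edges = free_energy_2d(phi, psi, bins=30)
```

For periodic coordinates such as dihedral angles, passing `hist_range=[[-np.pi, np.pi], [-np.pi, np.pi]]` keeps the grid aligned across methods so that maps from different propagators can be compared bin by bin.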