Beyond Ensembles: Simulating All-Atom Protein Dynamics in a Learned Latent Space
Aditya Sengar, Jiying Zhang, Pierre Vandergheynst, Patrick Barth
TL;DR
The paper tackles the challenge of simulating long-timescale protein dynamics by shifting from brute-force MD to a representation-first approach in which a fixed LD-FPG encoder-decoder maps all-atom configurations to a latent space. It introduces the Graph Latent Dynamics Propagator (GLDP), a unified framework in which three latent-space propagators (autoregressive neural networks, Koopman-based linear operators, and score-guided Langevin dynamics) can be swapped, and benchmarks them on systems ranging from small peptides to complex GPCRs. The results show a clear trade-off: the autoregressive NN yields the most robust long rollouts and backbone fidelity, the Langevin propagator best captures fine-grained side-chain thermodynamics, and the Koopman operator serves as a simple, interpretable baseline with more rigid dynamics. Notably, GLDP recovers the GPCR A2AR activation surface. These findings show how the choice of propagator shapes thermodynamic fidelity and kinetics in latent-space protein dynamics, and they point toward hybrid strategies for reliable, system-specific surrogates.
Abstract
Simulating the long-timescale dynamics of biomolecules is a central challenge in computational science. While enhanced sampling methods can accelerate these simulations, they rely on pre-defined collective variables that are often difficult to identify, restricting their ability to model complex switching mechanisms between metastable states. A recent generative model, LD-FPG, demonstrated that this problem could be bypassed by learning to sample the static equilibrium ensemble as all-atom deformations from a reference structure, establishing a powerful method for all-atom ensemble generation. However, while this approach successfully captures a system's probable conformations, it does not model the temporal evolution between them. We introduce the Graph Latent Dynamics Propagator (GLDP), a modular component for simulating dynamics within the learned latent space of LD-FPG. We then compare three classes of propagators: (i) score-guided Langevin dynamics, (ii) Koopman-based linear operators, and (iii) autoregressive neural networks. Within a unified encoder-propagator-decoder framework, we evaluate long-horizon stability, backbone and side-chain ensemble fidelity, and temporal kinetics via TICA. Benchmarks on systems ranging from small peptides to mixed-topology proteins and large GPCRs reveal that autoregressive neural networks deliver the most robust long rollouts and coherent physical timescales; score-guided Langevin best recovers side-chain thermodynamics when the score is well learned; and Koopman provides an interpretable, lightweight baseline that tends to damp fluctuations. These results clarify the trade-offs among propagators and offer practical guidance for latent-space simulators of all-atom protein dynamics.
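As a rough illustration of the encoder-propagator-decoder idea, the sketch below shows how the three propagator classes could share a single latent-space stepping interface. It is not the GLDP implementation: the function names, the toy 2-D latent trajectory, the Gaussian score function, and the stand-in autoregressive map are all assumptions made for illustration, and the LD-FPG encoder and decoder are omitted entirely.

```python
import numpy as np


def fit_koopman(Z):
    """Least-squares linear operator K such that Z[t+1] ~= Z[t] @ K.

    Z has shape (T, d): one latent vector per frame (assumed layout).
    """
    X, Y = Z[:-1], Z[1:]
    K, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return K


def koopman_step(z, K):
    """Deterministic linear update in latent space."""
    return z @ K


def langevin_step(z, score_fn, step_size, rng):
    """One Euler-Maruyama step of overdamped Langevin dynamics.

    score_fn(z) is assumed to approximate grad log p(z).
    """
    noise = rng.standard_normal(z.shape)
    return z + step_size * score_fn(z) + np.sqrt(2.0 * step_size) * noise


def autoregressive_step(z, model_fn):
    """Next latent predicted by a learned map (here any callable)."""
    return model_fn(z)


def rollout(z0, step_fn, n_steps):
    """Apply step_fn repeatedly; returns an array of shape (n_steps + 1, d)."""
    traj = [z0]
    for _ in range(n_steps):
        traj.append(step_fn(traj[-1]))
    return np.stack(traj)


# Toy latent trajectory from a known linear system (hypothetical data,
# standing in for encoded MD frames).
theta = 0.1
A = 0.95 * np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])
z0 = np.array([1.0, 0.0])
Z = rollout(z0, lambda z: z @ A, 49)  # 50 frames

# (ii) Koopman: fit the operator from data, then roll it out.
K = fit_koopman(Z)
koop_traj = rollout(z0, lambda z: koopman_step(z, K), 20)

# (i) Score-guided Langevin: the score of a standard Gaussian is -z
# (a placeholder for a learned score network).
rng = np.random.default_rng(0)
lang_traj = rollout(z0, lambda z: langevin_step(z, lambda u: -u, 0.01, rng), 20)

# (iii) Autoregressive: a fixed nonlinear map stands in for a trained network.
ar_traj = rollout(z0, lambda z: autoregressive_step(z, lambda u: np.tanh(u @ A)), 20)
```

In a full pipeline each rolled-out latent vector would be passed through the decoder to obtain all-atom coordinates. Because all three propagators share the same `step_fn` signature, they can be exchanged without touching the rollout loop.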
