ProTDyn: a foundation Protein language model for Thermodynamics and Dynamics generation
Yikai Liu, Haoyang Zheng, Lining Mao, Yanbin Wang, Ming Chen, Guang Lin
TL;DR
ProTDyn addresses the computational cost of MD by unifying thermodynamics and multi-timescale dynamics in a single transformer-based generative model. It encodes conformations as tokens with a structure tokenizer and is trained on three objectives, $L_{thermo}$, $L_{dyn}$, and $L_{dynI}$, combined as $L_{ProTDyn}=\omega_1 L_{thermo}+\omega_2 L_{dyn}+\omega_3 L_{dynI}$. Empirically, it produces Boltzmann-consistent ensembles and accurate long-timescale dynamics, generalizes to unseen proteins, and performs comparably to reference MD while enabling scalable, efficient generation across multiple timescales. The framework also supports exact likelihood evaluation and offers a path toward integrating physics-based energy functions and enforcing principles such as detailed balance.
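As a rough illustration of how the weighted objective $L_{ProTDyn}$ combines its three terms, the sketch below computes the sum from already-evaluated loss values. The names `LossWeights` and `combined_loss` are hypothetical, and the default weights of 1.0 are placeholders: this summary does not state the values of $\omega_1$, $\omega_2$, $\omega_3$ or how each individual term is computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossWeights:
    """Hypothetical container for omega_1, omega_2, omega_3 (placeholder defaults)."""
    thermo: float = 1.0  # omega_1, weight on L_thermo
    dyn: float = 1.0     # omega_2, weight on L_dyn
    dyn_i: float = 1.0   # omega_3, weight on L_dynI


def combined_loss(l_thermo: float, l_dyn: float, l_dyn_i: float,
                  weights: LossWeights = LossWeights()) -> float:
    """Return omega_1 * L_thermo + omega_2 * L_dyn + omega_3 * L_dynI.

    The three inputs are assumed to be scalar loss values computed elsewhere.
    """
    return (weights.thermo * l_thermo
            + weights.dyn * l_dyn
            + weights.dyn_i * l_dyn_i)


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the paper.
    w = LossWeights(thermo=0.5, dyn=0.25, dyn_i=0.25)
    print(combined_loss(1.0, 2.0, 3.0, w))
```

Keeping the weights in a separate object makes it easy to rebalance the thermodynamic and dynamic terms without touching the code that computes each term.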
Abstract
Molecular dynamics (MD) simulation has long been the principal computational tool for exploring protein conformational landscapes and dynamics, but its application is limited by high computational cost. We present ProTDyn, a foundation protein language model that unifies conformational ensemble generation and multi-timescale dynamics modeling within a single framework. Unlike prior approaches that treat these tasks separately, ProTDyn allows flexible independent and identically distributed (i.i.d.) ensemble sampling and dynamic trajectory simulation. Across diverse protein systems, ProTDyn yields thermodynamically consistent ensembles, faithfully reproduces dynamical properties over multiple timescales, and generalizes to proteins beyond its training data. It offers a scalable and efficient alternative to conventional MD simulations.
