Machine Learning Coarse-Grained Potentials of Protein Thermodynamics
Maciej Majewski, Adrià Pérez, Philipp Thölke, Stefan Doerr, Nicholas E. Charron, Toni Giorgino, Brooke E. Husic, Cecilia Clementi, Frank Noé, Gianni De Fabritiis
TL;DR
The paper tackles the prediction of protein dynamics by learning thermodynamically consistent coarse-grained (CG) potentials: neural network potentials (NNPs) trained by force-matching on a large all-atom molecular dynamics (MD) dataset. It uses an alpha-carbon CG representation and builds its training set from approximately 9 ms of unbiased MD across twelve proteins with diverse secondary structures, training both protein-specific models and a single general multi-protein model. CG simulations reproduce native and metastable states while accelerating the dynamics by more than three orders of magnitude. The general model recovers native structures for most targets and mutational cases, though with limitations for beta-sheet proteins. The work demonstrates the potential of transferable machine-learned CG potentials for simulating protein thermodynamics and dynamics, highlights the data demands and current limitations in extrapolation and beta-sheet handling, and points to future paths toward general-use CG force fields.
Abstract
A generalized understanding of protein dynamics is an unsolved scientific problem, the solution of which is critical to the interpretation of the structure-function relationships that govern essential biological processes. Here, we approach this problem by constructing coarse-grained molecular potentials based on artificial neural networks and grounded in statistical mechanics. For training, we build a unique dataset of unbiased all-atom molecular dynamics simulations of approximately 9 ms for twelve different proteins with multiple secondary structure arrangements. The coarse-grained models are capable of accelerating the dynamics by more than three orders of magnitude while preserving the thermodynamics of the systems. Coarse-grained simulations identify relevant structural states in the ensemble with comparable energetics to the all-atom systems. Furthermore, we show that a single coarse-grained potential can integrate all twelve proteins and can capture experimental structural features of mutated proteins. These results indicate that machine learning coarse-grained potentials could provide a feasible approach to simulate and understand protein dynamics.
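The TL;DR names force-matching as the training objective. As a rough illustration of that idea only, the sketch below fits a toy CG potential by minimizing the mean squared difference between predicted CG forces and reference forces. Everything in it is an assumption made for illustration and not the paper's method: the potential is a single harmonic term between consecutive alpha-carbon beads rather than a neural network, the 3.8 Å rest length and the spring constants are illustrative values, the function names are hypothetical, and the "reference" forces are synthetic rather than mapped from all-atom MD.

```python
import math
import random

R0 = 3.8  # assumed rest length between consecutive CA beads, in angstrom (illustrative)


def random_chain(n_beads, rng):
    """Build a synthetic bead chain with bond lengths scattered around R0."""
    coords = [[0.0, 0.0, 0.0]]
    for _ in range(n_beads - 1):
        direction = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in direction))
        length = R0 + rng.uniform(-0.3, 0.3)
        coords.append([p + length * c / norm for p, c in zip(coords[-1], direction)])
    return coords


def bond_forces(coords, k, r0=R0):
    """Forces -dU/dx for the toy potential U = sum_i 0.5 * k * (d_i - r0)^2."""
    forces = [[0.0, 0.0, 0.0] for _ in coords]
    for i in range(len(coords) - 1):
        delta = [b - a for a, b in zip(coords[i], coords[i + 1])]
        d = math.sqrt(sum(c * c for c in delta))
        scale = k * (d - r0) / d  # a stretched bond pulls bead i toward bead i + 1
        for axis in range(3):
            forces[i][axis] += scale * delta[axis]
            forces[i + 1][axis] -= scale * delta[axis]
    return forces


def force_matching_loss(frames, ref_forces, k):
    """Mean squared error between reference and predicted force components."""
    total, count = 0.0, 0
    for coords, ref in zip(frames, ref_forces):
        pred = bond_forces(coords, k)
        for f_ref, f_pred in zip(ref, pred):
            for a, b in zip(f_ref, f_pred):
                total += (a - b) ** 2
                count += 1
    return total / count


def fit_spring_constant(frames, ref_forces):
    """Closed-form minimizer of the loss; valid because forces are linear in k."""
    num, den = 0.0, 0.0
    for coords, ref in zip(frames, ref_forces):
        unit = bond_forces(coords, 1.0)  # forces per unit spring constant
        for f_ref, g in zip(ref, unit):
            for a, b in zip(f_ref, g):
                num += a * b
                den += b * b
    return num / den


if __name__ == "__main__":
    rng = random.Random(0)
    frames = [random_chain(10, rng) for _ in range(50)]
    k_true = 10.0  # hypothetical "ground truth" used to generate synthetic forces
    # Add Gaussian noise to mimic the fluctuating part of instantaneous forces.
    ref_forces = [
        [[c + rng.gauss(0.0, 0.5) for c in f] for f in bond_forces(frame, k_true)]
        for frame in frames
    ]
    k_fit = fit_spring_constant(frames, ref_forces)
    print("fitted k:", k_fit)
    print("loss at fitted k:", force_matching_loss(frames, ref_forces, k_fit))
```

With a neural network potential the same loss would be minimized by gradient descent over the network weights, with the forces obtained by differentiating the predicted energy with respect to the bead coordinates. The closed-form fit here works only because the toy forces are linear in the single parameter k.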
