Design, Assessment, and Application of Machine Learning Potential Energy Surfaces
Valerii Andreichev, Sena Aydin, Kai Töpfer, Markus Meuwly, Luis Itza Vazquez-Salazar
TL;DR
The paper addresses the challenge of constructing and using machine-learned potential energy surfaces (ML-PESs) for biomolecular and chemical systems by outlining a practical, iterative workflow that spans model selection, descriptors, data generation, validation, and MD integration. It compares kernel and neural network approaches, discusses fixed versus learnable descriptors and equivariant representations, and emphasizes data quality, uncertainty quantification, and active learning. Through two detailed case studies, the non-reactive Ala-Lys-Ala tripeptide and reactive proton transfer in DNA base pairs, it demonstrates how ML-PESs can match or exceed traditional force fields in accuracy and enable advanced simulations, including transfer learning to higher levels of theory. The authors highlight open challenges such as scalable uncertainty quantification, efficient data selection, and seamless experimental integration, and sketch a roadmap for broader adoption of ML-PESs in chemistry and biophysics. Overall, the work provides concrete guidelines and demonstrations showing how ML-PESs can extend the reach of molecular simulations to larger, more complex systems while maintaining high fidelity to quantum chemical references.
Abstract
Potential Energy Surfaces (PESs) are an indispensable tool to investigate, characterise and understand chemical and biological systems in the gas and condensed phases. Advances in Machine Learning (ML) methodologies have led to the development of Machine Learned Potential Energy Surfaces (ML-PESs), which are now widely used to simulate such systems. The present work provides an overview of concepts, methodologies and recommendations for constructing and using ML-PESs. The choice of topics focuses on practical and recurrent issues in conceiving and using such models. The application of the principles discussed is illustrated through two different systems of biomolecular importance: the non-reactive dynamics of the Alanine-Lysine-Alanine tripeptide in gas and solution phases, and double proton transfer reactions in DNA base pairs.
