Design, Assessment, and Application of Machine Learning Potential Energy Surfaces

Valerii Andreichev, Sena Aydin, Kai Töpfer, Markus Meuwly, Luis Itza Vazquez-Salazar

TL;DR

The paper addresses the challenge of constructing and using machine-learned potential energy surfaces (ML-PES) for biomolecular and chemical systems by outlining a practical, iterative workflow that spans model selection, descriptors, data generation, validation, and MD integration. It compares kernel and neural network approaches, discusses fixed versus learnable descriptors and equivariant representations, and emphasizes data quality, uncertainty quantification, and active learning. Through two detailed case studies—the non-reactive Ala-Lys-Ala tripeptide and reactive proton transfer in DNA base pairs—it demonstrates how ML-PES can match or exceed traditional force fields in accuracy and enable advanced simulations, including transfer learning to higher levels of theory. The authors highlight open challenges such as scalable uncertainty quantification, efficient data selection, and seamless experimental integration, outlining a roadmap for broader adoption of ML-PES in chemistry and biophysics. Overall, the work provides concrete guidelines and demonstrations showing how ML-PES can extend the reach of molecular simulations to larger, more complex systems while maintaining high fidelity to quantum chemical references.

Abstract

Potential Energy Surfaces (PESs) are an indispensable tool to investigate, characterise and understand chemical and biological systems in the gas and condensed phases. Advances in Machine Learning (ML) methodologies have led to the development of Machine Learned Potential Energy Surfaces (ML-PESs), which are now widely used to simulate such systems. The present work provides an overview of concepts, methodologies and recommendations for constructing and using ML-PESs. The choice of topics focuses on practical and recurrent issues in conceiving and using such models. Application of the principles discussed is illustrated through two different systems of biomolecular importance: the non-reactive dynamics of the Alanine-Lysine-Alanine tripeptide in gas and solution phases, and double proton transfer reactions in DNA base pairs.

Paper Structure

This paper contains 25 sections, 21 figures.

Figures (21)

  • Figure 1: Construction of an ML-PES. The process begins by defining the problem and selecting an appropriate ML model. This is followed by an iterative cycle of data generation (sampling), cleaning, training, validation, and refinement. Once validated, the model can be employed for molecular simulations.
  • Figure 2: Different types of neural networks. Panel A: Models with fixed descriptors represent molecules using predefined two- and three-body functions. Separate multilayer perceptrons predict atomic energies, which are summed to obtain the total molecular energy. Panel B: Graph neural network (GNN) models represent molecules as graphs of atoms (nodes) and bonds (edges). Each atom in the molecule is described with an initial embedding vector. Atomic embeddings are iteratively updated through message passing and passed to a readout function to predict atomic energies. The lower panels illustrate symmetry conservation: invariant models preserve scalar properties (e.g., energy) under rotation, while equivariant models additionally conserve vectorial quantities (e.g., forces, dipoles).
  • Figure 3: Effect of different sampling methods for the dialanine peptide. Panel A shows sampling by molecular dynamics at 500 K. Panel B corresponds to normal mode scanning. Panel C corresponds to metadynamics sampling at 500 K over the angles $\phi$ and $\psi$. For details about the calculation setup, see the main text. In all cases, the calculations were performed at the MP2/6-31G** level using ORCA [ORCA5].
  • Figure 4: Types of ML/MM embeddings. The central region (blue) represents the subsystem described using machine-learning (ML) methodologies, surrounded by the molecular-mechanics (MM) environment (light red). Red spheres denote solvent molecules modelled as point charges. Panel A: mechanical embedding, in which the ML region interacts with the MM environment only through electrostatic and van der Waals interactions; the corresponding Hamiltonian for the ML–MM coupling is shown below. Panel B: electrostatic embedding, in which the ML region is polarised by the MM point charges, as indicated by the electronic density interacting with the MM electrostatic field. The corresponding Hamiltonian is shown below. Panel C: schematic representation of more elaborate embeddings, such as polarizable or adaptive schemes. Note that in this case the solvent could enter the ML region and/or the ML region could also polarize the MM region.
  • Figure 5: General performance of the ML-PES for the AKA tripeptide. Correlation plots of the training (A) and test (B) sets of SETG5 reference energies versus predicted NN energies. RMSE(E) and MAE(E) for the training and test sets of SETG5: For the training set, RMSE(E) is 0.14 and MAE(E) is 0.1 kcal/mol; for the test set, RMSE(E) is 1.31 and MAE(E) is 0.5 kcal/mol. Correlation plots of the test sets of SETS1 (C) and SETS2 (D) reference energies versus predicted NN energies. RMSE(E) and MAE(E) for the test set of SETS1 are 1.32 and 0.5 kcal/mol, respectively. For the SETS2 test set, RMSE(E) is 1.41 and MAE(E) is 0.52 kcal/mol.
  • ...and 16 more figures
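
The fixed-descriptor scheme in the Figure 2 caption (Panel A) can be illustrated with a small toy sketch. It is not the authors' model. The Gaussian radial features, the cosine cutoff, the single shared tanh unit standing in for the per-atom multilayer perceptrons, and all numerical parameters and function names below are assumptions chosen for illustration only. Because the descriptor depends only on interatomic distances, the summed energy should be unchanged when the molecule is rotated or translated, which is the invariance the caption refers to.

```python
import math

def radial_descriptor(positions, i, centers=(1.0, 2.0, 3.0), width=0.5, cutoff=5.0):
    """Hypothetical fixed two-body descriptor for atom i.

    Each feature sums a Gaussian of the distance to every neighbour,
    damped by a smooth cosine cutoff. Only distances enter, so the
    features are invariant to rotations and translations.
    """
    features = [0.0] * len(centers)
    for j, pos_j in enumerate(positions):
        if j == i:
            continue
        r = math.dist(positions[i], pos_j)
        if r >= cutoff:
            continue
        damping = 0.5 * (math.cos(math.pi * r / cutoff) + 1.0)
        for k, mu in enumerate(centers):
            features[k] += math.exp(-((r - mu) / width) ** 2) * damping
    return features

def atomic_energy(features, weights, bias):
    """Toy stand-in for a per-atom MLP: one tanh unit with fixed parameters."""
    return math.tanh(sum(w * f for w, f in zip(weights, features)) + bias)

def total_energy(positions, weights=(0.3, -0.2, 0.1), bias=0.05):
    """Total energy as the sum of atomic energy contributions."""
    return sum(
        atomic_energy(radial_descriptor(positions, i), weights, bias)
        for i in range(len(positions))
    )

def rotate_z(positions, angle):
    """Rotate all atoms about the z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in positions]

def translate(positions, shift):
    """Shift all atoms by the same vector."""
    return [(x + shift[0], y + shift[1], z + shift[2]) for x, y, z in positions]

# Arbitrary three-atom geometry (coordinates are illustrative, not from the paper).
molecule = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]

e_ref = total_energy(molecule)
e_rot = total_energy(rotate_z(molecule, 1.234))
e_shift = total_energy(translate(molecule, (2.0, -1.0, 0.5)))

# Differences should vanish up to floating-point round-off.
print("rotation difference:   ", abs(e_ref - e_rot))
print("translation difference:", abs(e_ref - e_shift))
```

A real model of this type would use element-specific networks and add three-body (angular) terms, as the caption states. The sketch keeps only the two-body part to make the invariance argument easy to check.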