Learning Potential Energy Surfaces of Hydrogen Atom Transfer Reactions in Peptides
Marlen Neubert, Patrick Reiser, Frauke Gräter, Pascal Friederich
TL;DR
This work tackles the challenge of simulating hydrogen atom transfer (HAT) in peptides with quantum-accurate potential energy surfaces (PESs) by building large peptide-focused datasets and benchmarking three graph neural networks (SchNet, Allegro, and MACE). The models learn full PESs, and reaction barriers are inferred indirectly from the predicted energies. MACE is the most accurate, achieving a barrier MAE of 1.13 kcal/mol on out-of-distribution DFT data and enabling stable MD simulations that capture HAT events. The authors validate the model on collagen I environments, where it generalizes beyond the training data to model HAT barriers, and discuss scaling laws, transferability, and strategies for improvement via active learning and transition state (TS) searches. The study presents a generalizable pipeline for applying ML PESs to complex biomolecular reactivity, with potential for computing reaction rates from predicted barriers in large-scale biomolecular simulations.
Abstract
Hydrogen atom transfer (HAT) reactions are essential in many biological processes, such as radical migration in damaged proteins, but their mechanistic pathways remain incompletely understood. Simulating HAT is challenging because it requires quantum chemical accuracy at biologically relevant scales: classical force fields lack the required accuracy, and DFT-based molecular dynamics cannot reach these scales. Machine-learned potentials offer an alternative, able to learn potential energy surfaces (PESs) with near-quantum accuracy. However, training these models to generalize across diverse HAT configurations, especially at radical positions in proteins, requires tailored data generation and careful model selection. Here, we systematically generate HAT configurations in peptides to build large datasets using semiempirical methods and DFT. We benchmark three graph neural network architectures (SchNet, Allegro, and MACE) on their ability to learn HAT PESs and indirectly predict reaction barriers from energy predictions. MACE consistently outperforms the others in energy, force, and barrier prediction, achieving a mean absolute error of 1.13 kcal/mol on out-of-distribution DFT barrier predictions. Using molecular dynamics, we show that our MACE potential is stable, reactive, and generalizes beyond training data to model HAT barriers in collagen I. This accuracy enables integration of ML potentials into large-scale collagen simulations to compute reaction rates from predicted barriers, advancing mechanistic understanding of HAT and radical migration in peptides. We analyze scaling laws, model transferability, and cost-performance trade-offs, and outline strategies for improvement by combining ML potentials with transition state search algorithms and active learning. Our approach is generalizable to other biomolecular systems, enabling quantum-accurate simulations of chemical reactivity in complex environments.
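To make "indirectly predicting barriers from energy predictions" concrete, the following minimal Python sketch shows one plausible reading of it. It assumes that each HAT reaction is represented by a list of energies (in kcal/mol) along a path from the initial to the final state, and that the barrier is the highest energy on that path relative to the first frame. The function names (`barrier_from_profile`, `barrier_mae`, `eyring_rate`) are hypothetical and not taken from the paper. The rate step uses the Eyring equation from transition state theory with the barrier standing in for the activation free energy; the abstract does not state which rate expression the authors use, so that part is an assumption for illustration only.

```python
import math

# Physical constants (CODATA values).
R_KCAL = 1.987204259e-3   # gas constant in kcal/(mol*K)
KB_J = 1.380649e-23       # Boltzmann constant in J/K
H_J = 6.62607015e-34      # Planck constant in J*s


def barrier_from_profile(energies):
    """Barrier of one reaction path: highest energy minus the first-frame energy.

    Assumption: `energies` is ordered from the initial state to the final state.
    """
    if len(energies) < 2:
        raise ValueError("need at least two energies along the path")
    return max(energies) - energies[0]


def barrier_mae(predicted_profiles, reference_profiles):
    """Mean absolute error between barriers derived from predicted and reference energies."""
    if len(predicted_profiles) != len(reference_profiles):
        raise ValueError("profile lists must have the same length")
    errors = [
        abs(barrier_from_profile(pred) - barrier_from_profile(ref))
        for pred, ref in zip(predicted_profiles, reference_profiles)
    ]
    return sum(errors) / len(errors)


def eyring_rate(barrier_kcal_mol, temperature=300.0):
    """Eyring rate constant in 1/s (assumption: barrier used as activation free energy)."""
    prefactor = KB_J * temperature / H_J
    return prefactor * math.exp(-barrier_kcal_mol / (R_KCAL * temperature))


if __name__ == "__main__":
    # Made-up energy profiles for illustration, not data from the paper.
    predicted = [[0.0, 10.0, 2.0], [1.0, 9.0, 0.0]]
    reference = [[0.0, 11.0, 2.0], [1.0, 8.5, 0.0]]
    print("barrier MAE (kcal/mol):", barrier_mae(predicted, reference))
    print("rate at 10 kcal/mol (1/s):", eyring_rate(10.0))
```

Because the rate depends exponentially on the barrier, a barrier error of about 1 kcal/mol changes the estimated rate at 300 K by roughly a factor of five, which is why barrier accuracy is the relevant figure of merit for downstream kinetics.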
