Learning Potential Energy Surfaces of Hydrogen Atom Transfer Reactions in Peptides

Marlen Neubert, Patrick Reiser, Frauke Gräter, Pascal Friederich

TL;DR

This work tackles the challenge of simulating hydrogen atom transfer (HAT) in peptides with quantum-accurate potential energy surfaces (PESs) by building large peptide-focused datasets and benchmarking three graph neural networks (SchNet, Allegro, MACE). By learning full PESs and inferring reaction barriers indirectly from energy predictions, the authors show that MACE provides the best accuracy, achieving a barrier MAE of $1.13$ kcal/mol on out-of-distribution DFT data and enabling stable MD simulations that capture HAT events. They validate the model on collagen I environments, demonstrating robust barrier shapes and transferability to local motifs, and discuss scaling, transferability, and strategies for improvement via active learning and transition state (TS) searches. The study presents a generalizable pipeline for applying ML PESs to complex biomolecular reactivity, with potential for barrier-driven kinetics and large-scale biomolecular simulations.
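The barriers here are not predicted by a dedicated model output; they follow from energies evaluated along a reaction path. A minimal Python sketch of that bookkeeping is shown below, assuming a 1D energy profile (in eV) ordered from reactant to product, such as the linearly interpolated test paths of Figure 3. The function names are hypothetical, the highest image stands in for the transition state, and pooling the left and right barriers into a single MAE is an assumption rather than the authors' confirmed metric.

```python
import numpy as np

EV_TO_KCAL_PER_MOL = 23.0605  # 1 eV is about 23.0605 kcal/mol


def barriers_from_profile(energies_ev):
    """Left (forward) and right (reverse) barriers of an energy profile.

    energies_ev: energies in eV of path images, reactant first, product last,
    e.g. single points from DFT/xTB or predictions from an ML potential.
    """
    e = np.asarray(energies_ev, dtype=float)
    e_max = e.max()            # highest image approximates the transition state
    d_left = e_max - e[0]      # barrier seen from the reactant side
    d_right = e_max - e[-1]    # barrier seen from the product side
    return d_left, d_right


def barrier_mae_kcal(pred_profiles, ref_profiles):
    """Barrier MAE in kcal/mol, pooling left and right barriers (assumption)."""
    errors = []
    for pred, ref in zip(pred_profiles, ref_profiles):
        p_left, p_right = barriers_from_profile(pred)
        r_left, r_right = barriers_from_profile(ref)
        errors.extend([abs(p_left - r_left), abs(p_right - r_right)])
    return float(np.mean(errors)) * EV_TO_KCAL_PER_MOL
```

Because the profiles in Figure 3b are reported in eV, the unit conversion matters: the DFT left barrier of 3.25 eV corresponds to about 74.9 kcal/mol, and the reported barrier MAE of 1.13 kcal/mol is roughly 0.049 eV.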

Abstract

Hydrogen atom transfer (HAT) reactions are essential in many biological processes, such as radical migration in damaged proteins, but their mechanistic pathways remain incompletely understood. Simulating HAT is challenging because it requires quantum chemical accuracy at biologically relevant scales: standard classical force fields cannot describe bond breaking and formation, and DFT-based molecular dynamics is too costly at these scales. Machine-learned potentials offer an alternative because they can learn potential energy surfaces (PESs) with near-quantum accuracy. However, training these models to generalize across diverse HAT configurations, especially at radical positions in proteins, requires tailored data generation and careful model selection. Here, we systematically generate HAT configurations in peptides to build large datasets using semiempirical methods and DFT. We benchmark three graph neural network architectures (SchNet, Allegro, and MACE) on their ability to learn HAT PESs and indirectly predict reaction barriers from energy predictions. MACE consistently outperforms the others in energy, force, and barrier prediction, achieving a mean absolute error of 1.13 kcal/mol on out-of-distribution DFT barrier predictions. Using molecular dynamics, we show that our MACE potential is stable, reactive, and generalizes beyond training data to model HAT barriers in collagen I. This accuracy enables integration of ML potentials into large-scale collagen simulations to compute reaction rates from predicted barriers, advancing mechanistic understanding of HAT and radical migration in peptides. We analyze scaling laws, model transferability, and cost-performance trade-offs, and outline strategies for improvement by combining ML potentials with transition state search algorithms and active learning. Our approach is generalizable to other biomolecular systems, enabling quantum-accurate simulations of chemical reactivity in complex environments.
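Learning curves such as those in Figure 4 are commonly summarized by a power law, $\mathrm{MAE} \approx a N^{-b}$ in the training set size $N$, which appears as a straight line on log-log axes. The sketch below fits $a$ and $b$ by linear least squares in log space. It is an illustration under that assumed functional form; the function names are hypothetical, and whether the authors' scaling analysis uses exactly this fit is not stated here.

```python
import numpy as np


def fit_power_law(train_sizes, test_maes):
    """Fit MAE ~ a * N**(-b) by least squares on log(MAE) vs log(N)."""
    log_n = np.log(np.asarray(train_sizes, dtype=float))
    log_mae = np.log(np.asarray(test_maes, dtype=float))
    # np.polyfit returns coefficients highest degree first: [slope, intercept]
    slope, intercept = np.polyfit(log_n, log_mae, 1)
    return float(np.exp(intercept)), float(-slope)


def predict_mae(a, b, n):
    """Evaluate the fitted power law at training set size n."""
    return a * np.asarray(n, dtype=float) ** (-b)
```

A larger exponent $b$ means the error falls faster as data is added; extrapolating well beyond the measured sizes is unreliable if the curve begins to saturate.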

Paper Structure

This paper contains 22 sections, 2 equations, 12 figures, 4 tables.

Figures (12)

  • Figure 1: Workflow overview: We generated training data for HAT reactions in peptides and trained graph neural networks to learn the corresponding PES. We used direct energy predictions from these models to indirectly predict HAT reaction barriers.
  • Figure 2: Overview of the training data generation workflow (a). Starting from SMILES representations of amino acids and dipeptides, we generated 3D coordinates using RDKit. After geometry optimization to obtain minimum-energy structures, we generated conformers and applied normal mode sampling to them to obtain non-equilibrium structures (b). These configurations serve as input for generating inter- and intramolecular HAT radical systems (c). Reaction configurations are then sampled by randomly translating the hydrogen atom designated for transfer. Additional evaluation data for the barriers is generated by linear interpolation of the hydrogen atom position (d).
  • Figure 3: a) Barrier height distributions from linear interpolation test datasets calculated at the xTB and DFT levels. The linearly interpolated data were used only in test sets, not for training. xTB systematically underestimates barrier heights relative to DFT. b) Example interpolation from the test set for intermolecular HAT between capped arginine–glutamate and lysine–proline dipeptides (98 atoms). xTB barrier: $\Delta E_\mathrm{left} = 2.50$ eV, $\Delta E_\mathrm{right} = 2.85$ eV; DFT barrier: $\Delta E_\mathrm{left} = 3.25$ eV, $\Delta E_\mathrm{right} = 3.37$ eV.
  • Figure 4: Learning curves of GNNs: a) Test set force MAE vs. training dataset size. b) Test set barrier MAE vs. training dataset size.
  • Figure 5: MACE is transferable to different system sizes: Force MAEs and per-atom energy MAEs vs. atom count. The per-atom energy MAE initially increases, then decreases with increasing system size.
  • ...and 7 more figures
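Steps (c) and (d) of the Figure 2 workflow move only the hydrogen atom designated for transfer. The Python sketch below illustrates both operations on a plain coordinate array (assumed to be in Å): random translations of that hydrogen to sample reaction configurations, and a straight-line interpolation between two hydrogen positions to build an evaluation path. The isotropic direction, the uniform shift magnitude, the `max_shift` cap, and the choice of endpoints are illustrative assumptions; the function names are hypothetical and do not reproduce the authors' code.

```python
import numpy as np


def interpolate_hydrogen(positions, h_index, h_start, h_end, n_images=11):
    """Path images in which only the transferring H moves on a straight line."""
    h_start = np.asarray(h_start, dtype=float)
    h_end = np.asarray(h_end, dtype=float)
    images = []
    for t in np.linspace(0.0, 1.0, n_images):
        image = np.array(positions, dtype=float, copy=True)  # other atoms fixed
        image[h_index] = (1.0 - t) * h_start + t * h_end
        images.append(image)
    return images


def random_h_translations(positions, h_index, n_samples, max_shift=1.0, seed=None):
    """Configurations with the transferring H displaced in a random direction.

    Assumption: isotropic direction, shift length uniform in [0, max_shift].
    """
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(n_samples):
        image = np.array(positions, dtype=float, copy=True)
        direction = rng.normal(size=3)
        direction /= np.linalg.norm(direction)  # unit vector
        image[h_index] += direction * rng.uniform(0.0, max_shift)
        samples.append(image)
    return samples
```

Each interpolated image would then be labeled with xTB or DFT, or passed to a trained potential, and the resulting energy profile reduced to left and right barriers.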