AMARO: All Heavy-Atom Transferable Neural Network Potentials of Protein Thermodynamics

Antonio Mirarchi, Raul P. Pelaez, Guillem Simeon, Gianni De Fabritiis

TL;DR

AMARO (Advanced Machine-learning Atomic Representation Omni-force-field) is a new neural network potential that combines TensorNet, an O(3)-equivariant message-passing neural network architecture, with a coarse-graining map that excludes hydrogen atoms.

Abstract

All-atom molecular simulations offer detailed insights into macromolecular phenomena, but their substantial computational cost hinders the exploration of complex biological processes. We introduce Advanced Machine-learning Atomic Representation Omni-force-field (AMARO), a new neural network potential (NNP) that combines an O(3)-equivariant message-passing neural network architecture, TensorNet, with a coarse-graining map that excludes hydrogen atoms. AMARO demonstrates that a coarser NNP can be trained without prior energy terms and still run stable protein dynamics, scale to larger systems, and generalize to unseen proteins.
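As a rough illustration of a coarse-graining map that excludes hydrogen atoms, the sketch below keeps only the heavy atoms of an all-atom frame. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names are hypothetical, and carrying each heavy atom's own force over unchanged (a plain slice mapping) is an assumption, since the abstract does not say how forces are projected onto the retained atoms.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def heavy_atom_indices(elements: Sequence[str]) -> List[int]:
    """Return the indices of all non-hydrogen atoms (the retained CG sites)."""
    return [i for i, el in enumerate(elements) if el.upper() != "H"]


def apply_cg_map(
    elements: Sequence[str],
    coords: Sequence[Vec3],
    forces: Sequence[Vec3],
) -> Tuple[List[str], List[Vec3], List[Vec3]]:
    """Drop hydrogens from one frame.

    Assumption: forces are simply sliced along with the coordinates;
    the source text does not specify the force mapping.
    """
    keep = heavy_atom_indices(elements)
    return (
        [elements[i] for i in keep],
        [coords[i] for i in keep],
        [forces[i] for i in keep],
    )


# Toy frame with made-up numbers: three heavy atoms and two hydrogens.
elements = ["N", "H", "C", "H", "O"]
coords = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.4, 0.0, 0.0),
          (1.4, 1.0, 0.0), (2.6, 0.0, 0.0)]
forces = [(0.1, 0.0, 0.0), (0.0, 0.2, 0.0), (-0.3, 0.0, 0.0),
          (0.0, -0.1, 0.0), (0.2, 0.0, 0.0)]

cg_elements, cg_coords, cg_forces = apply_cg_map(elements, coords, forces)
```

In this toy frame the map reduces five atoms to three sites, which is the source of the cost saving: the network sees fewer particles per frame.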

Paper Structure (16 sections, 2 equations, 7 figures)

Figures (7)

  • Figure 1: Pipeline for developing all-heavy-atom NNPs. A CG map is applied to the mdCATH dataset [mirarchi2024mdcath], and an embedding $z$ is created for each domain. TensorNet [simeon2023tensornet] is then trained on the CG data. In the final stage, generalization and scale-up properties are evaluated on a set of four fast-folding proteins and on larger domains.
  • Figure 2: Training and validation MSE loss (blue and orange, respectively) for AMARO as a function of training epoch.
  • Figure 3: Scale-up validation of AMARO on larger domains from mdCATH. Comparison between labeled (i.e. reference) and predicted force component values (x, y, z). Each data point in the scatter plot is color-coded according to the CG atom type.
  • Figure 4: Comparative analysis of the free energy landscape obtained from all-atom simulations (left) and NNP coarse-grained simulations (right) across the first two TICA dimensions for four fast-folding proteins: Chignolin, Trp-cage, Villin and $\alpha$3D.
  • Figure 5: CG trajectories of Chignolin, Trp-cage, Villin, and $\alpha$3D, selected based on the inclusion of microstates from the lowest RMSD macrostate. (a) Minimum RMSD conformation (blue) aligned with the experimental structure (grey) for each protein, labeled with the protein name and PDB ID. (b) C$\alpha$ RMSD of each trajectory compared to the crystal structure. (c) CG free energy surface, projected over the first two TICs, with the folded state (red star) and sampled states indicated by RMSD color-coded dots. The trajectory's progression is illustrated with arrows connecting the starting (yellow point) and ending (orange point) conformations. The all-atom equilibrium density is shown by a red contour.
  • ...and 2 more figures
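
Figures 4 and 5 show free energy surfaces projected on the first two TICA dimensions. The captions do not say how these surfaces were estimated, so the following is only a generic sketch of one common approach: Boltzmann inversion of a 2D histogram, $F = -k_B T \ln p$, shifted so the minimum is zero. The function name, bin width, temperature, and sample points are all hypothetical, and unweighted counting is an assumption (a reweighted estimate would replace the raw counts).

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

KB_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def free_energy_surface(
    points: Iterable[Tuple[float, float]],
    bin_width: float,
    temperature: float = 300.0,
) -> Dict[Tuple[int, int], float]:
    """Estimate F = -kT ln(p) per populated 2D bin, shifted so min(F) = 0.

    `points` are (TIC1, TIC2) projections of trajectory frames.
    Unvisited bins are omitted (their free energy is formally infinite).
    """
    counts = Counter(
        (math.floor(x / bin_width), math.floor(y / bin_width))
        for x, y in points
    )
    if not counts:
        raise ValueError("at least one projected frame is required")
    total = sum(counts.values())
    kt = KB_KCAL_PER_MOL_K * temperature
    energies = {b: -kt * math.log(c / total) for b, c in counts.items()}
    f_min = min(energies.values())
    return {b: f - f_min for b, f in energies.items()}


# Toy projection: three frames in one bin, one frame in a neighboring bin.
points = [(0.1, 0.1), (0.2, 0.4), (0.3, 0.2), (1.2, 0.3)]
fes = free_energy_surface(points, bin_width=1.0, temperature=300.0)
```

Here the less populated bin sits $k_B T \ln 3$ above the more populated one, since it holds one third as many frames.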