AMARO: All Heavy-Atom Transferable Neural Network Potentials of Protein Thermodynamics

Antonio Mirarchi; Raul P. Pelaez; Guillem Simeon; Gianni De Fabritiis

TL;DR

AMARO (Advanced Machine-learning Atomic Representation Omni-force-field) is a new neural network potential that combines TensorNet, an O(3)-equivariant message-passing neural network architecture, with a coarse-graining map that excludes hydrogen atoms.

Abstract

All-atom molecular simulations offer detailed insights into macromolecular phenomena, but their substantial computational cost hinders the exploration of complex biological processes. We introduce Advanced Machine-learning Atomic Representation Omni-force-field (AMARO), a new neural network potential (NNP) that combines an O(3)-equivariant message-passing neural network architecture, TensorNet, with a coarse-graining map that excludes hydrogen atoms. AMARO demonstrates the feasibility of training coarser NNPs, without prior energy terms, to run stable protein dynamics with scalability and generalization capabilities.
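As a rough illustration of what a coarse-graining map that excludes hydrogen atoms does to a single frame, the sketch below drops every hydrogen row from a coordinate array. It is a minimal example under stated assumptions, not the authors' implementation: the function names `heavy_atom_mask` and `apply_no_hydrogen_map`, the use of element symbols as input, and the toy fragment are all hypothetical.

```python
import numpy as np


def heavy_atom_mask(elements):
    """Return a boolean mask that is True for every non-hydrogen atom.

    Assumption: atoms are identified by element symbol, and only "H"
    is treated as hydrogen.
    """
    return np.array([e.strip().upper() != "H" for e in elements], dtype=bool)


def apply_no_hydrogen_map(elements, positions):
    """Keep only the heavy-atom rows of an (n_atoms, 3) coordinate array."""
    positions = np.asarray(positions, dtype=float)
    if positions.shape != (len(elements), 3):
        raise ValueError("positions must have shape (n_atoms, 3)")
    mask = heavy_atom_mask(elements)
    kept_elements = [e for e, keep in zip(elements, mask) if keep]
    return kept_elements, positions[mask]


# Toy fragment (hypothetical): seven atoms, three of them hydrogens.
elements = ["N", "H", "C", "H", "H", "C", "O"]
positions = np.arange(21, dtype=float).reshape(7, 3)

cg_elements, cg_positions = apply_no_hydrogen_map(elements, positions)
print(cg_elements)         # ['N', 'C', 'C', 'O']
print(cg_positions.shape)  # (4, 3)
```

The same mask could be applied to any other per-atom array of the frame; how the reference forces are mapped in AMARO is not stated here, so the sketch leaves that out.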

Paper Structure (16 sections, 2 equations, 7 figures)

Introduction
Material and methods
Neural Network Model
No hydrogen CG map
Neural network training
Dataset
AMARO Molecular Simulations
Markov State Models
Results
Generalization to larger domains
Validation on fast-folding proteins
Recovering the energetic landscape
Sampling the native structures of unseen training proteins
Computational efficiency
Conclusions
...and 1 more section

Figures (7)

Figure 1: Pipeline for developing all heavy-atom NNPs. A CG map is applied to the mdCATH dataset [mirarchi2024mdcath], and an embedding $z$ is created for each domain. TensorNet [simeon2023tensornet] is then trained on the CG data. In the final stage, generalization and scale-up properties are evaluated on a set of four fast-folding proteins and on larger domains.
Figure 2: Training and validation MSE loss (blue and orange, respectively) for AMARO as a function of training epoch.
Figure 3: Scale-up validation of AMARO on larger domains from mdCATH. Comparison between labeled (i.e. reference) and predicted force component values (x, y, z). Each data point in the scatter plot is color-coded according to the CG atom type.
Figure 4: Comparative analysis of the free energy landscape obtained from all-atom simulations (left) and NNP coarse-grained simulations (right) across the first two TICA dimensions for four fast-folding proteins: Chignolin, Trp-cage, Villin and $\alpha$3D.
Figure 5: CG trajectories of Chignolin, Trp-cage, Villin, and $\alpha$3D, selected based on the inclusion of microstates from the lowest RMSD macrostate. (a) Minimum RMSD conformation (blue) aligned with the experimental structure (grey) for each protein, labeled with the protein name and PDB ID. (b) C$\alpha$ RMSD of each trajectory compared to the crystal structure. (c) CG free energy surface, projected over the first two TICs with the folded state (red star) and sampled states indicated by RMSD color-coded dots. The trajectory’s progression is illustrated with arrows connecting the starting (yellow point) and ending (orange point) conformations. The all-atom equilibrium density is shown by a red contour.
...and 2 more figures