Mapping Still Matters: Coarse-Graining with Machine Learning Potentials

Franz Görlich, Julija Zavadlav

TL;DR

This work investigates how coarse-graining mappings influence representations learned by equivariant machine learning potentials, using liquid hexane, capped amino acids, and a polyalanine chain. It compares classical CG potentials with ML potentials (MACE), revealing that mapping choices can induce artifacts such as bond permutations, unphysical enantiomer symmetry, and chiral inversion pathways, which limit transferability. The study demonstrates that while ML potentials can learn the potential of mean force for many mappings, preserving topology and avoiding overlapping length scales are crucial for reliable predictions. The findings provide practical guidelines for selecting CG mappings compatible with modern architectures and highlight the need for topology-aware encoding or priors to achieve transferable, physically meaningful CG models with ML methods.

Abstract

Coarse-grained (CG) modeling enables molecular simulations to reach time and length scales inaccessible to fully atomistic methods. For classical CG models, the choice of mapping, that is, how atoms are grouped into CG sites, is a major determinant of accuracy and transferability. At the same time, the emergence of machine learning potentials (MLPs) offers new opportunities to build CG models that can in principle learn the true potential of mean force for any mapping. In this work, we systematically investigate how the choice of mapping influences the representations learned by equivariant MLPs by studying liquid hexane, amino acids, and polyalanine. We find that when the length scales of bonded and nonbonded interactions overlap, unphysical bond permutations can occur. We also demonstrate that correctly encoding species and maintaining stereochemistry are crucial, as neglecting either introduces unphysical symmetries. Our findings provide practical guidance for selecting CG mappings compatible with modern architectures and guide the development of transferable CG models.
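
To make the notion of a mapping concrete, the following is a minimal sketch of a center-of-mass mapping that groups atoms into CG sites. The function name `cg_map`, the toy linear six-carbon chain, and the particular two-site and three-site groupings are illustrative assumptions, not the mappings or code used in the paper; hydrogens and periodic boundary conditions are ignored for brevity.

```python
import numpy as np

def cg_map(positions, masses, groups):
    """Map atomistic positions of shape (N, 3) to CG site positions.

    Each CG site is placed at the center of mass of its atom group.
    `groups` is a list of atom-index lists, one per CG site
    (a hypothetical grouping chosen for illustration).
    """
    positions = np.asarray(positions, dtype=float)
    masses = np.asarray(masses, dtype=float)
    sites = []
    for idx in groups:
        weights = masses[idx] / masses[idx].sum()  # normalized mass weights
        sites.append(weights @ positions[idx])     # weighted average position
    return np.array(sites)

# Toy input (assumption): six carbons on a straight line with unit spacing.
carbons = np.array([[i, 0.0, 0.0] for i in range(6)], dtype=float)
masses = np.full(6, 12.011)  # carbon mass in atomic mass units

# Two candidate mappings of the same molecule.
two_site = cg_map(carbons, masses, [[0, 1, 2], [3, 4, 5]])
three_site = cg_map(carbons, masses, [[0, 1], [2, 3], [4, 5]])

print(two_site[:, 0])
print(three_site[:, 0])
```

The same atomistic configuration yields different CG coordinates, and therefore different bonded and nonbonded distance distributions, depending on the grouping. This is the degree of freedom whose effect on the learned potential the abstract describes.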

Paper Structure

This paper contains 29 sections, 10 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: Structural properties of the reference, classical, and MLP models for different coarse-grained representations of hexane. The first row shows bonded population density metrics: dihedral (four-site model), angle (three-site model), and bond distance (two-site model). For the two-site MLP simulation, we show the nearest-neighbor distance instead of bond lengths based on the initial bond list. The second row shows the RDF of A-A beads. Results show the mean $\pm$ 3 standard deviations of 10 × 1000 ps simulations.
  • Figure 2: (a) Angular distribution of the hexane three-site model and (b) RDF of the two-site model for different correlation orders $\nu$ and number of message-passing layers $L$. In the two-site liquid hexane model, the length scales of bonded and nonbonded interactions overlap. Results show the mean $\pm$ 3 standard deviations of 10 × 1000 ps simulations.
  • Figure 3: Results of the capped alanine CG simulations with different mappings. The top and bottom rows present the high- and low-resolution mappings, respectively. Each mapping includes a Ramachandran plot derived from 100 × 5 ns simulations.
  • Figure 4: (a) Free energy barrier for enantiomerization in atomistic and coarse-grained MLPs, determined using well-tempered metadynamics along the improper $C_\alpha$-dihedral. (b) Mechanism of chiral inversion with a high-energy planar transition state.
  • Figure 5: (a) A chain of D-alanine forms left-handed helices, while the naturally occurring L-alanine forms right-handed helices. (b) Helicity and handedness of a 500 ns reference simulation of a capped 15-mer L-alanine.
  • ...and 1 more figures