Universally applicable and tunable graph-based coarse-graining for Machine learning force fields
Christoph Brunken, Sebastien Boyer, Mustafa Omar, Martin Maarand, Olivier Peltre, Solal Attias, Bakary N'tji Diallo, Anastasia Markina, Olaf Othersen, Oliver Bent
TL;DR
This paper tackles the challenge of building a transferable coarse-grained ML force field (CG-MLFF) that generalizes across diverse biosystems (proteins, RNA, lipids). It introduces a MACE-based CG force field coupled with a tunable graph-based coarse-graining pipeline, trained on a fragmentation-derived dataset generated with semi-empirical references. A key contribution is the tunable CG mapping, controlled by four coefficients, $c_A$, $c_B$, $c_C$, and $c_D$, which are optimized by differential evolution to reduce noise in the CG forces and improve training stability. The tuned mapping often improves training and qualitative MD behavior, but MD stability remains system-dependent. The results nonetheless demonstrate the feasibility of a transferable CG-MLFF and outline a path toward including solvation and higher-accuracy references in future work, a step toward universally applicable CG force fields that can support large-scale biomolecular simulations.
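To make the tuning step concrete, here is a minimal sketch of differential evolution (the common DE/rand/1/bin variant) searching over four bounded coefficients. Everything in it is an assumption for illustration: the summary does not specify the DE variant, the coefficient bounds, or how force noise is computed, so `toy_force_noise` is a hypothetical stand-in objective with an arbitrary known minimum, and the hyperparameters are generic defaults rather than the paper's settings.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, generations=200,
                           mutation=0.7, crossover=0.9, seed=0):
    """Minimise `objective` over box `bounds` with DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial population inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct population members, none equal to the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for k in range(dim):
                if k == forced or rng.random() < crossover:
                    value = pop[a][k] + mutation * (pop[b][k] - pop[c][k])
                    lo, hi = bounds[k]
                    value = min(max(value, lo), hi)  # clip to the box
                else:
                    value = pop[i][k]
                trial.append(value)
            trial_score = objective(trial)
            # Greedy selection: keep the trial only if it is no worse.
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def toy_force_noise(coeffs):
    """Hypothetical placeholder for the force-noise objective.

    A real objective would run the CG mapping with (c_A, c_B, c_C, c_D)
    and measure the noise of the resulting CG forces.
    """
    target = (0.3, 0.6, 0.1, 0.8)  # arbitrary minimum, for illustration only
    return sum((c - t) ** 2 for c, t in zip(coeffs, target))


# Assumed bounds: each coefficient lies in [0, 1].
bounds = [(0.0, 1.0)] * 4
best_coeffs, best_noise = differential_evolution(toy_force_noise, bounds)
print(best_coeffs, best_noise)
```

A gradient-free optimizer of this kind suits the setting because the objective only needs to be evaluated, not differentiated, which matters when each evaluation involves a discrete graph-based mapping.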
Abstract
Coarse-grained (CG) force field methods for molecular systems are a crucial tool to simulate large biological macromolecules and are therefore essential for characterisations of biomolecular systems. While state-of-the-art deep learning (DL)-based models for all-atom force fields have improved immensely over recent years, we observe and analyse significant limitations of the currently available approaches for DL-based CG simulations. In this work, we present the first transferable DL-based CG force field approach (i.e., not specific to only one narrowly defined system type) applicable to a wide range of biosystems. To achieve this, our CG algorithm does not rely on hard-coded rules and is tuned to output coarse-grained systems optimised for minimal statistical noise in the ground truth CG forces, which results in significant improvement of model training. Our force field model is also the first CG variant that is based on the MACE architecture and is trained on a custom dataset created by a new approach based on the fragmentation of large biosystems covering protein, RNA and lipid chemistry. We demonstrate that our model can be applied in molecular dynamics simulations to obtain stable and qualitatively accurate trajectories for a variety of systems, while also discussing cases for which we observe limited reliability.
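As an illustration of why the choice of mapping affects the noise in the CG forces, the sketch below sums all-atom forces over the atoms assigned to each bead and compares two mappings of the same toy system. This is a simplified sketch under stated assumptions, not the paper's procedure: summing atomic forces per bead is one common force mapping, and the per-component variance across frames is only a crude proxy chosen here for illustration. The function names and the toy data are hypothetical.

```python
from statistics import pvariance


def map_forces_to_beads(atom_forces, atom_to_bead, n_beads):
    """Sum the force vectors of all atoms assigned to each bead."""
    bead_forces = [[0.0, 0.0, 0.0] for _ in range(n_beads)]
    for force, bead in zip(atom_forces, atom_to_bead):
        for axis in range(3):
            bead_forces[bead][axis] += force[axis]
    return bead_forces


def force_noise_proxy(frames, atom_to_bead, n_beads):
    """Mean variance of the bead force components across frames.

    Assumed proxy only: it treats all frames as samples of similar
    CG configurations, which is a strong simplification.
    """
    mapped = [map_forces_to_beads(f, atom_to_bead, n_beads) for f in frames]
    variances = [
        pvariance([frame[bead][axis] for frame in mapped])
        for bead in range(n_beads)
        for axis in range(3)
    ]
    return sum(variances) / len(variances)


# Toy data: four atoms, two frames. Atoms 0 and 1 push against each other
# (for example a stiff bond); atoms 2 and 3 share a force of fixed total.
frames = [
    [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 1.0, 0.0]],
    [[2.0, 0.0, 0.0], [-2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 2.0, 0.0]],
]

grouped = force_noise_proxy(frames, [0, 0, 1, 1], n_beads=2)  # partners share a bead
split = force_noise_proxy(frames, [0, 1, 0, 1], n_beads=2)    # partners separated
print(grouped, split)
```

When strongly coupled atoms share a bead, their fluctuating internal forces cancel in the sum, so the grouped mapping yields a lower proxy value than the split one. A tunable mapping can exploit this kind of cancellation.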
