HEroBM: a deep equivariant graph neural network for universal backmapping from coarse-grained to all-atom representations

Daniele Angioletti, Stefano Raniolo, Vittorio Limongelli

TL;DR

Coarse-grained (CG) simulations enable large-scale, long-timescale molecular modeling but lose atomistic detail, complicating the recovery of full atomistic structures. HEroBM introduces a universal, locality-driven backmapping framework based on SE(3)-equivariant graph neural networks and a hierarchical anchor scheme to reconstruct atomistic coordinates from any CG mapping. Across proteins, lipids, and small molecules, HEroBM achieves high reconstruction fidelity with substantially less training data than competing ML-based methods, and demonstrates scalability to large systems and real CG trajectories, including GPCRs in lipid bilayers with bound ligands. The framework supports end-to-end backmapping and integrates with energy minimisation and MD workflows, offering a practical, adaptable tool (potentially via a webserver) to enable accurate atomistic restoration from coarse-grained simulations.

Abstract

Molecular simulations have assumed a paramount role in the fields of chemistry, biology, and material sciences, being able to capture the intricate dynamic properties of systems. Within this realm, coarse-grained (CG) techniques have emerged as invaluable tools to sample large-scale systems and reach extended timescales by simplifying system representation. However, CG approaches come with a trade-off: they sacrifice atomistic details that might hold significant relevance in deciphering the investigated process. Therefore, a recommended approach is to identify key CG conformations and process them using backmapping methods, which retrieve atomistic coordinates. Currently, rule-based methods yield subpar geometries and rely on energy relaxation, resulting in less-than-optimal outcomes. Conversely, machine learning techniques offer higher accuracy but are either limited in transferability between systems or tied to specific CG mappings. In this work, we introduce HEroBM, a dynamic and scalable method that employs deep equivariant graph neural networks and a hierarchical approach to achieve high-resolution backmapping. HEroBM handles any type of CG mapping, offering a versatile and efficient protocol for reconstructing atomistic structures with high accuracy. Focused on local principles, HEroBM spans the entire chemical space and is transferable to systems of varying sizes. We illustrate the versatility of our framework through diverse biological systems, including a complex real-case scenario. Here, our end-to-end backmapping approach accurately generates the atomistic coordinates of a G protein-coupled receptor bound to an organic small molecule within a cholesterol/phospholipid bilayer.

Paper Structure

This paper contains 29 sections, 11 equations, 17 figures, and 3 tables.

Figures (17)

  • Figure 1: HEroBM Framework Overview. Beginning with a coarse-grained PDB structure (a), we encode the beads as a graph and feed it as input into the Equivariant Graph Neural Network (b). The network's output comprises two elements: a set of 3-dimensional distance vectors for each bead and a $(\phi,\psi)$ pair for $C_{\alpha}$ beads (c). Next, atoms are reconstructed in a hierarchical manner (d), refining the structure from coarse-grained representations to atomistic detail. Subsequently, we execute an optimisation process that fine-tunes the backbones (e). This process yields the full atomistic structure (f).
  • Figure 2: Hierarchical backmapping of a glutamate residue side chain. Colored arrows represent the distance vectors $\vec{V}_{hj}$ predicted by the HEroBM Equivariant Graph Neural Network. The $C_{\delta}$ atom has hierarchy level 1 (the highest), so it uses the CG bead position as its anchor point and is placed at displacement $\vec{V}_{yellow}$ from it. Atoms at level 2 use the level-1 atom as their anchor: $O_{E1}$, $O_{E2}$ and $C_{\gamma}$ are placed according to their predicted distance vectors (in orange), relative to the $C_{\delta}$ position. Finally, $C_{\beta}$ is positioned relative to $C_{\gamma}$.
  • Figure 3: (a) Distribution of RMSD error pertaining to the backmapped structures of the PDB29k test set. To visually represent this distribution, we have overlaid a fitted Gaussian curve. (b) Visual comparison between the backmapped result produced by HEroBM with the highest error on the dataset (highlighted in tan) and the ground truth atomistic structure (depicted in cyan).
  • Figure 4: Distribution of torsion angles for both ground truth (depicted in orange) and reconstructed structures (shown in blue) within the PED test datasets. The Ramachandran plot shows the backmapped distribution as a gradient from gray to blue, while the ground truth distribution is shown as an orange contour plot. Each panel also includes a breakdown of the $\chi_{1}$ distribution in the top-right half and the $\chi_{2}$ distribution in the bottom-right half.
  • Figure 5: HEroBM results for MOMs featured during March and April, 2023.
  • ...and 12 more figures
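
The hierarchical placement described in the Figure 2 caption can be illustrated with a minimal sketch. This is not the authors' implementation: the `ANCHORS` table, the `reconstruct` function, and the atom labels used as dictionary keys are hypothetical names chosen for illustration, and the displacement vectors are assumed to be supplied by some predictor (in HEroBM, the equivariant network). Each atom is placed at its anchor's position plus its predicted vector, proceeding level by level.

```python
import numpy as np

# Hypothetical hierarchy for a glutamate side-chain bead, following the
# Figure 2 caption: atom -> (anchor, hierarchy level). "BEAD" stands for
# the CG bead position. These names are illustrative assumptions.
ANCHORS = {
    "CD": ("BEAD", 1),
    "OE1": ("CD", 2),
    "OE2": ("CD", 2),
    "CG": ("CD", 2),
    "CB": ("CG", 3),
}

def reconstruct(bead_pos, vectors, anchors=ANCHORS):
    """Place atoms level by level: position = anchor position + vector."""
    positions = {"BEAD": np.asarray(bead_pos, dtype=float)}
    # Sorting by level guarantees every anchor is placed before its dependents.
    for name, (anchor, _level) in sorted(anchors.items(), key=lambda kv: kv[1][1]):
        positions[name] = positions[anchor] + np.asarray(vectors[name], dtype=float)
    del positions["BEAD"]  # return atom positions only
    return positions
```

Because every position is a sum of the bead position and displacement vectors, rotating the bead and all vectors by the same matrix rotates the reconstructed atoms by that matrix, which is the property an equivariant predictor relies on.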