
An Iterative Framework for Generative Backmapping of Coarse Grained Proteins

Georgios Kementzidis, Erin Wong, John Nicholson, Ruichen Xu, Yuefan Deng

TL;DR

This work tackles the challenge of reconstructing fine-grained (FG) protein structures from ultra-coarse-grained (UCG) representations by introducing an iterative backmapping framework built on conditional variational autoencoders and graph neural networks. It formalizes backmapping as a chain of conditional distributions and derives a k-step evidence lower bound (ELBO) whose terms can be optimized separately at each resolution, yielding a practical divide-and-conquer approach. A two-step scheme that pairs CGVAE (for Cα traces) with GenZProt (for FG reconstruction) substantially outperforms a 1-step baseline in RMSD, Graph Edit Distance, steric clashes, and Ramachandran-consistent secondary structure for two proteins with different structural characteristics, namely eIF4E and PED00151. The method offers memory-efficient, modular training and scalable accuracy gains, highlighting its potential for generating physically plausible FG conformations from ultra-coarse representations in biomolecular simulations. The authors also outline future extensions to deeper multi-step schemes and to intrinsically disordered proteins (IDPs).
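
The chain of conditional distributions and the k-step ELBO can be illustrated with a short derivation. This is a hedged sketch, not the paper's exact notation or proof. It assumes the following:

- Each finer conformation depends only on the next-coarser one, i.e. a Markov chain over resolutions.
- Each step i is a conditional VAE with a latent $\mathbf{z}_i$, an encoder $q_i$, a conditional prior $p_i(\mathbf{z}_i \mid \mathbf{x}_i)$, and a decoder $p_i(\mathbf{x}_{i-1} \mid \mathbf{z}_i, \mathbf{x}_i)$.

Under these assumptions, the chain rule and the standard conditional-VAE bound give:

```latex
\begin{align}
\log p(\mathbf{x}_0, \dots, \mathbf{x}_{k-1} \mid \mathbf{x}_k)
  &= \sum_{i=1}^{k} \log p(\mathbf{x}_{i-1} \mid \mathbf{x}_i), \\
\log p(\mathbf{x}_{i-1} \mid \mathbf{x}_i)
  &\ge \mathbb{E}_{q_i(\mathbf{z}_i \mid \mathbf{x}_{i-1}, \mathbf{x}_i)}
       \big[\log p_i(\mathbf{x}_{i-1} \mid \mathbf{z}_i, \mathbf{x}_i)\big]
     - \mathrm{KL}\big(q_i(\mathbf{z}_i \mid \mathbf{x}_{i-1}, \mathbf{x}_i)
       \,\big\|\, p_i(\mathbf{z}_i \mid \mathbf{x}_i)\big)
     =: \mathcal{L}_i, \\
\log p(\mathbf{x}_0, \dots, \mathbf{x}_{k-1} \mid \mathbf{x}_k)
  &\ge \sum_{i=1}^{k} \mathcal{L}_i .
\end{align}
```

The step-wise bounds $\mathcal{L}_i$ share no parameters, so each can be maximized on its own. This is one way to read the claim that the k-step ELBO allows separate optimization at each resolution.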

Abstract

Data-driven techniques for backmapping from coarse-grained (CG) to fine-grained (FG) representations often struggle with accuracy, training stability, and physical realism, especially when applied to complex systems such as proteins. In this work, we introduce a novel iterative framework based on conditional variational autoencoders and graph-based neural networks, specifically designed to tackle the challenges associated with such large biomolecules. Our method enables stepwise refinement from CG beads to full atomistic detail. We outline the theory of iterative generative backmapping and demonstrate through numerical experiments the advantages of multistep schemes, applying them to structurally very different proteins with very coarse representations. This multistep approach not only improves the accuracy of reconstructions but also makes training more computationally efficient for proteins with ultra-CG representations.
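
The stepwise refinement from CG beads to atomistic detail can be pictured as a loop over per-resolution models, each mapping $\mathbf{x}_i$ to $\mathbf{x}_{i-1}$. The following is a minimal, framework-free sketch under stated assumptions:

- Each trained step is assumed to be a callable that samples a latent and decodes a finer conformation.
- `make_toy_step`, `iterative_backmap`, and the bead counts are hypothetical. They are not the authors' code, CGVAE, or GenZProt.
- The toy step simply scatters child beads around each parent bead rather than running a trained network.

```python
import random
from typing import Callable, List, Sequence, Tuple

Coord = Tuple[float, float, float]
Conformation = List[Coord]
# Hypothetical interface: one trained resolution step x_i -> x_{i-1}.
StepModel = Callable[[Conformation, random.Random], Conformation]


def make_toy_step(beads_per_parent: int, noise: float) -> StepModel:
    """Hypothetical stand-in for a trained prior/decoder pair.

    Every coarse bead is replaced by `beads_per_parent` finer beads scattered
    around it. A real step would sample z_i from a learned conditional prior
    p(z_i | x_i) and decode it with a graph neural network; here z is drawn
    from a standard normal and used directly as a scaled offset.
    """

    def step(x_coarse: Conformation, rng: random.Random) -> Conformation:
        x_fine: Conformation = []
        for cx, cy, cz in x_coarse:
            for _ in range(beads_per_parent):
                z = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
                x_fine.append((cx + noise * z[0], cy + noise * z[1], cz + noise * z[2]))
        return x_fine

    return step


def iterative_backmap(
    x_k: Sequence[Coord], steps: Sequence[StepModel], seed: int = 0
) -> List[Conformation]:
    """Apply steps from coarsest to finest: x_k -> x_{k-1} -> ... -> x_0.

    Returns every intermediate conformation, starting with x_k.
    """
    rng = random.Random(seed)  # seeded so sampling is reproducible
    x: Conformation = list(x_k)
    trajectory = [x]
    for step in steps:
        x = step(x, rng)
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    x2 = [(3.8 * i, 0.0, 0.0) for i in range(8)]  # 8 toy UCG beads
    steps = [make_toy_step(4, 1.0), make_toy_step(3, 0.5)]  # x_2 -> x_1 -> x_0
    traj = iterative_backmap(x2, steps, seed=42)
    print([len(x) for x in traj])  # [8, 32, 96]
```

In this sketch, each step only touches one pair of adjacent resolutions. Step models can therefore be trained, stored, and swapped independently, which mirrors the modular training described above.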

Paper Structure

This paper contains 22 sections, 9 equations, 8 figures, 7 tables, and 1 algorithm.

Figures (8)

  • Figure 1: An illustration of the multistep generative backmapping scheme. Starting from a UCG structure, we restore atomistic resolution using independent priors (blue networks) and decoders (red networks), one resolution-step at a time.
  • Figure 2: An FG conformation $\mathbf{x}_0$ and $k$ progressively coarser, less informative CG conformations $\mathbf{x}_i$, $i = 1, \dots, k$. The average CG bead size $\rho$ increases with $i$.
  • Figure 3: The results for eIF4E (left) and PED00151 (right). The vertical axis in the bottom two rows is shown on a logarithmic scale.
  • Figure 4: (a) Starting from $\mathbf{x}_2$ with $n_2=8$, we restore the FG representation $\mathbf{x}_0$ of PED00151 using 1-step and 2-step schemes. (b) The ground truth $\mathbf{x}_0$.
  • Figure 5: Ramachandran plots for different schemes and CG bead sizes: (a) eIF4E, (b) PED00151. The contours correspond to the true distribution while the color corresponds to the probability density of the $(\phi, \psi)$ angle combinations across the backmapped structures.
  • ...and 3 more figures
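
Figure 5 compares backbone $(\phi, \psi)$ dihedral distributions of backmapped and reference structures. The following is a hedged sketch of how such angles are commonly computed, not the authors' analysis code. Its conventions are:

- $\phi_i$ is the dihedral C(i-1), N(i), CA(i), C(i).
- $\psi_i$ is the dihedral N(i), CA(i), C(i), N(i+1).
- The per-residue `(N, CA, C)` input layout and the function names are our assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle p0-p1-p2-p3 in degrees, in [-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / n, b1[1] / n, b1[2] / n)  # unit vector along the central bond
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(backbone: Sequence[Tuple[Vec, Vec, Vec]]) -> List[Tuple[float, float]]:
    """(phi, psi) for interior residues; backbone[i] = (N, CA, C) of residue i."""
    angles = []
    for i in range(1, len(backbone) - 1):
        n_i, ca_i, c_i = backbone[i]
        phi = dihedral(backbone[i - 1][2], n_i, ca_i, c_i)
        psi = dihedral(n_i, ca_i, c_i, backbone[i + 1][0])
        angles.append((phi, psi))
    return angles
```

The terminal residues are skipped because $\phi$ needs the preceding C atom and $\psi$ needs the following N atom. Histogramming the resulting pairs over an ensemble gives the densities plotted in a Ramachandran plot.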