cryoSPHERE: Single-particle heterogeneous reconstruction from cryo EM

Gabriel Ducrocq, Lukas Grunewald, Sebastian Westenhoff, Fredrik Lindsten

TL;DR

This work tackles the challenge of resolving conformational heterogeneity in single-particle cryo-EM by learning a segment-based deformation of a nominal structure $S_0$ (e.g., from AlphaFold) using a variational auto-encoder. Each image is mapped to a latent $z$ that decodes to per-segment rigid motions, yielding a deformed structure that is projected and matched to the observed image under a physically grounded image formation model. The approach leverages a Gaussian-mixture segmentation of the residue chain, enforcing end-to-end differentiability and enabling interpretable motions; it outperforms state-of-the-art volume- and structure-based methods on synthetic and real datasets, particularly at high noise levels. By coupling structural priors with learned segment motions, cryoSPHERE opens doors to recovering ensemble conformations and kinetic insights from cryo-EM data, while providing debiasing avenues when base structures introduce bias in noisy regimes.

Abstract

The three-dimensional structure of proteins plays a crucial role in determining their function. Protein structure prediction methods, such as AlphaFold, offer rapid access to a protein structure. However, large protein complexes cannot be reliably predicted, and proteins are dynamic, making it important to resolve their full conformational distribution. Single-particle cryo-electron microscopy (cryo-EM) is a powerful tool for determining the structures of large protein complexes. Importantly, the numerous images of a given protein contain underutilized information about conformational heterogeneity. These images are very noisy projections of the protein, and traditional methods for cryo-EM reconstruction are limited to recovering only one or a few consensus conformations. In this paper, we introduce cryoSPHERE, a deep learning method that takes a nominal protein structure (e.g., from AlphaFold) as input, learns how to divide it into segments, and moves these segments as approximately rigid bodies to fit the different conformations present in the cryo-EM dataset. This approach provides enough constraints to enable meaningful reconstructions of single protein structural ensembles. We demonstrate this with two synthetic datasets featuring varying levels of noise, as well as two real datasets. We show that cryoSPHERE is very resilient to the high levels of noise typically encountered in experiments, where we see consistent improvements over the current state-of-the-art for heterogeneous reconstruction.
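
To make the segment-based deformation concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a one-dimensional Gaussian mixture over residue indices that yields soft segment memberships, and it blends per-segment rigid motions by those memberships. The function names (`soft_segments`, `rotation_from_axis_angle`, `deform`), the temperature parameter `tau`, and the blending rule are illustrative assumptions; the paper's exact way of composing per-segment motions may differ.

```python
import numpy as np


def soft_segments(n_residues, means, stds, weights, tau=1.0):
    """Soft membership of each residue index in each segment.

    Assumption: a 1D Gaussian mixture over residue indices, turned into
    per-residue probabilities with a temperature-scaled softmax.
    Returns an array of shape (n_residues, n_segments) whose rows sum to 1.
    """
    idx = np.arange(n_residues, dtype=float)[:, None]          # (N, 1)
    log_p = (np.log(weights)[None, :]
             - np.log(stds)[None, :]
             - 0.5 * ((idx - means[None, :]) / stds[None, :]) ** 2)
    log_p = log_p / tau
    log_p = log_p - log_p.max(axis=1, keepdims=True)            # numerical stability
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)


def rotation_from_axis_angle(axis, angle):
    """Rotation matrix from an axis and an angle (Rodrigues' formula)."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    k = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * k + (1.0 - np.cos(angle)) * (k @ k)


def deform(coords, probs, rotations, translations):
    """Apply per-segment rigid motions to a structure.

    coords:       (N, 3) residue coordinates of the nominal structure
    probs:        (N, K) soft segment memberships
    rotations:    (K, 3, 3) one rotation per segment
    translations: (K, 3) one translation per segment

    Assumption: each residue's new position is the membership-weighted
    average of its position under every segment's rigid motion.
    """
    moved = np.einsum("kij,nj->kni", rotations, coords) + translations[:, None, :]
    return np.einsum("nk,kni->ni", probs, moved)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coords = rng.normal(size=(10, 3))                 # toy 10-residue chain
    probs = soft_segments(10, np.array([2.0, 7.0]),
                          np.array([1.0, 1.0]), np.array([0.5, 0.5]))
    rotations = np.stack([np.eye(3),
                          rotation_from_axis_angle([0.0, 0.0, 1.0], np.pi / 6)])
    translations = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
    print(deform(coords, probs, rotations, translations).shape)
```

Because every operation here is smooth in the mixture parameters and the motions, the same construction stays differentiable when written in an automatic-differentiation framework, which is the property the TL;DR highlights for the Gaussian-mixture segmentation.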

Paper Structure (37 sections, 12 equations, 56 figures, 1 table)


Figures (56)

  • Figure 1: Flow chart of our network. The learnable parts of the model are the encoder, the decoder and the Gaussian mixture. Note that although the transformations predicted by the decoder are computed per image, the Gaussian mixture is shared across all particles.
  • Figure 2: Example of segments recovered with a Gaussian mixture of 6 components.
  • Figure 3: MD dataset SNR 0.001. Left: Histograms of the distances of the two upper domains. The true distances are in green. The recovered distances are in blue. Right: Predicted against true distances in Ångström. The black line represents $x=y$. The correlation between the predicted and true distances is 0.73. For the same plot for cryoStar, see Appendix \ref{append:MD} of the supplementary file.
  • Figure 4: MD dataset. Left: segments recovered by cryoSPHERE. The colors denote different contiguous domains. Middle and right: mean FSC comparison +/- one standard deviation, for cryoSPHERE, cryoDRGN and cryoStar. For a comparison between cryoStar and cryoDRGN, see Appendix \ref{append:MD} in the supplementary file.
  • Figure 5: EMPIAR10180. Left and middle left: different views of the structures corresponding to the red dots of Figure \ref{fig:empiar10180_latent_space}. The motion goes from red (left along the first principal component) to white to blue (right along the first principal component). Only the $C_\alpha$ atoms are shown. Right and middle right: different views of two volumes recovered by training DRGN-AI on the latent space of cryoSPHERE. The U2 domain disappears from the volume because of compositional heterogeneity.
  • ...and 51 more figures
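
The Figure 4 caption compares methods by their Fourier shell correlation (FSC) curves. As a reminder of what that metric measures, here is a minimal NumPy sketch of the standard FSC definition between two volumes. It is not the evaluation code used in the paper; the function name `fourier_shell_correlation` is hypothetical, and the sketch assumes cubic volumes of equal size and shells one voxel wide in frequency space.

```python
import numpy as np


def fourier_shell_correlation(vol_a, vol_b):
    """FSC between two cubic volumes, one value per frequency shell.

    For each shell of (rounded) integer radius r in Fourier space:
        FSC(r) = Re(sum F_a * conj(F_b)) / sqrt(sum |F_a|^2 * sum |F_b|^2)
    Returns an array of length n // 2, where n is the box size.
    """
    vol_a = np.asarray(vol_a, dtype=float)
    vol_b = np.asarray(vol_b, dtype=float)
    if vol_a.shape != vol_b.shape or vol_a.ndim != 3 or len(set(vol_a.shape)) != 1:
        raise ValueError("expected two cubic volumes of identical shape")

    n = vol_a.shape[0]
    # Centered Fourier transforms: zero frequency sits at index n // 2.
    f_a = np.fft.fftshift(np.fft.fftn(vol_a))
    f_b = np.fft.fftshift(np.fft.fftn(vol_b))

    # Integer shell index of every voxel in Fourier space.
    grid = np.indices(vol_a.shape) - n // 2
    shell = np.rint(np.sqrt((grid ** 2).sum(axis=0))).astype(int)

    fsc = np.zeros(n // 2)
    for r in range(n // 2):
        mask = shell == r
        num = np.real(np.sum(f_a[mask] * np.conj(f_b[mask])))
        den = np.sqrt(np.sum(np.abs(f_a[mask]) ** 2) * np.sum(np.abs(f_b[mask]) ** 2))
        fsc[r] = num / den if den > 0 else 0.0
    return fsc


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = rng.normal(size=(16, 16, 16))
    noisy = truth + rng.normal(size=truth.shape)
    print(fourier_shell_correlation(truth, noisy))
```

A curve that stays close to 1 out to high-frequency shells indicates that the two volumes agree at fine detail; averaging such curves over many reconstructed conformations gives the kind of mean-and-spread comparison the caption describes.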