
Mixture of neural fields for heterogeneous reconstruction in cryo-EM

Axel Levy, Rishwanth Raghu, David Shustin, Adele Rui-Yang Peng, Huan Li, Oliver Biggs Clarke, Gordon Wetzstein, Ellen D. Zhong

TL;DR

Hydra addresses the challenge of reconstructing cryo-EM samples with mixed compositional and conformational heterogeneity by representing densities as arising from one of $K$ neural fields in a mixture model. It jointly optimizes pose, conformation, and class assignments using a hierarchical pose search and autodecoding-based gating, enabling fully ab initio reconstruction. On synthetic and experimental datasets, Hydra achieves superior composition separation and 3D reconstructions compared to prior neural methods, and can reveal multiple protein complexes from unpurified mixtures in a single pass. This approach broadens the applicability of cryo-EM to complex, in situ samples and suggests avenues for extending neural-field methods to subtomogram averaging and other heterogeneous imaging tasks.
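To make the mixture-of-fields idea concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes that each of the $K$ fields is a small ReLU MLP that takes a 3D coordinate concatenated with a $d$-dimensional conformation latent $z$. It also assumes that autodecoding can be represented by per-particle tables of latents and class logits, which an optimizer would update directly without an encoder. The class name `MixtureOfNeuralFields`, its methods, and all layer sizes are hypothetical.

```python
import numpy as np


class MixtureOfNeuralFields:
    """Hypothetical sketch: K independent coordinate MLPs, each conditioned on a
    d-dimensional conformation latent z. The class index k selects the field."""

    def __init__(self, num_classes, latent_dim, hidden=32, num_particles=10, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = 3 + latent_dim  # (x, y, z) coordinate concatenated with the latent
        # One independent weight set per class (compositional state).
        self.fields = [
            {
                "W1": rng.normal(scale=0.5, size=(in_dim, hidden)),
                "b1": np.zeros(hidden),
                "W2": rng.normal(scale=0.5, size=(hidden, 1)),
                "b2": np.zeros(1),
            }
            for _ in range(num_classes)
        ]
        # Assumed autodecoding tables: per-particle parameters that an optimizer
        # would update directly. They are only initialized here.
        self.latents = rng.normal(scale=0.1, size=(num_particles, latent_dim))
        self.class_logits = np.zeros((num_particles, num_classes))

    def density(self, coords, k, z):
        """Evaluate field k at coords of shape (N, 3) for a single latent z of shape (d,)."""
        p = self.fields[k]
        z_tiled = np.broadcast_to(z, (coords.shape[0], z.shape[0]))
        inp = np.concatenate([coords, z_tiled], axis=1)  # (N, 3 + d)
        h = np.maximum(inp @ p["W1"] + p["b1"], 0.0)     # ReLU hidden layer
        return (h @ p["W2"] + p["b2"])[:, 0]             # (N,) density values

    def class_probabilities(self, particle_idx):
        """Softmax of one particle's class logits (numerically stabilized)."""
        logits = self.class_logits[particle_idx]
        e = np.exp(logits - logits.max())
        return e / e.sum()
```

Selecting class $k$ chooses which set of weights is evaluated, so different compositional states do not have to share one latent space. Conformational variation within a class is carried by $z$ alone.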

Abstract

Cryo-electron microscopy (cryo-EM) is an experimental technique for protein structure determination that images an ensemble of macromolecules in near-physiological contexts. While recent advances enable the reconstruction of dynamic conformations of a single biomolecular complex, current methods do not adequately model samples with mixed conformational and compositional heterogeneity. In particular, datasets containing mixtures of multiple proteins require the joint inference of structure, pose, compositional class, and conformational states for 3D reconstruction. Here, we present Hydra, an approach that models both conformational and compositional heterogeneity fully ab initio by parameterizing structures as arising from one of K neural fields. We employ a new likelihood-based loss function and demonstrate the effectiveness of our approach on synthetic datasets composed of mixtures of proteins with large degrees of conformational variability. We additionally demonstrate Hydra on an experimental dataset of a cellular lysate containing a mixture of different protein complexes. Hydra expands the expressivity of heterogeneous reconstruction methods and thus broadens the scope of cryo-EM to increasingly complex samples.
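The abstract mentions a likelihood-based loss. The sketch below shows one generic way a mixture over $K$ classes can enter a likelihood. It is not the specific new loss proposed for Hydra. It assumes i.i.d. Gaussian pixel noise with standard deviation `sigma`, and it assumes that each particle's predicted image under every class has already been computed, that is, projected at the estimated pose and modulated by the CTF. The function names `logsumexp` and `mixture_nll` are hypothetical.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return (m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))).squeeze(axis)


def mixture_nll(images, predictions, class_logits, sigma=1.0):
    """Generic Gaussian-noise mixture likelihood (an illustrative assumption).

    images:       (B, D, D) observed particle images
    predictions:  (B, K, D, D) predicted image of each particle under each class
    class_logits: (B, K) unnormalized per-particle class scores
    Returns the mean negative log-likelihood (up to an additive constant)
    and the (B, K) class responsibilities.
    """
    sq_err = np.sum((images[:, None] - predictions) ** 2, axis=(2, 3))  # (B, K)
    log_prior = class_logits - logsumexp(class_logits, axis=1)[:, None]  # log-softmax
    log_joint = log_prior - sq_err / (2.0 * sigma ** 2)                  # (B, K)
    log_lik = logsumexp(log_joint, axis=1)                               # (B,)
    responsibilities = np.exp(log_joint - log_lik[:, None])              # rows sum to 1
    return -log_lik.mean(), responsibilities
```

The log-sum-exp keeps the marginalization over classes numerically stable. The resulting responsibilities give each particle a soft class assignment.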

Paper Structure

This paper contains 31 sections, 9 equations, 13 figures, 5 tables.

Figures (13)

  • Figure 1: The Hydra method for ab initio heterogeneous cryo-EM reconstruction. (a) Schematic representation of the space of energetically plausible density maps in a heterogeneous cryo-EM dataset. We approximate this space with a finite union of low-dimensional manifolds. The compositional states (or classes) are labeled by $k$. The "conformation" within class $k$ refers to intrinsic coordinates within the $k$-th manifold. (b) Optimization pipeline. The conformations, poses, class probabilities and neural fields are optimized so as to maximize the likelihood of the observed images ("picked particles") under the model described in Section \ref{sec:lv-model}.
  • Figure 2: Results on the compositionally heterogeneous tomotwin3 dataset. (a-c) Reconstructed densities and estimated conformations with $K\in\{1,3,5\}$. The number of particles in each class is given in parentheses. Density maps are shown as isosurfaces. (a) With $K=1$ (DRGN-AI), the model fails to reconstruct the three density maps, despite using $d=8$ dimensions to represent conformations. (b) With $K=3$ ($d=2$), Hydra recovers the three density maps with perfect classification accuracy. (c) With $K=5$ ($d=2$), the model is over-parameterized, and two of the five classes are empty at the end of optimization. (d) Ground-truth density maps for the tomotwin3 dataset.
  • Figure 3: Results on an experimental dataset containing a mixture of membrane and soluble protein complexes. (a) Density maps obtained with Hydra ($K=4$) on the ryanodine receptor dataset. (b) Confusion matrix between Hydra and cryoSPARC heterogeneous refinement with $K=6$ (the three classes representing RyR were combined for analysis). (c) Fourier shell correlation (FSC) between the Hydra density maps and the refined cryoSPARC density maps. (d) Left: latent space plot; right: representative density maps from each of the latent space clusters, obtained with DRGN-AI.
  • Figure 4: Results on the compositionally and conformationally heterogeneous ribosplike dataset. In each latent space, particles are colored by class. Representative density maps are generated from the latent points marked by white dots.
  • Figure S1: 25 representative sample images from each of the three datasets. (a) Sample images for the tomotwin3 synthetic dataset, $D=128$, 4.5 Å/pix. (b) Sample images for the experimental ryanodine receptor dataset, $D=150$, 3.32 Å/pix. (c) Sample images for the ribosplike synthetic dataset, $D=128$, 4.24 Å/pix.
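Figure 3(c) compares density maps using the Fourier shell correlation (FSC). The FSC is the normalized cross-correlation of two volumes' Fourier coefficients, computed over shells of constant spatial frequency. Below is a minimal NumPy sketch of the standard definition. It assumes two cubic volumes of equal size that are already aligned in a common frame, and it omits the masking and other preprocessing that refinement software typically applies. The function name `fourier_shell_correlation` is hypothetical.

```python
import numpy as np


def fourier_shell_correlation(vol1, vol2, num_shells=None):
    """FSC between two aligned cubic volumes of shape (D, D, D).
    Returns one correlation value per integer-radius Fourier shell."""
    assert vol1.shape == vol2.shape and vol1.ndim == 3
    D = vol1.shape[0]
    # Centre the zero frequency at index D // 2 along every axis.
    F1 = np.fft.fftshift(np.fft.fftn(vol1))
    F2 = np.fft.fftshift(np.fft.fftn(vol2))
    # Integer radius (in Fourier voxels) of every voxel from the centre.
    freqs = np.arange(D) - D // 2
    kx, ky, kz = np.meshgrid(freqs, freqs, freqs, indexing="ij")
    radius = np.round(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)
    if num_shells is None:
        num_shells = D // 2  # shells up to the Nyquist frequency
    fsc = np.zeros(num_shells)
    for r in range(num_shells):
        mask = radius == r
        num = np.sum(F1[mask] * np.conj(F2[mask]))
        den = np.sqrt(np.sum(np.abs(F1[mask]) ** 2) * np.sum(np.abs(F2[mask]) ** 2))
        fsc[r] = np.real(num) / den if den > 0 else 0.0
    return fsc
```

Shell $r$ corresponds to a resolution of $D \cdot$ (pixel size) $/ r$. As a worked example, if the maps had the box and pixel size of the ryanodine receptor images in Figure S1 ($D=150$, 3.32 Å/pix), the outermost shell $r=75$ would correspond to $498/75 = 6.64$ Å. That value is twice the pixel size, which is the Nyquist limit.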