cryoSPHERE: Single-particle heterogeneous reconstruction from cryo EM
Gabriel Ducrocq, Lukas Grunewald, Sebastian Westenhoff, Fredrik Lindsten
TL;DR
This work addresses conformational heterogeneity in single-particle cryo-EM by using a variational auto-encoder to learn a segment-based deformation of a nominal structure $S_0$ (e.g., from AlphaFold). Each image is mapped to a latent variable $z$ that decodes to per-segment rigid motions; the resulting deformed structure is projected and compared with the observed image under a physically grounded image formation model. The residue chain is segmented with a Gaussian mixture, which keeps the model end-to-end differentiable and makes the learned motions interpretable. The method outperforms state-of-the-art volume- and structure-based methods on synthetic and real datasets, particularly at high noise levels. By coupling a structural prior with learned segment motions, cryoSPHERE makes it possible to recover conformational ensembles, and potentially kinetic information, from cryo-EM data, and it offers ways to reduce the bias that the base structure can introduce in noisy regimes.
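As a rough illustration of the segment-based deformation summarized above, the sketch below assigns each residue a soft segment membership from a Gaussian mixture over residue indices, then blends the rigidly moved copies of the structure by those memberships. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function names (`soft_segmentation`, `deform`, `rotation_z`), the toy straight-chain coordinates, and the choice to blend transformed positions linearly are all hypothetical, and the paper's exact parameterization and blending rule may differ.

```python
import numpy as np

def soft_segmentation(n_residues, means, stds, weights):
    """Soft membership of each residue index in each segment (Gaussian mixture posterior)."""
    idx = np.arange(n_residues)[:, None]                       # (N, 1)
    log_p = (np.log(weights)[None, :]
             - np.log(stds)[None, :]
             - 0.5 * ((idx - means[None, :]) / stds[None, :]) ** 2)
    log_p -= log_p.max(axis=1, keepdims=True)                  # numerical stability
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)                    # (N, M), rows sum to 1

def rotation_z(angle):
    """Rotation matrix about the z axis (used only to build the toy example)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def deform(coords, seg_probs, rotations, translations):
    """Blend per-segment rigid motions by soft membership (assumed blending rule).

    coords: (N, 3), seg_probs: (N, M), rotations: (M, 3, 3), translations: (M, 3).
    """
    # moved[m, i] = R_m @ x_i + t_m
    moved = np.einsum("mab,ib->mia", rotations, coords) + translations[:, None, :]
    # Weighted average over segments for each residue.
    return np.einsum("im,mia->ia", seg_probs, moved)

# Toy example: 10 residues on a straight line, two segments.
n = 10
coords = np.stack([np.arange(n, dtype=float), np.zeros(n), np.zeros(n)], axis=1)
probs = soft_segmentation(
    n,
    means=np.array([2.0, 7.0]),
    stds=np.array([1.0, 1.0]),
    weights=np.array([0.5, 0.5]),
)
rotations = np.stack([np.eye(3), rotation_z(np.pi / 2)])       # second segment turns 90 degrees
translations = np.zeros((2, 3))
deformed = deform(coords, probs, rotations, translations)
```

Because the memberships vary smoothly with the mixture parameters, the deformed coordinates are differentiable with respect to both the segmentation and the rigid motions, which is what allows such a model to be trained end to end. In a full pipeline the rotations and translations would come from a decoder conditioned on $z$ rather than being fixed by hand as in this toy example.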
Abstract
The three-dimensional structure of proteins plays a crucial role in determining their function. Protein structure prediction methods, like AlphaFold, offer rapid access to a protein structure. However, large protein complexes cannot be reliably predicted, and proteins are dynamic, making it important to resolve their full conformational distribution. Single-particle cryo-electron microscopy (cryo-EM) is a powerful tool for determining the structures of large protein complexes. Importantly, the numerous images of a given protein contain underutilized information about conformational heterogeneity. These images are very noisy projections of the protein, and traditional methods for cryo-EM reconstruction are limited to recovering only one or a few consensus conformations. In this paper, we introduce cryoSPHERE, a deep learning method that takes a nominal protein structure (e.g., from AlphaFold) as input, learns how to divide it into segments, and moves these segments as approximately rigid bodies to fit the different conformations present in the cryo-EM dataset. This approach provides enough constraints to enable meaningful reconstructions of single protein structural ensembles. We demonstrate this with two synthetic datasets featuring varying levels of noise, as well as two real datasets. We show that cryoSPHERE is very resilient to the high levels of noise typically encountered in experiments, with consistent improvements over the current state of the art for heterogeneous reconstruction.
