cryoSENSE: Compressive Sensing Enables High-throughput Microscopy with Sparse and Generative Priors on the Protein Cryo-EM Image Manifold

Zain Shabeeb, Daniel Saeedi, Darin Tsui, Vida Jamali, Amirali Aghazadeh

Abstract

Cryo-electron microscopy (cryo-EM) enables the atomic-resolution visualization of biomolecules; however, modern direct detectors generate data volumes that far exceed the available storage and transfer bandwidth, thereby constraining practical throughput. We introduce cryoSENSE, the computational realization of a hardware-software co-designed framework for compressive cryo-EM sensing and acquisition. We show that cryo-EM images of proteins lie on low-dimensional manifolds that can be independently represented using sparse priors in predefined bases and generative priors captured by a denoising diffusion model. cryoSENSE leverages these low-dimensional manifolds to enable faithful image reconstruction from spatial and Fourier-domain undersampled measurements while preserving downstream structural resolution. In experiments, cryoSENSE increases acquisition throughput by up to 2.5$\times$ while retaining the original 3D resolution, offering controllable trade-offs between the number of masked measurements and the level of downsampling. Sparse priors favor faithful reconstruction from Fourier-domain measurements and moderate compression, whereas generative diffusion priors achieve accurate recovery from pixel-domain measurements and more severe undersampling. Project website: https://cryosense.github.io.

Paper Structure

This paper contains 29 sections, 21 equations, 17 figures, 5 tables, 1 algorithm.

Figures (17)

  • Figure 1: cryoSENSE increases data throughput by preserving 3D structural detail under high compression factors. a, Original 3D cryo-EM volume from uncompressed particle images. b, cryoSENSE 3D reconstructions from compressively acquired images, using generative (top row) and sparse priors (bottom row). Volumes are color-coded by their FSC resolution (lower is better).
  • Figure 2: Overview of cryoSENSE. a, cryoSENSE employs pixel-space and Fourier-space masking strategies to obtain compressed measurements. b, Sparsity priors enable image recovery via proximal gradient descent. c, Generative priors learn a low-dimensional manifold of cryo-EM images and guide a diffusion process to generate images consistent with the measurements. d, cryoSENSE enables high-throughput acquisition of cryo-EM images, which are then validated through downstream biological tasks such as e, 3D volume reconstruction, atomic model building, and conformational heterogeneity analysis.
  • Figure 3: a, Example EMPIAR-10076 particle images used for CryoDRGN heterogeneity analysis. b, UMAP projection of CryoDRGN latent space trained on original 128$\times$128 images, showing four distinct conformational clusters colored by GMM labels. c, Example cryoSENSE - DDPM reconstructions obtained from $K = 16$ downsampled images with $C=1.25$. d, UMAP projection of CryoDRGN latent space trained on DDPM reconstructions, colored by original GMM cluster labels from b. e, Example cryoSENSE - DCT reconstructions obtained from $K = 16$ downsampled images with $C=1.25$. f, UMAP projection of CryoDRGN latent space trained on DCT reconstructions, colored by original GMM cluster labels from b.
  • Figure 4: a, Atomic models from ModelAngelo fit into reconstructed cryo-EM 3D volumes for the Original dataset (EMPIAR-10648) and cryoSENSE reconstructions ($K=2$, $C=1.33$) using DCT and DDPM priors. Chains are colored by ModelAngelo prediction confidence scores. b, FSC curves for the three corresponding cryo-EM 3D volumes. c, Chain-level sequence alignment score vs. sequence identity for matched chains between Original vs. DDPM and Original vs. DCT; marker size reflects backbone RMSD. d, cryoSENSE - DDPM atomic model with four representative chain regions highlighted, corresponding to the same colored regions shown in g. e, First example structural comparison of the highlighted region in the Original model with its corresponding DDPM- and DCT-reconstructed regions. f, Second example structural comparison of the highlighted region in the Original model with its corresponding DDPM- and DCT-reconstructed regions. g, Original atomic model with four chain regions highlighted corresponding to the same regions as in d and h. h, cryoSENSE - DCT atomic model with four representative chain regions highlighted, corresponding to the same colored regions shown in g.
  • Figure S-5: Visualization of forward operator $\mathcal{A}$ with pixel-space masking over a single mask. The process begins with a $128 \times 128$ input image $\mathbf{x}_0$ and a random binary mask $B_0$. Non-overlapping kernel-wise convolution is applied over $K \times K$ patches, resulting in progressively downsampled measurement resolutions as kernel size increases.
  • ...and 12 more figures
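
The pixel-space forward operator described in the Figure S-5 caption can be illustrated with a short NumPy sketch. This is not the authors' implementation. It assumes that the "non-overlapping kernel-wise convolution" amounts to summing the masked pixels inside each $K \times K$ patch. The function name `pixel_mask_forward`, the Bernoulli(0.5) mask, and the random stand-in image are hypothetical choices made for illustration only.

```python
import numpy as np


def pixel_mask_forward(x, mask, K):
    """Hypothetical sketch of a pixel-space masking operator.

    Multiplies the image by a binary mask, then sums every
    non-overlapping K x K patch into one measurement.
    """
    H, W = x.shape
    if H % K or W % K:
        raise ValueError("image size must be divisible by K")
    masked = x * mask
    # Give each K x K patch its own pair of axes, then sum over them.
    return masked.reshape(H // K, K, W // K, K).sum(axis=(1, 3))


rng = np.random.default_rng(0)
x0 = rng.standard_normal((128, 128))      # stand-in for a particle image (assumption)
B0 = rng.integers(0, 2, size=(128, 128))  # random binary mask (assumed Bernoulli(0.5))

for K in (2, 4, 16):
    y = pixel_mask_forward(x0, B0, K)
    print(K, y.shape)  # measurement resolution shrinks as K grows
```

Under these assumptions, a $128 \times 128$ image yields a $(128/K) \times (128/K)$ measurement per mask, which matches the caption's statement that measurement resolution decreases as the kernel size increases.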