SOLVAR: Fast covariance-based heterogeneity analysis with pose refinement for cryo-EM

Roey Yadgar, Roy R. Lederman, Yoel Shkolnisky

TL;DR

SOLVAR tackles continuous structural heterogeneity in cryo-EM by reformulating covariance estimation as a fast, low-rank optimization over eigenvectors. It introduces a maximum-likelihood estimator that parameterizes the covariance as $\Sigma = VV^{*}$ and integrates particle pose and contrast refinement into the heterogeneity analysis, enabling ab-initio workflows. The method achieves state-of-the-art or competitive results on CryoBench datasets, improves pose accuracy in realistic pipelines, and demonstrates applicability to large experimental datasets, while offering scalable runtime and flexibility in projection modeling. This work meaningfully accelerates covariance-based heterogeneity analysis and tightens integration with established cryo-EM software, opening avenues for more accurate, end-to-end ab-initio reconstructions.
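To illustrate why the factorization $\Sigma = VV^{*}$ keeps the problem tractable, here is a toy NumPy sketch (not code from the SOLVAR repository). It stores only the $n \times r$ factor $V$, applies $\Sigma$ to a vector as $V(V^{*}x)$ in $O(nr)$ operations, and draws flattened real "volumes" $x = \mu + Vz$ with $z \sim \mathcal{N}(0, I_r)$, whose covariance is $VV^{*}$. The function names and the Gaussian latent model are illustrative assumptions.

```python
import numpy as np

def lowrank_cov_matvec(V: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Return Sigma @ x for Sigma = V V^* without forming the n-by-n matrix (O(n r) work)."""
    return V @ (V.conj().T @ x)

def sample_volumes(mu: np.ndarray, V: np.ndarray, num: int, rng: np.random.Generator) -> np.ndarray:
    """Draw `num` flattened real volumes x = mu + V z with z ~ N(0, I_r), so Cov(x) = V V^T."""
    z = rng.standard_normal((num, V.shape[1]))
    return mu[None, :] + z @ V.T

rng = np.random.default_rng(0)
n, r = 20, 3
V = rng.standard_normal((n, r))
x = rng.standard_normal(n)
dense = (V @ V.T) @ x                        # O(n^2) reference, feasible only for small n
fast = lowrank_cov_matvec(V, x)              # O(n r), never builds Sigma
X = sample_volumes(np.zeros(n), V, 100_000, rng)
emp_cov = np.cov(X, rowvar=False)            # empirical covariance approaches V V^T
```

For a real cryo-EM volume $n$ is the number of voxels, so the dense $n \times n$ matrix is out of reach while the $n \times r$ factor is not.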

Abstract

Cryo-electron microscopy (cryo-EM) has emerged as a powerful technique for resolving the three-dimensional structures of macromolecules. A key challenge in cryo-EM is characterizing continuous heterogeneity, where molecules adopt a continuum of conformational states. Covariance-based methods offer a principled approach to modeling structural variability. However, estimating the covariance matrix efficiently remains a challenging computational task. In this paper, we present SOLVAR (Stochastic Optimization for Low-rank Variability Analysis), which leverages a low-rank assumption on the covariance matrix to provide a tractable estimator for its principal components, despite the seemingly prohibitive size of the covariance matrix. Under this low-rank assumption, our estimator can be formulated as an optimization problem that can be solved quickly and accurately. Moreover, our framework enables refinement of the poses of the input particle images, a capability absent from most heterogeneity-analysis methods and from all covariance-based methods. Numerical experiments on both synthetic and experimental datasets demonstrate that the algorithm accurately captures the dominant components of variability while maintaining computational efficiency. SOLVAR achieves state-of-the-art performance across multiple datasets in a recent heterogeneity benchmark. The code of the algorithm is freely available at https://github.com/RoeyYadgar/SOLVAR.
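The central idea here, that a low-rank assumption turns covariance estimation into an optimization over a small factor, can be illustrated with a toy problem. The sketch below is an assumption-laden stand-in, not SOLVAR's estimator. It fits $V$ by full-batch gradient descent on a least-squares covariance objective $f(V) = \frac{1}{N}\sum_i \|P_i V V^{\top} P_i^{\top} - (y_i y_i^{\top} - \sigma^2 I)\|_F^2$, whereas the paper uses a maximum-likelihood formulation and stochastic optimization. It also replaces tomographic projections with hypothetical random coordinate masks $P_i$ and assumes a zero mean and a known noise variance $\sigma^2$. What it does show is that the $n \times n$ covariance never has to be formed.

```python
import numpy as np

def simulate(n=10, r=2, m=5, num_images=4000, sigma=0.5, seed=0):
    """Toy data: x_i = V_true z_i (zero mean); each 'image' observes m of n entries plus noise.

    The random coordinate mask is a hypothetical stand-in for a cryo-EM projection operator.
    """
    rng = np.random.default_rng(seed)
    V_true = rng.standard_normal((n, r))
    z = rng.standard_normal((num_images, r))
    x = z @ V_true.T                                              # Cov(x) = V_true V_true^T
    idx = np.argsort(rng.random((num_images, n)), axis=1)[:, :m]  # random m-subsets per image
    y = np.take_along_axis(x, idx, axis=1) + sigma * rng.standard_normal((num_images, m))
    return V_true, idx, y

def fit_low_rank_cov(idx, y, n, r, sigma, lr=0.01, iters=600, seed=1):
    """Gradient descent on f(V) = mean_i ||P_i V V^T P_i^T - (y_i y_i^T - sigma^2 I)||_F^2.

    Only the n-by-r factor V is optimized; the n-by-n covariance is never formed.
    """
    rng = np.random.default_rng(seed)
    num_images, m = idx.shape
    # Debiased per-image covariance in the observed coordinates, shape (N, m, m).
    C = y[:, :, None] * y[:, None, :] - sigma**2 * np.eye(m)
    V = 0.1 * rng.standard_normal((n, r))        # small random initialization
    for _ in range(iters):
        Vp = V[idx]                              # P_i V for every image, shape (N, m, r)
        R = Vp @ Vp.transpose(0, 2, 1) - C       # residuals, shape (N, m, m)
        G = np.zeros_like(V)
        np.add.at(G, idx, 4.0 * R @ Vp)          # scatter P_i^T (4 R_i P_i V) back to rows of V
        V -= lr * G / num_images
    return V

V_true, idx, y = simulate()
V_hat = fit_low_rank_cov(idx, y, n=10, r=2, sigma=0.5)
Sigma_true = V_true @ V_true.T
rel_err = np.linalg.norm(V_hat @ V_hat.T - Sigma_true) / np.linalg.norm(Sigma_true)
```

`V_hat` is only identifiable up to an orthogonal rotation, so the check compares $\hat V \hat V^{\top}$ with the true covariance rather than the factors themselves.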

Paper Structure (22 sections, 7 theorems, 47 equations, 10 figures, 3 tables)

Key Result

Lemma B.1

Let $A,B \in \mathbb{C}^{n \times n}$ be two low-rank positive semi-definite (PSD) matrices, where $A = \sum_{i=1}^{r_1} a_i a_i^*$ and $B = \sum_{i=1}^{r_2} b_i b_i^*$. Then, their Frobenius inner product $\langle A, B \rangle_{F}$ (that is, their dot product when considered as vectors) satisfies $\langle A, B \rangle_{F} = \sum_{i=1}^{r_1} \sum_{j=1}^{r_2} |a_i^* b_j|^2$.
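For PSD matrices given by their factors, expanding the trace gives $\langle A, B \rangle_F = \operatorname{tr}(A^* B) = \sum_{i=1}^{r_1}\sum_{j=1}^{r_2} |a_i^* b_j|^2 = \|\tilde A^* \tilde B\|_F^2$, where $\tilde A = [a_1, \dots, a_{r_1}]$ and $\tilde B = [b_1, \dots, b_{r_2}]$ (the notation $\tilde A, \tilde B$ and the function name below are introduced here for illustration). The NumPy sketch checks this identity numerically and shows the cost advantage: the right-hand side needs only $r_1 r_2$ inner products of length-$n$ vectors, $O(n r_1 r_2)$ work, instead of $O(n^2)$.

```python
import numpy as np

def lowrank_frobenius_inner(A_f: np.ndarray, B_f: np.ndarray) -> float:
    """<A, B>_F for A = A_f A_f^* and B = B_f B_f^*, computed from the n-by-r factors only.

    sum_{i,j} |a_i^* b_j|^2 equals the squared Frobenius norm of the r1-by-r2 matrix A_f^* B_f.
    """
    return float(np.linalg.norm(A_f.conj().T @ B_f) ** 2)

rng = np.random.default_rng(0)
n, r1, r2 = 50, 3, 4
A_f = rng.standard_normal((n, r1)) + 1j * rng.standard_normal((n, r1))
B_f = rng.standard_normal((n, r2)) + 1j * rng.standard_normal((n, r2))
A = A_f @ A_f.conj().T            # dense n-by-n matrices, built here only for the check
B = B_f @ B_f.conj().T
dense_inner = np.vdot(A, B)       # sum(conj(A) * B) = tr(A^* B), real for Hermitian A, B
fast_inner = lowrank_frobenius_inner(A_f, B_f)
```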

Figures (10)

  • Figure 1: UMAP embedding of the SOLVAR latent coordinates, computed with 10 estimated principal components, for each dataset in CryoBench, colored by the published ground-truth labels. See CryoBench [jeon2025cryobenchdiversechallengingdatasets] for a detailed description of each dataset.
  • Figure 2: Comparison of reconstructed volumes from different methods on the synthetic IgG-1D dataset. (a) Reconstructed volumes from CryoDRGN (using ground-truth poses), RECOVAR (using the poses obtained by RELION's refinement), and SOLVAR (using the poses obtained by RELION's refinement and optimizing over the particles' poses), overlaid on the ground-truth conformation (green). The red regions show the binary XOR between each reconstructed volume and the overlaid ground-truth volume. (b) FSC curves between the three reconstructed volumes and the ground-truth conformation. CryoDRGN produces a noisier volume despite using ground-truth poses, while RECOVAR's volume is not fully aligned because of errors in the input poses. SOLVAR corrects the poses and outputs an aligned, clean volume. The RECOVAR and SOLVAR volumes were sharpened with relion_postprocess. Since CryoDRGN does not output half-maps, its volume was sharpened with a B-factor of $-250$ Å$^2$ (the average B-factor RELION used for RECOVAR and SOLVAR).
  • Figure 3: Particle pose errors obtained by SOLVAR on the synthetic IgG-1D dataset. The initial pose errors are those obtained from the homogeneous refinement. The lowpass cutoff frequency applied to the estimated principal components doubles every 40 epochs. SOLVAR reduces the out-of-plane, in-plane, offset, and contrast errors by $56\%$, $29\%$, $69\%$, and $70\%$, respectively.
  • Figure 4: SOLVAR results for EMPIAR-10076 using 15 estimated principal components. (a) Reconstructed volumes of each identified state, colored by the published labels. All volumes except state A are sharpened with relion_postprocess, with an average B-factor of $-101$ Å$^2$. (b) UMAP embedding colored by the published labels. Each class $S_i$ is annotated at the mean of its latent coordinates, $\frac{1}{|S_i|}\sum_{j\in S_i} \hat{z}_j$. The resulting volumes and UMAP embedding are consistent with those of other methods, CryoDRGN [Zhong2021-nj] and RECOVAR [recovar].
  • Figure 5: SOLVAR results for EMPIAR-10180 using 10 estimated principal components in an ab-initio setting. (a) UMAP embedding of the resulting latent coordinates. (b) Reconstructed volumes at the trajectory points shown in (a). The contour of the first volume (dark green) is overlaid on each volume as a reference. The volumes were sharpened with relion_postprocess using a mask over the core and foot of the spliceosome (average B-factor of $-102$ Å$^2$), leaving the helicase and SF3B parts unsharpened.
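
Figure 2 compares reconstructions through FSC curves against the ground truth. As a reference for what such a curve measures, here is a minimal NumPy sketch of the Fourier shell correlation between two cubic volumes on the same grid: for each integer-radius shell $k$, $\mathrm{FSC}(k) = \mathrm{Re}\sum F_1 \overline{F_2} \,/\, \sqrt{\sum |F_1|^2 \sum |F_2|^2}$, with sums over the Fourier coefficients in that shell. It assumes no masking and rounds radii to the nearest integer shell; it is not RELION's implementation.

```python
import numpy as np

def fsc(vol1: np.ndarray, vol2: np.ndarray) -> np.ndarray:
    """Fourier shell correlation of two L x L x L volumes, one value per shell 0 .. L // 2."""
    L = vol1.shape[0]
    F1, F2 = np.fft.fftn(vol1), np.fft.fftn(vol2)
    freqs = np.fft.fftfreq(L) * L                    # integer frequency indices
    kx, ky, kz = np.meshgrid(freqs, freqs, freqs, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)
    num_shells = L // 2 + 1                          # shells up to Nyquist
    inside = shell < num_shells
    s = shell[inside]
    cross = np.bincount(s, weights=np.real(F1 * np.conj(F2))[inside], minlength=num_shells)
    power1 = np.bincount(s, weights=(np.abs(F1) ** 2)[inside], minlength=num_shells)
    power2 = np.bincount(s, weights=(np.abs(F2) ** 2)[inside], minlength=num_shells)
    return cross / np.sqrt(power1 * power2)

rng = np.random.default_rng(0)
vol = rng.standard_normal((32, 32, 32))
noisy = vol + 0.5 * rng.standard_normal(vol.shape)
curve = fsc(vol, noisy)                              # drops below 1 as noise is added
```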

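Figure 3 describes a coarse-to-fine schedule in which the lowpass cutoff applied to the estimated principal components doubles every 40 epochs. A minimal sketch of such a schedule follows. The initial cutoff, the cap at Nyquist, and the hard spherical Fourier mask in `lowpass_volume` are illustrative assumptions, since the caption specifies only the doubling rule.

```python
import numpy as np

def lowpass_cutoff(epoch: int, initial_cutoff: float, nyquist: float,
                   epochs_per_doubling: int = 40) -> float:
    """Cutoff radius (Fourier-grid units) that doubles every `epochs_per_doubling` epochs, capped at Nyquist."""
    return min(initial_cutoff * 2 ** (epoch // epochs_per_doubling), nyquist)

def lowpass_volume(vol: np.ndarray, cutoff: float) -> np.ndarray:
    """Zero every Fourier coefficient of a cubic volume whose radius exceeds `cutoff` (hard spherical mask)."""
    L = vol.shape[0]
    freqs = np.fft.fftfreq(L) * L
    kx, ky, kz = np.meshgrid(freqs, freqs, freqs, indexing="ij")
    keep = np.sqrt(kx**2 + ky**2 + kz**2) <= cutoff
    return np.real(np.fft.ifftn(np.fft.fftn(vol) * keep))

# Example: box size 64 (Nyquist radius 32) with a hypothetical starting cutoff of 4.
schedule = [lowpass_cutoff(e, initial_cutoff=4, nyquist=32) for e in (0, 39, 40, 80, 120, 200)]
# [4, 4, 8, 16, 32, 32]
```

Restricting the components to low frequencies early presumably keeps the joint pose and covariance optimization from fitting high-frequency noise before the poses have improved; the paper's exact filter may differ from this sketch.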
Theorems & Definitions (13)

  • Lemma B.1
  • proof
  • Lemma B.2
  • proof
  • Lemma B.3
  • proof
  • Lemma B.4
  • proof
  • Lemma B.5
  • proof