CryoMAE: Few-Shot Cryo-EM Particle Picking with Masked Autoencoders

Chentianye Xu; Xueying Zhan; Min Xu

TL;DR

CryoMAE tackles the limited-label, low-SNR challenge of cryo-EM particle picking by introducing a two-stage few-shot method that leverages Masked Autoencoders and a novel self-cross similarity loss. Stage 1 learns discriminative particle features from a small set of exemplars and unlabeled regions, with a PU-learning-inspired weighting to handle potential particles in unlabeled data; Stage 2 applies the trained encoder to query micrographs, locating particles via latent-feature cosine similarity to exemplars and a density-based threshold. The approach yields strong improvements over state-of-the-art NN-based methods on CryoPPP, achieving up to 22.4% improvement in 3D reconstruction resolution (average 11.1%) while requiring only about 15 exemplars per protein type, significantly reducing labeling burdens. Overall, CryoMAE advances practical cryo-EM analysis by enabling accurate, data-efficient particle picking and more reliable downstream reconstructions.
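The Stage 2 picking step described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: it assumes the trained encoder has already produced one latent vector per sliding-window location of a query micrograph, scores each location by its maximum cosine similarity to the exemplar latents, and stands in for the paper's density-based threshold with a hypothetical quantile rule (`expected_density`, the assumed fraction of particle-bearing locations). The greedy non-maximum suppression with `min_dist` is likewise an illustrative choice, and all function names are hypothetical.

```python
import numpy as np


def cosine_similarity_map(query_feats, exemplar_feats, eps=1e-8):
    """Max cosine similarity of each query location to any exemplar.

    query_feats:    (H, W, D) latent features, one per query-micrograph window.
    exemplar_feats: (N, D) latent features of the labeled exemplar particles.
    Returns an (H, W) similarity map.
    """
    q = query_feats / (np.linalg.norm(query_feats, axis=-1, keepdims=True) + eps)
    e = exemplar_feats / (np.linalg.norm(exemplar_feats, axis=-1, keepdims=True) + eps)
    sims = q @ e.T            # (H, W, N) pairwise cosine similarities
    return sims.max(axis=-1)  # best-matching exemplar per location


def density_threshold(sim_map, expected_density):
    """Hypothetical stand-in for a density-based threshold:
    keep roughly the top `expected_density` fraction of locations."""
    return np.quantile(sim_map, 1.0 - expected_density)


def pick_particles(sim_map, threshold, min_dist):
    """Greedy non-maximum suppression: visit above-threshold locations from the
    highest score down, rejecting any within `min_dist` grid cells of a kept pick."""
    ys, xs = np.nonzero(sim_map >= threshold)
    order = np.argsort(-sim_map[ys, xs])
    picks = []
    for i in order:
        y, x = int(ys[i]), int(xs[i])
        if all((y - py) ** 2 + (x - px) ** 2 >= min_dist ** 2 for py, px in picks):
            picks.append((y, x))
    return picks


# Toy example: background windows point one way in latent space, particles another.
query = np.tile(np.array([0.0, 1.0, 0.0, 0.0]), (8, 8, 1))  # (8, 8, 4)
query[2, 3] = [1.0, 0.0, 0.0, 0.0]
query[6, 6] = [1.0, 0.1, 0.0, 0.0]
exemplars = np.array([[1.0, 0.0, 0.0, 0.0]])
sim_map = cosine_similarity_map(query, exemplars)
thr = density_threshold(sim_map, expected_density=2 / 64)
print(pick_particles(sim_map, thr, min_dist=3))  # → [(2, 3), (6, 6)]
```

Taking the maximum over exemplars means a location only needs to resemble one exemplar view, which fits particles seen in varied orientations; averaging over exemplars would be a stricter alternative.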

Abstract

Cryo-electron microscopy (cryo-EM) has emerged as a pivotal technology for determining the architecture of cells, viruses, and protein assemblies at near-atomic resolution. Particle picking, a key step in cryo-EM, is hampered by the effort of manual selection and by the sensitivity of automated methods to low signal-to-noise ratio (SNR) and varied particle orientations. Moreover, existing neural network (NN)-based approaches often require extensive labeled datasets, limiting their practicality. To overcome these obstacles, we introduce cryoMAE, a novel few-shot learning approach that harnesses Masked Autoencoders (MAE) to select single particles in cryo-EM images efficiently. Unlike conventional NN-based techniques, cryoMAE requires only a minimal set of positive particle images for training, yet it achieves high particle-detection performance. In addition, a self-cross similarity loss drives particle and background regions toward distinct features, enhancing the discrimination capability of cryoMAE. Experiments on large-scale cryo-EM datasets show that cryoMAE outperforms existing state-of-the-art (SOTA) methods, improving 3D reconstruction resolution by up to 22.4%.
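The abstract does not spell out the self-cross similarity loss, so the following is only a hypothetical sketch of one loss in that spirit, written as a NumPy forward pass (a real training loop would use an autodiff framework). It rewards high cosine similarity among labeled particle features (the "self" term) and penalizes similarity between particle features and unlabeled-region features (the "cross" term). The cross term is scaled by `(1 - class_prior)`, an assumed PU-style correction for unlabeled regions that may still contain particles; the name `self_cross_similarity_loss` and this weighting are illustrative assumptions, not the paper's formula.

```python
import numpy as np


def _normalize(x, eps=1e-8):
    """L2-normalize each row so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def self_cross_similarity_loss(particle_feats, unlabeled_feats, class_prior=0.2):
    """Hypothetical self-cross similarity loss (forward pass only).

    particle_feats:  (P, D) latent features of labeled exemplar particles, P >= 2.
    unlabeled_feats: (U, D) latent features of unlabeled (mostly background) regions.
    class_prior: assumed fraction of unlabeled regions that actually contain a
        particle; the cross term is down-weighted by (1 - class_prior) as an
        illustrative PU-style correction.
    """
    p = _normalize(particle_feats)
    u = _normalize(unlabeled_feats)

    # "Self" term: mean similarity over distinct particle pairs (diagonal excluded).
    sim_pp = p @ p.T
    n = len(p)
    self_sim = (sim_pp.sum() - np.trace(sim_pp)) / (n * (n - 1))

    # "Cross" term: mean similarity between particles and unlabeled regions.
    cross_sim = (p @ u.T).mean()

    # Low loss when particles cluster together and sit apart from the background.
    return (1.0 - self_sim) + (1.0 - class_prior) * cross_sim


particles = np.array([[1.0, 0.0], [0.9, 0.1]])
background = np.array([[0.0, 1.0], [-0.2, 1.0]])
print(self_cross_similarity_loss(particles, background))
```

Moving the unlabeled features toward the particle direction raises the loss, which is the separation behavior the abstract attributes to the self-cross similarity loss.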

Paper Structure (30 sections, 6 equations, 7 figures, 7 tables)

Introduction
Related Work
Particle Picking.
Masked Autoencoders.
Contrastive Learning.
Methodology
Overview
Problem setup.
Stage 1: Training on One Reference Micrograph
Model training.
Self-cross similarity.
PU learning.
Stage 2: Particle Picking on Query Micrographs
Experiments
Experimental Setup
...and 15 more sections

Figures (7)

Figure 1: In cryo-EM single-particle analysis (SPA), electron beams capture numerous 2D images of proteins within a cryogenically preserved sample. These images are then denoised and subjected to particle picking, which enables reconstruction of the protein's 3D structure.
Figure 2: Overview of the two-stage cryoMAE framework. Stage 1 shows the training phase on labeled particle regions and unlabeled regions, using a reconstruction loss and a self-cross similarity loss. Stage 2 shows the particle picking process, where the trained MAE encoder embeds query micrographs and compares their latent features with those of the exemplars to locate particles accurately.
Figure 3: Self-cross similarity loss.
Figure 4: 3D reconstructions for EMPIAR-10081 and EMPIAR-10093 using crYOLO, Topaz, and cryoMAE: (a)-(c) for 10081, (d)-(f) for 10093.
Figure 5: Similarity maps generated from query micrographs by cryoMAE, with and without the adjusted self-cross similarity loss. (a),(d) original micrographs; (b),(e) similarity maps without the adjusted self-cross similarity loss; (c),(f) with the adjusted self-cross similarity loss.
...and 2 more figures
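Figure 2 refers to a reconstruction loss in Stage 1. In a standard Masked Autoencoder, an image is split into patches, a random subset is hidden from the encoder, and the loss is the mean squared error on the hidden patches only. The sketch below shows just those generic MAE steps in NumPy (the encoder and decoder are omitted). The patch size, the 0.75 mask ratio (the default in the original MAE work), and the function names are assumptions, since cryoMAE's exact settings are not given here.

```python
import numpy as np


def patchify(img, patch):
    """Split an (H, W) image into non-overlapping patch x patch tiles -> (N, patch*patch)."""
    h, w = img.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    tiles = img.reshape(h // patch, patch, w // patch, patch).transpose(0, 2, 1, 3)
    return tiles.reshape(-1, patch * patch)  # row-major order over the patch grid


def random_masking(num_patches, mask_ratio, rng):
    """Boolean mask over patches; True means the patch is hidden from the encoder."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.permutation(num_patches)[:num_masked]] = True
    return mask


def masked_mse(pred_patches, target_patches, mask):
    """MAE-style reconstruction loss: mean squared error over masked patches only."""
    per_patch = ((pred_patches - target_patches) ** 2).mean(axis=-1)
    return per_patch[mask].mean()


rng = np.random.default_rng(0)
micrograph_crop = rng.standard_normal((64, 64))  # stand-in for a denoised crop
patches = patchify(micrograph_crop, 16)          # (16, 256)
mask = random_masking(len(patches), 0.75, rng)   # 12 of 16 patches hidden
visible = patches[~mask]                         # what the encoder would see
pred = np.zeros_like(patches)                    # placeholder for decoder output
loss = masked_mse(pred, patches, mask)
print(visible.shape, loss)
```

Computing the loss only on hidden patches forces the model to infer structure from context rather than copy visible pixels, which is the property MAE pretraining relies on.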
