DRACO: A Denoising-Reconstruction Autoencoder for Cryo-EM

Yingjun Shen, Haizhao Dai, Qihe Chen, Yan Zeng, Jiakai Zhang, Yuan Pei, Jingyi Yu

TL;DR

This work introduces DRACO, a Denoising-Reconstruction Autoencoder for CryO-EM, inspired by the Noise2Noise (N2N) approach, and demonstrates the best performance in denoising, micrograph curation, and particle picking tasks compared to state-of-the-art baselines.

Abstract

Foundation models in computer vision have demonstrated exceptional performance in zero-shot and few-shot tasks by extracting multi-purpose features from large-scale datasets through self-supervised pre-training methods. However, these models often overlook the severe corruption of cryogenic electron microscopy (cryo-EM) images by high levels of noise. We introduce DRACO, a Denoising-Reconstruction Autoencoder for CryO-EM, inspired by the Noise2Noise (N2N) approach. By processing cryo-EM movies into odd and even images and treating them as independent noisy observations, we apply a denoising-reconstruction hybrid training scheme. We mask both images to create denoising and reconstruction tasks. Because dataset quality is essential for DRACO's pre-training, we build a high-quality, diverse dataset from an uncurated public database, comprising over 270,000 movies or micrographs. After pre-training, DRACO naturally serves as a generalizable cryo-EM image denoiser and a foundation model for various cryo-EM downstream tasks. DRACO demonstrates the best performance in denoising, micrograph curation, and particle picking tasks compared to state-of-the-art baselines.
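
The odd/even split mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function name `split_odd_even` is hypothetical, the frames are assumed to be motion-corrected already, and summing (rather than averaging) the frames is an assumption made for illustration.

```python
import numpy as np

def split_odd_even(movie):
    """Split a movie of shape (n_frames, H, W) into two noisy images.

    Assumption: frames are already aligned (motion-corrected).
    Frames are numbered from 1, so array index 0 is an odd frame.
    """
    movie = np.asarray(movie, dtype=np.float64)
    odd = movie[0::2].sum(axis=0)   # frames 1, 3, 5, ...
    even = movie[1::2].sum(axis=0)  # frames 2, 4, 6, ...
    return odd, even
```

Because the two images are built from disjoint sets of frames, they show the same specimen with separately recorded noise, which is the property an N2N-style objective relies on.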

Paper Structure

This paper contains 22 sections, 11 equations, 8 figures, 6 tables.

Figures (8)

  • Figure 1: Overview of DRACO. For pre-training, we construct a large-scale curated dataset containing 529 types of protein data with over 270,000 cryo-EM movies or micrographs. Based on this, we present DRACO, a denoising-reconstruction autoencoder for cryo-EM. A pre-trained DRACO naturally serves as a generalizable cryo-EM image denoiser and a foundation for various downstream model adaptations such as micrograph curation and particle picking.
  • Figure 2: The pipeline of DRACO. Given a pair of partially masked odd and even micrographs, the encoder takes odd-visible patches and even-visible patches as inputs. The unmasked latent patches are combined with the masked latent patches to form the latent representation $\mathbf{z}_i$, which then passes through the decoder to generate predicted patches. The N2N loss compares the odd-visible predicted patches with the corresponding even input patches, and vice versa. The reconstruction loss compares the invisible predicted patches of both images with the higher-SNR input patches.
  • Figure 3: Visualization of particle picking results. We show the picking results of DRACO and baselines on test datasets that range from small transport proteins to huge ribosomes. Blue, red, and yellow circles denote true positives, false positives, and false negatives, respectively.
  • Figure 4: Qualitative comparison results of micrograph denoising. We visualize the denoising results of DRACO and state-of-the-art baselines. Our results show the most significant SNR improvement without loss of particle structure details. In contrast, Low-pass severely blurs the particles, MAE introduces severe patch-wise artifacts, and Topaz shows either minor SNR improvements or blurred results.
  • Figure 5: Additional denoising results. We have conducted additional experiments on datasets of membrane proteins and bacteriophages. DRACO achieves the highest visual denoising quality by optimally balancing signal preservation and noise reduction.
  • ...and 3 more figures
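
The loss arrangement described in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `hybrid_loss`, the use of mean squared error, the `weight` parameter, and the `hi_snr` target array are all assumptions, since the caption does not specify how the higher-SNR patches are obtained or how the two terms are balanced.

```python
import numpy as np

def hybrid_loss(pred_odd, pred_even, odd, even, hi_snr,
                masked_odd, masked_even, weight=1.0):
    """Hypothetical N2N + reconstruction loss over patches.

    Patch arrays have shape (n_patches, patch_dim).
    masked_* are boolean arrays of shape (n_patches,); True = masked.
    hi_snr is an assumed higher-SNR target supplied by the caller.
    """
    def mse(a, b, sel):
        # Mean squared error restricted to the selected patches.
        if not sel.any():
            return 0.0
        return float(np.mean((a[sel] - b[sel]) ** 2))

    # N2N term: visible predictions from one image are compared
    # with the corresponding patches of the other noisy image.
    n2n = mse(pred_odd, even, ~masked_odd) + mse(pred_even, odd, ~masked_even)

    # Reconstruction term: predictions at masked positions are
    # compared with the higher-SNR target patches.
    rec = mse(pred_odd, hi_snr, masked_odd) + mse(pred_even, hi_snr, masked_even)

    return n2n + weight * rec
```

In this sketch the visible patches drive denoising (the target is the other noisy observation), while the masked patches drive reconstruction, mirroring the two tasks named in the caption.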