CryoGEM: Physics-Informed Generative Cryo-Electron Microscopy

Jiakai Zhang, Qihe Chen, Yan Zeng, Wenyuan Gao, Xuming He, Zhijie Liu, Jingyi Yu

Abstract

In the past decade, deep conditional generative models have revolutionized the generation of realistic images, extending their application from entertainment to scientific domains. Single-particle cryo-electron microscopy (cryo-EM) is crucial in resolving near-atomic resolution 3D structures of proteins, such as the SARS-CoV-2 spike protein. To achieve high-resolution reconstruction, a comprehensive data processing pipeline has been adopted. However, its performance is still limited because it lacks high-quality annotated datasets for training. To address this, we introduce physics-informed generative cryo-electron microscopy (CryoGEM), which for the first time integrates physics-based cryo-EM simulation with generative unpaired noise translation to produce physically correct synthetic cryo-EM datasets with realistic noise. CryoGEM first simulates the cryo-EM imaging process based on a virtual specimen. To generate realistic noise, we leverage unpaired noise translation via contrastive learning with a novel mask-guided sampling scheme. Extensive experiments show that CryoGEM is capable of generating authentic cryo-EM images. The generated dataset can be used as training data for particle picking and pose estimation models, eventually improving the reconstruction resolution.

Paper Structure (40 sections, 14 equations, 14 figures, 3 tables)

Figures (14)

  • Figure 1: CryoGEM improves cryo-EM data analysis. Cryo-EM captures images of molecules in vitrified ice via electron beams. Data is processed for a high-resolution 3D reconstruction by a comprehensive pipeline. However, some modules like (a) particle picking and (d) ab-initio 3D reconstruction still lack high-quality training datasets. Given a coarse result as an input, CryoGEM can synthesize authentic single-particle micrographs as training dataset augmentation.
  • Figure 2: Pipeline of CryoGEM. We begin by creating a virtual specimen containing various initial reconstruction results. We then simulate the imaging process of cryo-EM, incorporating physical priors such as ice gradient and point spread function (PSF) to generate a physical simulation. By adding simple Gaussian noise to the physically simulated results, we introduce randomness within a contrastive learning framework. To enhance training efficiency and performance, we use the particle-background mask as a guide for patch sampling. The sampled positive and negative instances are then encoded into multi-scale features for contrastive learning. Additionally, we introduce an adversarial loss to ensure realistic cryo-EM image synthesis.
  • Figure 3: Visualization of the learned similarity. Given query patches (red for particle and blue for background) on the input real micrograph, we visualize the similarity maps learned by our encoder and by CUT's encoder $G_{enc}$, computed as $\exp(G_{enc}(v)\cdot G_{enc}(v^{-}) / \tau)$, where $v$ denotes the query and $v^{-}$ denotes the patches of the real micrograph. The results imply that our encoder can distinguish particles from background in real cryo-EM micrographs, whereas CUT fails to learn this distinction without the guidance of particle-background maps during training.
  • Figure 4: Qualitative comparison results. Our approach achieves the most authentic noise generation across all datasets. The traditional noise models preserve the structural information but lack realistic noise patterns. CycleGAN, CUT, and CycleDiffusion introduce severe artifacts in the generated results.
  • Figure 5: Qualitative comparison results of particle picking. The blue circles indicate matches with manual picking results, while the red circles represent misses or excess picks by the model.
  • ...and 9 more figures
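
The similarity score in the Figure 3 caption, $\exp(G_{enc}(v)\cdot G_{enc}(v^{-}) / \tau)$, and the mask-guided sampling idea from Figure 2 can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names `similarity_map` and `sample_negatives`, the L2 normalization of features, the default temperature `tau=0.07`, and the rule of drawing negatives from the class opposite to the query are all assumptions made for illustration. Random arrays stand in for the encoder output.

```python
import numpy as np


def similarity_map(features, query_index, tau=0.07):
    """Return exp(q . k / tau) between one query patch and every patch.

    features: (H, W, C) array standing in for encoder features per patch.
    query_index: (row, col) of the query patch.
    tau: temperature; 0.07 is an illustrative default, not taken from the source.
    """
    h, w, c = features.shape
    flat = features.reshape(-1, c)
    # Assumption: features are L2-normalized, so dot products lie in [-1, 1].
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    flat = flat / np.maximum(norms, 1e-12)
    row, col = query_index
    query = flat[row * w + col]
    return np.exp(flat @ query / tau).reshape(h, w)


def sample_negatives(mask, query_index, num_negatives, rng):
    """Sample negative patch locations from the class opposite to the query.

    mask: (H, W) boolean particle-background mask (True = particle).
    Assumption: negatives come from the other class; the source only says
    the mask guides patch sampling.
    """
    row, col = query_index
    candidates = np.argwhere(mask != mask[row, col])
    picks = rng.choice(len(candidates), size=num_negatives, replace=False)
    return candidates[picks]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(8, 8, 16))      # placeholder encoder features
    mask = np.zeros((8, 8), dtype=bool)
    mask[2:5, 2:5] = True                    # toy particle region
    sim = similarity_map(feats, (3, 3))
    negatives = sample_negatives(mask, (3, 3), num_negatives=4, rng=rng)
    print(sim.shape, negatives.shape)
```

With normalized features the query's similarity to itself is the largest value in the map, so a well-trained encoder should show high values over patches of the same class as the query and low values elsewhere, which is the pattern the Figure 3 caption attributes to the mask-guided encoder.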