Training-free CryoET Tomogram Segmentation

Yizhou Zhao, Hengwei Bian, Michael Mu, Mostofa R. Uddin, Zhenyang Li, Xiang Li, Tianyang Wang, Min Xu

TL;DR

CryoSAM is a training-free, prompt-driven framework for semantic segmentation of full CryoET tomograms. It bridges 2D foundation models to 3D segmentation with two components: Cross-Plane Self-Prompting, which propagates masks across planes from a single initial prompt, and Hierarchical Feature Matching, which efficiently finds relevant particle features across multiple tomogram resolutions. By extracting multi-view 2D features and matching them coarse to fine, CryoSAM produces 3D segmentations without supervised training, achieves strong particle-picking performance, and enables full tomogram segmentation of diverse subcellular structures. The method reduces annotation effort and runtime, which makes it practical for structural biology analyses of CryoET data.

Abstract

Cryogenic Electron Tomography (CryoET) is a useful imaging technology in structural biology, but it is hindered by its need for manual annotations, especially in particle picking. Recent works have tried to remedy this issue with few-shot learning or contrastive learning techniques. However, these methods still require supervised training. We instead leverage existing 2D foundation models and present a novel, training-free framework, CryoSAM. In addition to prompt-based single-particle instance segmentation, our approach can automatically search for similar features, which enables full tomogram semantic segmentation with only one prompt. CryoSAM is composed of two major parts: 1) a prompt-based 3D segmentation system that uses prompts to complete single-particle instance segmentation recursively with Cross-Plane Self-Prompting, and 2) a Hierarchical Feature Matching mechanism that efficiently matches relevant features with extracted tomogram features. Together, they enable the segmentation of all particles of one category from a single particle-specific prompt. Our experiments show that CryoSAM outperforms existing works by a significant margin in particle picking while requiring fewer annotations. Further visualizations demonstrate its ability to segment full tomograms containing various subcellular structures. Our code is available at: https://github.com/xulabs/aitom
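To make the matching idea in the abstract concrete, here is a minimal NumPy sketch of coarse-to-fine feature matching. It is an illustration under stated assumptions, not the authors' implementation: the function names (`masked_average`, `downsample`, `hierarchical_match`), the average-pooling downsampler, the cosine-similarity score, the downsampling factors, and the fixed threshold are all hypothetical choices made for this example.

```python
import numpy as np


def masked_average(features, mask):
    """Average the feature vectors inside a boolean mask.

    features: (D, H, W, C) array; mask: (D, H, W) boolean array.
    """
    return features[mask].mean(axis=0)


def cosine_similarity(features, query):
    """Cosine similarity between each feature vector (last axis) and the query."""
    f = features / (np.linalg.norm(features, axis=-1, keepdims=True) + 1e-8)
    q = query / (np.linalg.norm(query) + 1e-8)
    return f @ q


def downsample(features, factor):
    """Average-pool a (D, H, W, C) volume; assumes D, H, W are divisible by factor."""
    d, h, w, c = features.shape
    blocks = features.reshape(
        d // factor, factor, h // factor, factor, w // factor, factor, c
    )
    return blocks.mean(axis=(1, 3, 5))


def hierarchical_match(features, query, factors=(4, 2, 1), threshold=0.5):
    """Match coarse to fine, scoring only children of voxels kept at the coarser level.

    factors and threshold are assumed values chosen for illustration.
    """
    candidates = None
    previous = None
    similarity = None
    for factor in factors:
        level = downsample(features, factor)
        if candidates is None:
            # Coarsest level: every voxel is a candidate.
            candidates = np.ones(level.shape[:3], dtype=bool)
        else:
            # Expand surviving coarse voxels to their children at this level.
            ratio = previous // factor
            for axis in range(3):
                candidates = np.repeat(candidates, ratio, axis=axis)
        similarity = np.full(level.shape[:3], -1.0)
        similarity[candidates] = cosine_similarity(level[candidates], query)
        candidates = similarity >= threshold
        previous = factor
    return similarity, candidates
```

The point of the hierarchy is that most of the volume is rejected at the coarsest level, so only a small fraction of fine-resolution voxels is ever compared with the query feature.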

Paper Structure (17 sections, 6 equations, 8 figures, 3 tables)

Figures (8)

  • Figure 1: Framework overview. ❶: We extract per-slice 2D features for three views (z, y, and x) from CryoET tomogram $\mathbf{I}$ and concatenate them as $\mathbf{F}$. ❷: After segmenting the particle(s) prompted by $\mathbf{P}$ with instance segmentation mask(s), ❸: we average pool the masked features to get query feature $\mathbf{F}_Q$. ❹: To efficiently propose prompts for further segmentation, we match $\mathbf{F}_Q$ with $\mathbf{F}$ using Hierarchical Feature Matching. ❺: Finally, we adopt prompt-based 3D segmentation for semantic segmentation results $\mathbf{M}$.
  • Figure 2: The pipeline of prompt-based 3D segmentation. After segmenting the orthogonal planes that intersect at the point prompt $\mathbf{P}_i$, we iteratively execute Cross-Plane Self-Prompting until we get the complete mask of the particle.
  • Figure 3: The pipeline of Hierarchical Feature Matching. We average the tomogram features inside the instance segmentation masks to obtain a query feature $\mathbf{F}_Q$. Then we downsample $\mathbf{F}$ into several coarser versions and match them with $\mathbf{F}_Q$ in a coarse-to-fine manner. After the last matching stage, we apply NMS and gather the coordinates with the top $K$ similarities as prompts to derive the final semantic segmentation results.
  • Figure 4: Intermediate and final results of CryoSAM. In (d) and (f), we show points with coordinates ranging from $z-20$ to $z+20$ for demonstration.
  • Figure 5: Ablation study for the number of proposed prompts. 512/1024/All: number of proposed prompts selected for prompt-based semantic segmentation.
  • ...and 3 more figures
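
The iterative procedure described in the Figure 2 caption can be sketched as follows. This is a toy illustration only: `segment_slice` is a hypothetical stand-in (a thresholded 2D flood fill) for a promptable 2D foundation model, and reusing every newly masked voxel as the next round's prompt is an assumption made for simplicity, not a detail confirmed by the source.

```python
from collections import deque

import numpy as np


def segment_slice(slice2d, seed, threshold=0.5):
    """Stand-in for a promptable 2D segmenter (assumption, not a real model).

    Returns the 4-connected region of pixels >= threshold that contains seed.
    """
    mask = np.zeros(slice2d.shape, dtype=bool)
    if slice2d[seed] < threshold:
        return mask
    mask[seed] = True
    queue = deque([seed])
    rows, cols = slice2d.shape
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and not mask[nr, nc] and slice2d[nr, nc] >= threshold:
                mask[nr, nc] = True
                queue.append((nr, nc))
    return mask


def cross_plane_self_prompting(volume, prompt, max_rounds=10):
    """Grow a 3D mask from one point prompt by segmenting orthogonal planes.

    Each round segments the three axis-aligned planes through every current
    prompt, then turns the newly masked voxels into the next round's prompts.
    """
    mask = np.zeros(volume.shape, dtype=bool)
    prompts = [tuple(prompt)]
    for _ in range(max_rounds):
        new_voxels = np.zeros(volume.shape, dtype=bool)
        for p in prompts:
            for axis in range(3):
                # Select the plane perpendicular to `axis` that passes through p.
                index = [slice(None)] * 3
                index[axis] = p[axis]
                index = tuple(index)
                seed = tuple(p[a] for a in range(3) if a != axis)
                plane_mask = segment_slice(volume[index], seed)
                new_voxels[index] |= plane_mask & ~mask[index]
        if not new_voxels.any():
            break  # the mask stopped growing, so the particle is complete
        mask |= new_voxels
        prompts = [tuple(int(v) for v in p) for p in np.argwhere(new_voxels)]
    return mask
```

With a real 2D model in place of the flood fill, one would likely subsample the new prompts rather than use every voxel, since each prompt costs three 2D segmentation calls.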