U-Mamba2: Scaling State Space Models for Dental Anatomy Segmentation in CBCT

Zhi Qin Tan, Xiatian Zhu, Owen Addison, Yunpeng Li

TL;DR

To address the challenge of fast, accurate multi-anatomy CBCT segmentation in dentistry, the authors introduce U-Mamba2, a hybrid architecture that places an SSD-based Mamba2 state space block inside a convolutional U-Net backbone. The method adds an optional cross-attention branch for interactive click prompts, self-supervised pre-training, and dental-domain priors such as label smoothing, weighted loss for tiny structures, left-right mirroring, and anatomically informed post-processing. On ToothFairy3, U-Mamba2 took first place in both Task 1 and Task 2, with an average Task 1 inference time of 40.58 s, and ablations support the value of the domain-knowledge components. Overall, the work delivers a scalable, human-in-the-loop segmentation approach with practical implications for clinical diagnosis and surgical planning in dentistry.

Abstract

Cone-Beam Computed Tomography (CBCT) is a widely used 3D imaging technique in dentistry, providing volumetric information about the anatomical structures of jaws and teeth. Accurate segmentation of these anatomies is critical for clinical applications such as diagnosis and surgical planning, but remains time-consuming and challenging. In this paper, we present U-Mamba2, a new neural network architecture designed for multi-anatomy CBCT segmentation in the context of the ToothFairy3 challenge. U-Mamba2 integrates Mamba2 state space models into the U-Net architecture, enforcing stronger structural constraints for higher efficiency without compromising performance. In addition, we integrate interactive click prompts with cross-attention blocks, pre-train U-Mamba2 using self-supervised learning, and incorporate dental domain knowledge into the model design to address key challenges of dental anatomy segmentation in CBCT. Extensive experiments, including independent tests, demonstrate that U-Mamba2 is both effective and efficient, securing first place in both tasks of the ToothFairy3 challenge. In Task 1, U-Mamba2 achieved a mean Dice of 0.84 and an HD95 of 38.17 on the held-out test data, with an average inference time of 40.58 s. In Task 2, U-Mamba2 achieved a mean Dice of 0.87 and an HD95 of 2.15 on the held-out test data. The code is publicly available at https://github.com/zhiqin1998/UMamba2.
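For readers unfamiliar with the headline metric, the following is a minimal sketch of how a mean Dice score over several anatomy labels can be computed. It is not the challenge's evaluation code: the function names `dice_score` and `mean_dice` are hypothetical, volumes are assumed to be flattened to plain sequences of integer labels, and skipping labels that are absent from both prediction and ground truth is an assumption about how undefined cases are handled.

```python
def dice_score(pred, target, label):
    """Dice coefficient for one label over two equal-length label sequences.

    Returns None when the label is absent from both sequences, because the
    ratio is undefined there (assumption: such labels are skipped).
    """
    pred_mask = [p == label for p in pred]
    target_mask = [t == label for t in target]
    # Voxels where both prediction and ground truth carry this label.
    intersection = sum(p and t for p, t in zip(pred_mask, target_mask))
    denominator = sum(pred_mask) + sum(target_mask)
    if denominator == 0:
        return None
    return 2 * intersection / denominator


def mean_dice(pred, target, labels):
    """Average the per-label Dice over all labels where it is defined."""
    scores = [dice_score(pred, target, label) for label in labels]
    defined = [s for s in scores if s is not None]
    return sum(defined) / len(defined) if defined else None


if __name__ == "__main__":
    # Toy flattened "volumes": 0 is background, 1 and 2 are two anatomies.
    prediction = [0, 1, 1, 2, 2, 2]
    ground_truth = [0, 1, 2, 2, 2, 2]
    print(mean_dice(prediction, ground_truth, labels=[1, 2]))
```

Dice lies in [0, 1] and rewards overlap, whereas HD95 is a boundary distance where lower is better, which is why the two are reported together.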

Paper Structure

This paper contains 16 sections, 3 figures, 2 tables.

Figures (3)

  • Figure 1: (Left): Overall architecture of the U-Mamba2 model. U-Mamba2 employs the encoder-decoder framework with residual connections between each stage and the U-Mamba2 block in the bottleneck. The number of stages is configurable depending on the dataset input size. (Right): The U-Mamba2 block contains the SSD-based Mamba2 and an optional click position encoder and cross-attention blocks. The output of Mamba2 follows the solid line for tasks without interactive clicks, while it follows the dashed line when clicks are present.
  • Figure 2: Qualitative results of U-Mamba2 on the validation set of Task 1. The 3D render and a representative 2D slice are shown for: (Top) the best scoring case and (Bottom) the worst scoring case.
  • Figure 3: (Left): Effect of the tile size on the metrics with '0,1' mirror axes in TTA. (Right): Effect of various mirror axes combinations in TTA on the metrics when tile size is set to 0.9. Axis definition: '0' is superior/inferior, '1' is anterior/posterior, and '2' is left/right.
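The mirror-axis test-time augmentation (TTA) referred to in the Figure 3 caption can be illustrated with a short sketch. This shows one common form of mirror TTA and is not taken from the U-Mamba2 code: it assumes the model is run on every combination of flips over the chosen axes, each prediction is flipped back, and the results are averaged. The helper names (`flip`, `mirror_combinations`, `mirror_tta`) are hypothetical, `predict` stands in for any function mapping a volume to a same-shaped score volume, and nested lists are used only to keep the example dependency-free. Tile size is not modelled here.

```python
from itertools import combinations


def flip(volume, axis):
    """Reverse a nested-list volume along one axis (0 is the outermost)."""
    if axis == 0:
        return volume[::-1]
    return [flip(sub, axis - 1) for sub in volume]


def _add(a, b):
    """Element-wise sum of two nested lists with identical shape."""
    if isinstance(a, list):
        return [_add(x, y) for x, y in zip(a, b)]
    return a + b


def _scale(a, factor):
    """Multiply every element of a nested list by a scalar."""
    if isinstance(a, list):
        return [_scale(x, factor) for x in a]
    return a * factor


def mirror_combinations(mirror_axes):
    """All subsets of the mirror axes, including the empty (unflipped) one."""
    return [combo
            for size in range(len(mirror_axes) + 1)
            for combo in combinations(mirror_axes, size)]


def mirror_tta(predict, volume, mirror_axes=(0, 1)):
    """Average predictions over every flip combination of mirror_axes."""
    combos = mirror_combinations(mirror_axes)
    total = None
    for combo in combos:
        flipped = volume
        for axis in combo:
            flipped = flip(flipped, axis)
        prediction = predict(flipped)
        # Undo the flips so the prediction aligns with the original volume.
        for axis in combo:
            prediction = flip(prediction, axis)
        total = prediction if total is None else _add(total, prediction)
    return _scale(total, 1.0 / len(combos))


if __name__ == "__main__":
    toy_volume = [[[0.0, 1.0], [2.0, 3.0]], [[4.0, 5.0], [6.0, 7.0]]]
    # An identity "model" makes it easy to check that flips are undone.
    print(mirror_tta(lambda v: v, toy_volume, mirror_axes=(0, 1)))
```

With the caption's axis convention, `mirror_axes=(0, 1)` flips along superior/inferior and anterior/posterior but never left/right. Under this scheme the number of forward passes doubles with each added axis, so the axis choice affects both accuracy and inference time.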