Sparse Anatomical Prompt Semi-Supervised Learning with Masked Image Modeling for CBCT Tooth Segmentation

Pengyu Dai, Yafei Ou, Yuqiao Yang, Yang Liu, Yue Zhao

TL;DR

This work tackles CBCT tooth segmentation under limited labeled data by combining a graph-attention sparse boundary prompt with a boundary-informed masked autoencoder (MAE) pre-training strategy. The approach injects anatomical boundary priors into an MAE framework and then fine-tunes on scarce labeled data, using a Dice–BCE loss for segmentation. Experiments on 158 clinical CBCT scans show a Dice similarity coefficient (DSC) of about 89.8% with the full labeled set and robust performance as labels are reduced, outperforming several baselines by about 2%. The method reduces dependence on large labeled datasets and improves boundary delineation in challenging dental regions, indicating potential for more efficient, near-clinical deployment.
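
As an illustration, a combined Dice–BCE objective is commonly written as binary cross-entropy plus one minus the soft Dice coefficient. The sketch below makes several assumptions: equal weights, flattened per-voxel probabilities after a sigmoid, and a small smoothing constant `eps`. The paper's exact weighting and smoothing are not stated here, and the helper names are hypothetical.

```python
import math

# Hypothetical helper names; equal Dice/BCE weighting is an assumption,
# not a detail confirmed by the paper.


def soft_dice(probs, targets, eps=1e-6):
    """Soft Dice coefficient between predicted probabilities and binary targets."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    return (2.0 * intersection + eps) / (sum(probs) + sum(targets) + eps)


def bce(probs, targets, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clamped to avoid log(0)."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


def dice_bce_loss(probs, targets, dice_weight=1.0, bce_weight=1.0):
    """Weighted sum of BCE and Dice loss (1 - soft Dice)."""
    return bce_weight * bce(probs, targets) + dice_weight * (1.0 - soft_dice(probs, targets))


# Usage: flattened per-voxel foreground probabilities vs. a binary mask.
pred = [0.9, 0.2, 0.8, 0.1]
mask = [1, 0, 1, 0]
print(dice_bce_loss(pred, mask))
```

The Dice term directly rewards overlap, which helps with class imbalance between small tooth regions and background. The BCE term gives smooth per-voxel gradients.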

Abstract

Accurate tooth identification and segmentation in Cone Beam Computed Tomography (CBCT) dental images can significantly improve the efficiency and precision of diagnoses performed manually by dentists. However, existing segmentation methods mainly rely on training with large datasets, whose annotation is extremely time-consuming. Moreover, teeth of different classes in CBCT dental images are closely positioned and differ only subtly from one another. When a model is trained with limited data, this leads to indistinct boundaries. To address these challenges, this study proposes a task-oriented Masked Autoencoder (MAE) paradigm that leverages large amounts of unlabeled data to achieve accurate tooth segmentation with limited labeled data. Specifically, we first construct a self-supervised masked autoencoder pre-training framework that uses unlabeled data to improve network performance. We then introduce a sparse masked prompt mechanism based on graph attention. This mechanism incorporates tooth boundary information, helping the network learn the anatomical structural features of teeth. To the best of our knowledge, this is the first work to integrate the masked pre-training paradigm into the CBCT tooth segmentation task. Extensive experiments demonstrate both the feasibility of the proposed method and the potential of the boundary prompt mechanism.
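
The masked pre-training stage can be illustrated with a minimal sketch of patch masking for a convolutional (U-Net-style) autoencoder. Here, masked patches are zeroed out, and the reconstruction loss is computed only over the masked pixels, as in the original MAE objective. Several details are assumptions for illustration, not taken from this paper: the 2D slice input, the square patch size, the 0.75 mask ratio (the default in the original MAE work), and the helper names.

```python
import random

# Hypothetical helpers: 2D input, square patches and a 0.75 mask ratio are assumptions.


def mask_image_patches(image, patch_size, mask_ratio=0.75, seed=None):
    """Zero out a random subset of non-overlapping square patches.

    Returns a masked copy of the image and the sorted (row, col) patch indices
    that were masked. Border pixels not covered by a full patch are left as-is.
    """
    h, w = len(image), len(image[0])
    grid = [(r, c) for r in range(h // patch_size) for c in range(w // patch_size)]
    rng = random.Random(seed)
    num_mask = int(round(len(grid) * mask_ratio))
    masked = sorted(rng.sample(grid, num_mask))
    out = [row[:] for row in image]  # copy so the input image is not modified
    for r, c in masked:
        for y in range(r * patch_size, (r + 1) * patch_size):
            for x in range(c * patch_size, (c + 1) * patch_size):
                out[y][x] = 0.0
    return out, masked


def masked_mse(pred, target, masked, patch_size):
    """Mean squared error restricted to pixels inside the masked patches."""
    total, count = 0.0, 0
    for r, c in masked:
        for y in range(r * patch_size, (r + 1) * patch_size):
            for x in range(c * patch_size, (c + 1) * patch_size):
                total += (pred[y][x] - target[y][x]) ** 2
                count += 1
    return total / count


# Usage: mask an 8x8 slice with 2x2 patches, then score a reconstruction.
slice_2d = [[float((x + y) % 3) for x in range(8)] for y in range(8)]
masked_input, masked_idx = mask_image_patches(slice_2d, patch_size=2, seed=0)
reconstruction = masked_input  # stand-in for the network's output
print(masked_mse(reconstruction, slice_2d, masked_idx, patch_size=2))
```

Restricting the loss to masked regions forces the network to infer missing anatomy from visible context rather than copy its input.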

Paper Structure

This paper contains 12 sections, 6 equations, 2 figures, and 1 table.

Figures (2)

  • Figure 1: The architecture of the proposed method, which has three stages. First, a graph-based boundary prompt branch is trained on sparse boundary annotations, and its parameters are then fixed. Second, a U-Net-based masked autoencoder is pre-trained on unlabeled data in a self-supervised manner. During this stage, the fixed boundary prompt branch provides sparse boundary representations to the masked regions of the network. Finally, the pre-trained network is applied to the downstream task of CBCT tooth segmentation.
  • Figure 2: Comparison of segmentation results in 2D and 3D visualizations. For closely spaced incisors and for wisdom teeth, our method effectively avoids blurred boundaries and missed segmentation.
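
The graph-based boundary prompt branch in Figure 1 can be illustrated with a single-head graph attention layer in the standard GAT form (Veličković et al.). Each node, imagined here as a sparse boundary point, is processed as follows:

1. Project its features with `W`.
2. Score each neighbor (and itself) with a LeakyReLU applied to the attention vector `a` times the concatenated projections.
3. Normalize the scores with a softmax.
4. Aggregate the neighbors' projections.

This summary does not describe how the paper builds the boundary graph, which node features it uses, or how the branch output is fused into the masked regions. The graph, dimensions, and names below are therefore hypothetical.

```python
import math

# Standard single-head GAT update; the boundary-point graph and all names are
# hypothetical illustrations, not the paper's implementation.


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_layer(features, neighbors, W, a):
    """One GAT layer.

    features:  list of node feature vectors h_i
    neighbors: dict mapping node index -> list of neighbor indices
               (a self-loop is always added)
    W:         projection matrix of shape out_dim x in_dim
    a:         attention vector of length 2 * out_dim
    """
    proj = [matvec(W, h) for h in features]
    out_dim = len(W)
    updated = []
    for i, wh_i in enumerate(proj):
        nbrs = [i] + [j for j in neighbors.get(i, []) if j != i]
        # e_ij = LeakyReLU(a^T [W h_i || W h_j])
        scores = [leaky_relu(sum(ak * x for ak, x in zip(a, wh_i + proj[j]))) for j in nbrs]
        # Numerically stable softmax over the neighborhood.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        alphas = [e / z for e in exps]
        # h_i' = sum_j alpha_ij W h_j
        updated.append([sum(al * proj[j][k] for al, j in zip(alphas, nbrs)) for k in range(out_dim)])
    return updated


# Usage: three boundary points connected in a chain 0 - 1 - 2.
points = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
graph = {0: [1], 1: [0, 2], 2: [1]}
identity = [[1.0, 0.0], [0.0, 1.0]]
print(gat_layer(points, graph, identity, [0.5, -0.5, 0.25, 0.25]))
```

Because attention weights are learned per edge, a node can weight nearby boundary points unequally, which is one plausible way such a branch could emphasize contacts between adjacent teeth.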