MICCAI STS 2024 Challenge: Semi-Supervised Instance-Level Tooth Segmentation in Panoramic X-ray and CBCT Images

Yaqi Wang, Zhi Li, Chengyu Wu, Jun Liu, Yifan Zhang, Jiaxue Ni, Qian Luo, Jialuo Chen, Hongyuan Zhang, Jin Liu, Can Han, Kaiwen Fu, Changkai Ji, Xinxu Cai, Jing Hao, Zhihao Zheng, Shi Xu, Junqiang Chen, Qianni Zhang, Dahong Qian, Shuai Wang, Huiyu Zhou

TL;DR

The paper presents the STS 2024 Challenge, the first open benchmark for semi-supervised, instance-level tooth segmentation in multimodal dental imaging (OPG and CBCT). It shows that SSL frameworks, especially those combining foundation-model guidance (e.g., SAM) with multi-stage coarse-to-fine pipelines, achieve substantial improvements over fully supervised baselines, including large gains in Instance Dice and Instance Affinity. The study provides a comprehensive analysis of top-performing strategies, qualitative failure modes, and a human-in-the-loop annotation study demonstrating dramatic reductions in labeling time, thereby highlighting the practical value of label-efficient AI in dentistry. While SSL delivers strong gains, the work also exposes computational and data-diversity challenges, underlining the need for scalable, clinically integrated solutions and further multi-center validation. The publicly released dataset and code offer a foundation for future research toward robust, clinically deployable dental AI systems that can handle pediatric and complex 3D imaging scenarios.

Abstract

Orthopantomograms (OPGs) and Cone-Beam Computed Tomography (CBCT) are vital for dentistry, but creating large datasets for automated tooth segmentation is hindered by the labor-intensive process of manual instance-level annotation. This research aimed to benchmark and advance semi-supervised learning (SSL) as a solution for this data scarcity problem. We organized the 2nd Semi-supervised Teeth Segmentation (STS 2024) Challenge at MICCAI 2024. We provided a large-scale dataset comprising over 90,000 2D images and 3D axial slices, which includes 2,380 OPG images and 330 CBCT scans, with detailed instance-level FDI annotations provided for part of the data. The challenge attracted 114 (OPG) and 106 (CBCT) registered teams. To ensure algorithmic excellence and full transparency, we rigorously evaluated the valid, open-source submissions from the top 10 (OPG) and top 5 (CBCT) teams, respectively. All successful submissions were deep learning-based SSL methods. The winning semi-supervised models demonstrated impressive performance gains over a fully supervised nnU-Net baseline trained only on the labeled data. For the 2D OPG track, the top method improved the Instance Affinity (IA) score by over 44 percentage points. For the 3D CBCT track, the winning approach boosted the Instance Dice score by 61 percentage points. This challenge confirms the substantial benefit of SSL for complex, instance-level medical image segmentation tasks where labeled data is scarce. The most effective approaches consistently leveraged hybrid semi-supervised frameworks that combined knowledge from foundation models like SAM with multi-stage, coarse-to-fine refinement pipelines. Both the challenge dataset and the participants' submitted code have been made publicly available on GitHub (https://github.com/ricoleehduu/STS-Challenge-2024), ensuring transparency and reproducibility.
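The abstract reports gains in Instance Dice without defining the metric on this page. As a rough illustration only, the sketch below assumes one common reading: a Dice score computed separately for each ground-truth tooth label and then averaged. The function names `per_instance_dice` and `mean_instance_dice`, the flattened label-map input, and the convention that 0 marks background are all assumptions made for this example; the challenge's official metric may match instances or handle missing teeth differently.

```python
from typing import Dict, Sequence


def per_instance_dice(pred: Sequence[int], truth: Sequence[int]) -> Dict[int, float]:
    """Dice score for each ground-truth label in a flattened label map.

    Assumption: label 0 is background and every other integer is one
    tooth instance (for example an FDI number such as 11 or 21).
    """
    if len(pred) != len(truth):
        raise ValueError("label maps must have the same number of pixels")

    scores: Dict[int, float] = {}
    for label in sorted(set(truth) - {0}):
        pred_count = sum(1 for p in pred if p == label)
        truth_count = sum(1 for t in truth if t == label)
        overlap = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
        # truth_count > 0 here, so the denominator is never zero.
        scores[label] = 2.0 * overlap / (pred_count + truth_count)
    return scores


def mean_instance_dice(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Average the per-tooth Dice scores; returns 0.0 if there are no teeth."""
    scores = per_instance_dice(pred, truth)
    return sum(scores.values()) / len(scores) if scores else 0.0


# Toy 2x4 image, flattened row by row, with two teeth labeled 11 and 21.
truth = [11, 11, 0, 21, 11, 11, 0, 21]
pred = [11, 11, 0, 0, 11, 0, 0, 21]

scores = per_instance_dice(pred, truth)
print(scores)
print(mean_instance_dice(pred, truth))
```

In the toy example, tooth 11 scores 6/7 and tooth 21 scores 2/3. Averaging per tooth rather than per pixel means that missing a small tooth costs as much as missing a large one, which is one reason instance-level metrics tend to be stricter than a single foreground Dice.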

Paper Structure

This paper contains 39 sections, 5 equations, 17 figures, 9 tables.

Figures (17)

  • Figure 1: Overview of the Semi-Supervised Learning (SSL) framework for dental instance segmentation and its clinical utility. The workflow proceeds from (Left) multi-modal data acquisition (OPG and CBCT) covering both pediatric and adult populations, to (Middle) the core semi-supervised training paradigm where an Encoder-Decoder network leverages expert annotations and iteratively refines performance via pseudo-labeling, and finally to (Right) diverse downstream clinical applications, such as orthodontic treatment planning, root canal therapy, and automated report generation, which rely on precise tooth instance masks.
  • Figure 2: End-to-end workflow of the MICCAI 2024 Semi-supervised Teeth Segmentation (STS) Challenge. The process encompasses five key stages: (1) multi-center data collection to ensure dataset diversity, (2) iterative annotation by clinicians for high-quality ground truth, (3) construction of the semi-supervised dataset with distinct labeled and unlabeled sets, and (4/5) the final summarization and evaluation of submitted participant methods.
  • Figure 3: Overview of the STS 2024 Challenge participation and timeline. (a) A world map illustrating the geographical distribution of registered participants. (b) A detailed timeline of the challenge schedule, from the training phase start to the final announcement of results.
  • Figure 4: Overview of prominent methodological strategies employed by participants in the STS 2024 Challenge. The figure illustrates four key approaches: (a) Knowledge transfer with pretrained models, where pretrained foundation models (e.g., SAM) are leveraged to improve segmentation. (b) Consistency regularization learning, including self-teaching, model perturbation, and semantic consistency ($\mathcal{T}_{1}$ and $\mathcal{T}_{2}$ denote two kinds of transformation). (c) Multi-stage architecture optimization, which decomposes the problem into multiple sub-problems and gradually obtains fine results.
  • Figure 5: Architecture of the SemiT-SAM model, submitted by team 'Isjinhao' for the 2D challenge track. The model employs an encoder-decoder structure comprising an image encoder, a basic feature pyramid, a query initialization unit, and a mask decoder.
  • ...and 12 more figures