Knowledge Distillation Approach for SOS Fusion Staging: Towards Fully Automated Skeletal Maturity Assessment

Omid Halimi Milani, Amanda Nikho, Marouane Tliba, Lauren Mills, Ahmet Enis Cetin, Mohammed H Elnagar

TL;DR

This work addresses automated SOS fusion staging from CBCT by training a teacher model on cropped ROIs and a student model on full images. A spatial-logits knowledge-distillation loss and a Grad-CAM-based attention loss guide the student's focus, so no external detector is needed at inference. The approach employs a temperature-scaled KL divergence with $T=3$, and its total loss $\mathcal{L}_{total}$ combines $\mathcal{L}_{attn}$, $\mathcal{L}_{dist}$, and $\mathcal{L}_{cls}$, enabling end-to-end, ROI-free classification. In experiments, the proposed framework achieves $83.75\%$ accuracy on full images (Precision $84.01\%$, Recall $83.75\%$, F1 $83.29\%$), closely approaching the teacher's cropped-image performance and outperforming baseline full-image models. Because no external ROI detector is required at deployment, preprocessing overhead is reduced. The result is a scalable and interpretable solution for skeletal maturity assessment in orthodontics and forensic anthropology, with potential applicability to other medical imaging tasks that require localized feature emphasis.
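As a rough illustration of how the loss terms named above could fit together, the sketch below computes a temperature-scaled KL divergence between teacher and student logits at $T=3$ and adds it to a classification term and an attention term. It is a minimal sketch under stated assumptions, not the authors' implementation: the weights `w_dist` and `w_attn`, the $T^2$ rescaling of the KL term (a common convention in distillation), the use of plain cross-entropy for $\mathcal{L}_{cls}$, and the treatment of $\mathcal{L}_{attn}$ as a precomputed scalar are all hypothetical choices not specified in this summary.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by `temperature`."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=3.0):
    """KL(teacher || student) on temperature-softened distributions.

    Multiplying by temperature**2 is an assumed convention, not taken from the paper.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * math.log(pt / ps)
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    return (temperature ** 2) * kl


def classification_loss(student_logits, label):
    """Cross-entropy of the student's prediction against the true class index."""
    return -math.log(softmax(student_logits)[label])


def total_loss(l_cls, l_dist, l_attn, w_dist=1.0, w_attn=1.0):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return l_cls + w_dist * l_dist + w_attn * l_attn


if __name__ == "__main__":
    # Toy logits for one sample; the class count and values are illustrative only.
    teacher = [2.0, 0.5, -1.0]
    student = [1.2, 0.9, -0.3]
    l_dist = distillation_loss(student, teacher)
    l_cls = classification_loss(student, label=0)
    # The attention term would come from comparing Grad-CAM maps; a placeholder is used here.
    print(total_loss(l_cls, l_dist, l_attn=0.1))
```

With identical teacher and student logits the distillation term is zero, so in this sketch the student is penalized only to the extent that its softened class distribution departs from the teacher's.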

Abstract

We introduce a novel deep learning framework for the automated staging of spheno-occipital synchondrosis (SOS) fusion, a critical diagnostic marker in both orthodontics and forensic anthropology. Our approach uses a dual-model architecture in which a teacher model, trained on manually cropped images, transfers its precise spatial understanding to a student model that operates on full, uncropped images. This knowledge distillation is driven by a newly formulated loss function that aligns spatial logits and incorporates gradient-based spatial attention mapping, so that the student model internalizes the anatomically relevant features without relying on external cropping or YOLO-based segmentation. By leveraging expert-curated data and feedback at each step, our framework attains robust diagnostic accuracy, culminating in a clinically viable end-to-end pipeline. This streamlined approach removes the need for additional pre-processing tools and accelerates deployment, improving both the efficiency and consistency of skeletal maturation assessment in diverse clinical settings.

Paper Structure

This paper contains 13 sections, 9 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Skull oriented in three planes (a). Occipital and sphenoid bones cropped (b). SOS rotated and segmented (c).
  • Figure 2: Knowledge Distillation and YOLO-Based Automation with Gradient Loss for SOS Fusion Staging
  • Figure : (a) Without framework
  • Figure : (a) Without framework
  • Figure : (b) Proposed framework