Leveraging Point Transformers for Detecting Anatomical Landmarks in Digital Dentistry

Tibor Kubík, Oldřich Kodym, Petr Šilling, Kateřina Trávníčková, Tomáš Mojžiš, Jan Matula

TL;DR

This study tackles automatic detection of anatomical landmarks in 3D dental scans, a challenging problem due to limited data and anatomical variability. It introduces a Point Transformer v3–based geometry encoder, a distance decoder predicting six per-point distance maps, and a topology-driven non-minima suppression (CTD-NMS) to robustly extract landmarks from dense meshes without predefined landmark counts. The approach achieves around 0.64 precision and recall at 0–2 mm thresholds and demonstrates robustness gains when using sharpened distance maps, with a compact 8.9M-parameter model and ~1.13 s inference per scan, highlighting potential for real-time clinical use. The work also provides data augmentation, geodesic-distance labeling, and interpretable feature analyses, contributing a practical framework for unconstrained 3D dental landmarking and facilitating future research in digital dentistry.

Abstract

The increasing availability of intraoral scanning devices has heightened their importance in modern clinical orthodontics. Clinicians use advanced Computer-Aided Design techniques to create patient-specific treatment plans, a process that includes laboriously identifying crucial landmarks such as cusps, mesial-distal locations, facial axis points, and tooth-gingiva boundaries. Detecting such landmarks automatically is challenging because of limited dataset sizes, significant anatomical variability among subjects, and the geometric nature of the data. We present our experiments from the 3DTeethLand Grand Challenge at MICCAI 2024. Our method leverages recent advances in point cloud learning with transformer architectures. We designed a Point Transformer v3-inspired module to capture meaningful geometric and anatomical features. A lightweight decoder processes these features to predict per-point distances, which are then post-processed by graph-based non-minima suppression. We report promising results and discuss insights into the interpretability of the learned features.
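
To make the post-processing step concrete, the following is a minimal sketch of graph-based non-minima suppression on a single per-point distance map. It is not the authors' CTD-NMS: the function names, the `max_dist` cutoff, the `hops` neighbourhood radius, and the index-based tie-breaking are all illustrative assumptions. The idea it shows is only the general one: a vertex is reported as a landmark candidate if its predicted distance is small and no vertex in its mesh neighbourhood has a smaller one.

```python
from collections import deque


def build_adjacency(num_vertices, edges):
    """Undirected adjacency sets from (i, j) vertex-index pairs (mesh edges)."""
    adj = [set() for _ in range(num_vertices)]
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def k_hop_neighbors(adj, start, hops):
    """Vertices reachable from `start` in at most `hops` edges, excluding `start`."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        vertex, depth = queue.popleft()
        if depth == hops:
            continue  # do not expand beyond the neighbourhood radius
        for neighbor in adj[vertex]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    seen.discard(start)
    return seen


def non_minima_suppression(dist, edges, max_dist=1.0, hops=2):
    """Return indices of vertices that are local minima of `dist` on the graph.

    `max_dist` and `hops` are hypothetical parameters chosen for illustration.
    """
    adj = build_adjacency(len(dist), edges)
    keep = []
    for v, d in enumerate(dist):
        if d > max_dist:
            continue  # too far from any landmark to be a candidate
        # Comparing (distance, index) tuples breaks ties in favour of the lower index.
        if all((d, v) < (dist[u], u) for u in k_hop_neighbors(adj, v, hops)):
            keep.append(v)
    return keep


# Toy example: a chain of 7 vertices standing in for a mesh graph.
edges = [(i, i + 1) for i in range(6)]
dist = [0.9, 0.2, 0.5, 2.0, 0.6, 0.1, 0.4]
print(non_minima_suppression(dist, edges, max_dist=1.0, hops=1))  # [1, 5]
```

Because the number of surviving minima is not fixed in advance, a scheme of this kind does not need a predefined landmark count. Under the same assumptions, it would be run separately on each predicted distance map.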

Paper Structure

This paper contains 19 sections, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Outline of the proposed method. The pipeline consists of vertex stratification followed by a geometric encoder-decoder architecture that generates dense distance maps (visualized as heat maps), and CTD-NMS (Calibrated Topology-Driven Non-Minima Suppression) for final landmark positioning.
  • Figure 2: Sample outputs on validation cases. For each case, the estimated distance map and the post-processed detections are shown for one landmark class. Top row: cases with satisfactory results. Bottom row: failure cases.
  • Figure 3: Visualization of learned feature embeddings from the PTv3 encoder, projected onto RGB color space using PCA on subsampled vertices from two different validation cases.
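
The kind of projection described for Figure 3 can be sketched as follows. This is an illustrative NumPy version, not the authors' code: the function name `pca_to_rgb`, the per-channel min-max rescaling, and the random stand-in features are assumptions. It projects per-point feature vectors onto their first three principal components and rescales each component to [0, 1] so it can serve as a color channel.

```python
import numpy as np


def pca_to_rgb(features):
    """Map (n_points, n_channels) features to (n_points, 3) colors in [0, 1].

    Assumes at least 3 points and at least 3 feature channels.
    """
    feats = np.asarray(features, dtype=np.float64)
    centered = feats - feats.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:3].T
    # Rescale each component independently to [0, 1].
    lo = projected.min(axis=0)
    span = projected.max(axis=0) - lo
    span[span == 0] = 1.0  # guard against constant components
    return (projected - lo) / span


# Random features stand in for encoder embeddings of subsampled vertices.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 16))
colors = pca_to_rgb(embeddings)
print(colors.shape)  # (200, 3)
```

For colors to be comparable across two scans, one plausible choice is to stack the features of both scans before fitting, so that both share the same principal directions. Whether the figure was produced that way is not stated in the caption.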