Leveraging Point Transformers for Detecting Anatomical Landmarks in Digital Dentistry
Tibor Kubík, Oldřich Kodym, Petr Šilling, Kateřina Trávníčková, Tomáš Mojžiš, Jan Matula
TL;DR
This study tackles automatic detection of anatomical landmarks in 3D dental scans, a challenging problem due to limited data and anatomical variability. It introduces a Point Transformer v3–based geometry encoder, a distance decoder predicting six per-point distance maps, and a topology-driven non-minima suppression (CTD-NMS) to robustly extract landmarks from dense meshes without predefined landmark counts. The approach achieves around 0.64 precision and recall at 0–2 mm thresholds and demonstrates robustness gains when using sharpened distance maps, with a compact 8.9M-parameter model and ~1.13 s inference per scan, highlighting potential for real-time clinical use. The work also provides data augmentation, geodesic-distance labeling, and interpretable feature analyses, contributing a practical framework for unconstrained 3D dental landmarking and facilitating future research in digital dentistry.
Abstract
The increasing availability of intraoral scanning devices has heightened their importance in modern clinical orthodontics. Clinicians use advanced Computer-Aided Design techniques to create patient-specific treatment plans, a process that includes laboriously identifying crucial landmarks such as cusps, mesial-distal locations, facial axis points, and tooth-gingiva boundaries. Detecting such landmarks automatically is challenging because of limited dataset sizes, significant anatomical variability among subjects, and the geometric nature of the data. We present our experiments from the 3DTeethLand Grand Challenge at MICCAI 2024. Our method leverages recent advancements in point cloud learning through transformer architectures. We designed a Point Transformer v3-inspired module to capture meaningful geometric and anatomical features. A lightweight decoder processes these features to predict per-point distances, from which landmarks are extracted by graph-based non-minima suppression. We report promising results and discuss insights on learned feature interpretability.
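
To make the final step more concrete, the following is a minimal Python sketch of graph-based non-minima suppression over predicted per-point distances. It is an illustration under stated assumptions, not the authors' CTD-NMS: the function names, the k-hop neighborhood, the `max_distance` threshold, and the index-based tie-break are all hypothetical choices, and it handles a single distance map rather than one map per landmark class.

```python
from collections import deque


def k_hop_neighbors(adjacency, start, hops):
    """Return the vertices reachable from `start` within `hops` edges (excluding `start`)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        vertex, depth = frontier.popleft()
        if depth == hops:
            continue  # do not expand beyond the requested neighborhood size
        for neighbor in adjacency[vertex]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    seen.discard(start)
    return seen


def graph_non_minima_suppression(distances, adjacency, hops=2, max_distance=1.0):
    """Keep vertices whose predicted distance is a local minimum on the graph.

    `distances[v]` is the predicted distance from vertex v to the nearest landmark.
    `adjacency[v]` lists the vertices connected to v (e.g. by mesh edges).
    Both parameters and the thresholds are assumptions made for this sketch.
    """
    kept = []
    for vertex, distance in enumerate(distances):
        if distance > max_distance:
            continue  # too far from any landmark to be a candidate
        # Comparing (distance, index) pairs breaks ties so that a plateau
        # of equal values yields exactly one detection.
        if all(
            (distance, vertex) < (distances[other], other)
            for other in k_hop_neighbors(adjacency, vertex, hops)
        ):
            kept.append(vertex)
    return kept


# Toy example: a path graph 0-1-2-3-4-5-6 with two clear minima.
path_adjacency = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5]]
predicted = [0.9, 0.2, 0.8, 1.5, 0.7, 0.1, 0.6]
landmarks = graph_non_minima_suppression(predicted, path_adjacency, hops=1, max_distance=0.5)
print(landmarks)  # [1, 5]
```

Because candidates are selected by local minimality on the graph rather than by a fixed count, the number of detected landmarks is not set in advance, which matches the unconstrained setting described above.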
