Transformer-Based Tooth Alignment Prediction With Occlusion And Collision Constraints

ZhenXing Dong, JiaZhou Chen, YangHui Xu

TL;DR

This work reorganizes 3D point clouds along virtual arch lines and converts them into order-sorted multi-channel textures, which improves both accuracy and efficiency. It also introduces two new occlusal loss functions that quantitatively evaluate the occlusal relationship between the upper and lower jaws.

Abstract

Planning digital orthodontic treatment requires a target tooth alignment, which is time-consuming and labor-intensive to determine manually and relies heavily on clinical experience. In this work, we propose a lightweight tooth alignment neural network based on the Swin Transformer. We first reorganize 3D point clouds based on virtual arch lines and convert them into order-sorted multi-channel textures, which improves both accuracy and efficiency. We then design two new occlusal loss functions that quantitatively evaluate the occlusal relationship between the upper and lower jaws. To the best of our knowledge, these important clinical constraints are introduced here for the first time, and they lead to cutting-edge prediction accuracy. To train our network, we collected a large digital orthodontic dataset of 591 clinical cases, including various complex cases. Since no open dataset exists so far, this dataset will benefit the community after its release. Furthermore, we propose two new orthodontic dataset augmentation methods that consider tooth spatial distribution and occlusion. We evaluate our method on this dataset with extensive experiments, including comparisons with state-of-the-art (SOTA) methods and ablation studies, and demonstrate its high prediction accuracy.

Paper Structure

This paper contains 32 sections, 16 equations, 16 figures, 9 tables.

Figures (16)

  • Figure 1: Our neural network architecture. The feature encoding module is divided into two branches: one encodes the global features from the tooth center, and the other encodes the local features from the tooth point cloud. The extraction of global features from the tooth center employs the SWTBS module, which consists of shared Swin-T blocks. Local features from the tooth point cloud are extracted by the SWTP module, which utilizes the multi-stage hierarchical feature fusion architecture mentioned in liu2021swin. The hidden vectors output by the two branches are merged and then passed through SWTBS feature propagation. Finally, an MLP is used to regress the 6 degrees of freedom (6DOF) transformation parameters required for orthodontics.
  • Figure 2: SWTBS module, consisting of 4 groups of shared Swin-T blocks, each group containing 16 channels. The residual of each feature transmission is added to the final output.
  • Figure 3: SWTP module. It adopts the multi-stage feature fusion mechanism from liu2021swin but merges only the second dimension of the latent vector, not the first. The first dimension represents teeth, and for orthodontic prediction the features of multiple teeth can interact but should not be fused.
  • Figure 4: Each point cloud in the dataset is organized as a Tooth Point Image. The first dimension represents the 32 teeth, the second dimension represents the 512 points sampled from a single tooth, and the third dimension represents the xyz coordinates of each sampled point. The shape of one data sample is therefore [32, 512, 3].
  • Figure 5: Visualization of the points selected by the first window for each tooth under different sorting methods. Red areas mark the locations of the first thirty points. The red points are most reasonably distributed when the points are sorted by distance along the simulated dental arch line: all red regions then lie in the same local area of each tooth, so the points selected by each window can represent the relative positions of all teeth to a certain extent.
  • ...and 11 more figures
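
As a rough illustration of the Tooth Point Image layout described in Figure 4 and the arch-line ordering described in Figure 5, the NumPy sketch below packs per-tooth point clouds into a [32, 512, 3] array and sorts each tooth's points by their position along an arch polyline. It is not the authors' implementation. The function names, the nearest-vertex arc-length key, the zero-filling of missing teeth, and the synthetic semicircular arch are all assumptions made for this example.

```python
import numpy as np

NUM_TEETH = 32          # first dimension in Figure 4
POINTS_PER_TOOTH = 512  # second dimension in Figure 4


def arc_length_position(points, arch):
    """Arc length (along the polyline `arch`) of the arch vertex nearest to each point.

    Hypothetical stand-in for the paper's distance along the simulated arch line.
    """
    segment_lengths = np.linalg.norm(np.diff(arch, axis=0), axis=1)
    cumulative = np.concatenate([[0.0], np.cumsum(segment_lengths)])
    # Pairwise distances between points (P, 3) and arch vertices (A, 3) -> (P, A)
    dists = np.linalg.norm(points[:, None, :] - arch[None, :, :], axis=2)
    return cumulative[np.argmin(dists, axis=1)]


def build_tooth_point_image(teeth, arch, rng):
    """Pack `teeth` (dict: slot 0..31 -> (N_i, 3) array) into a [32, 512, 3] array.

    Missing teeth are left as zeros (an assumption, not taken from the paper).
    """
    image = np.zeros((NUM_TEETH, POINTS_PER_TOOTH, 3))
    for slot, pts in teeth.items():
        # Sample with replacement only when the tooth has too few points.
        idx = rng.choice(len(pts), POINTS_PER_TOOTH,
                         replace=len(pts) < POINTS_PER_TOOTH)
        sampled = pts[idx]
        order = np.argsort(arc_length_position(sampled, arch), kind="stable")
        image[slot] = sampled[order]
    return image


# Synthetic example: a semicircular "arch" with 28 noisy blobs standing in for teeth.
rng = np.random.default_rng(0)
theta = np.linspace(0.0, np.pi, 64)
arch = np.stack([np.cos(theta), np.sin(theta), np.zeros_like(theta)], axis=1)
teeth = {i: arch[2 * i] + 0.05 * rng.standard_normal((600, 3)) for i in range(28)}

image = build_tooth_point_image(teeth, arch, rng)
print(image.shape)  # (32, 512, 3)
```

Because every tooth is sorted with the same arch-based key, a fixed window of leading points (for example the first thirty) falls in a comparable region of each tooth, which is the property Figure 5 visualizes.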