3D Structure-guided Network for Tooth Alignment in 2D Photograph

Yulong Dou, Lanzhuju Mei, Dinggang Shen, Zhiming Cui

TL;DR

This work generates orthodontic comparison photographs from 2D images, using 3D intra-oral models to learn clinically grounded tooth alignment. It introduces a three-module pipeline (Segm-Mod, Align-Mod, Gen-Mod) that segments teeth, projects 3D tooth structures into 2D contour guidance, and uses diffusion models to transform tooth contours and synthesize realistic mouth images with consistent texture and lighting. The approach is reported to outperform GAN baselines, supported by ablations and a user study, and shows practical potential for improving dentist-patient communication. By incorporating 3D structural knowledge during learning without requiring 3D input at inference, the method offers a clinically relevant, user-friendly way to visualize post-treatment outcomes directly in 2D photograph space.

Abstract

Orthodontics focuses on rectifying misaligned teeth (i.e., malocclusions), which affect both masticatory function and aesthetics. However, orthodontic treatment often involves complex, lengthy procedures. As such, generating a 2D photograph depicting aligned teeth prior to orthodontic treatment is crucial for effective dentist-patient communication and, more importantly, for encouraging patients to accept orthodontic intervention. In this paper, we propose a 3D structure-guided tooth alignment network that takes 2D photographs as input (e.g., photos captured by smartphones) and aligns the teeth within the 2D image space to generate an orthodontic comparison photograph featuring aesthetically pleasing, aligned teeth. Notably, while the process operates within a 2D image space, our method employs 3D intra-oral scanning models collected in clinics to learn about orthodontic treatment: the pre- and post-orthodontic 3D tooth structures are projected onto 2D tooth contours, and a diffusion model then learns the mapping between them. Ultimately, the aligned tooth contours guide the generation of a 2D photograph with aesthetically pleasing, aligned teeth and realistic textures. We evaluate our network on various facial photographs, demonstrating its exceptional performance and strong applicability within the orthodontic industry.

Paper Structure (17 sections, 7 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: Orthodontic comparison photographs. For each case, we show the facial photograph with misaligned teeth (left) and the facial photograph with well-aligned teeth generated by our network (right); the inset in the lower right corner is a zoom-in of the mouth region.
  • Figure 2: Overall pipeline. When a facial photograph is input into our network, it first goes through $Segm\text{-}Mod$ to obtain the oral mask, mouth region, and tooth contours. Then it enters the pre-trained $Align\text{-}Mod$ to predict well-aligned tooth contours, and finally goes through $Gen\text{-}Mod$ to generate a facial photograph with well-aligned teeth.
  • Figure 2: Four groups of ablation experiments.
  • Figure 3: Inference process. For each detected mouth region $R_i$ (a), we segment it to obtain the oral mask $M_i$ (b) and oral region (c). We further obtain tooth contours $C_i$ (d) from our $Segm\text{-}Mod$ and feed them into our $Align\text{-}Mod$ to yield well-aligned tooth contours $\hat{C}_i$ (e). We finally predict a mouth region with well-aligned teeth $\hat{R}_i$ (f) through our $Gen\text{-}Mod$.
  • Figure 4: Qualitative comparisons. The upper two rows are two testing cases of $Align\text{-}Mod$, and the lower two rows are two testing cases of $Gen\text{-}Mod$.
  • ...and 1 more figure
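
The inference flow described in the Figure 3 caption (mouth region $R_i$, then oral mask $M_i$ and tooth contours $C_i$, then aligned contours, then the generated mouth region) can be sketched as a simple chain of three callables. This is only an illustrative sketch of the data flow, not the authors' implementation: the class name `AlignmentPipeline`, the `Image` placeholder type, and in particular the assumption that Gen-Mod receives the mouth region, the oral mask, and the aligned contours together are all hypothetical, since the captions do not specify the modules' exact inputs.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple

# Placeholder for whatever image/array type a real system would use (assumption).
Image = Any


@dataclass
class AlignmentPipeline:
    """Chains three stand-in modules in the order given by the Figure 3 caption."""

    # Hypothetical signatures; the real modules are neural networks.
    segm_mod: Callable[[Image], Tuple[Image, Image]]  # R_i -> (M_i, C_i)
    align_mod: Callable[[Image], Image]               # C_i -> aligned contours
    gen_mod: Callable[[Image, Image, Image], Image]   # (R_i, M_i, aligned C_i) -> aligned R_i

    def infer(self, mouth_region: Image) -> Image:
        # Step 1: segment the mouth region into an oral mask and tooth contours.
        oral_mask, contours = self.segm_mod(mouth_region)
        # Step 2: map the misaligned contours to well-aligned contours.
        aligned_contours = self.align_mod(contours)
        # Step 3: synthesize the mouth region guided by the aligned contours.
        return self.gen_mod(mouth_region, oral_mask, aligned_contours)


# Usage with string-building stand-ins, so the order of calls is visible.
pipeline = AlignmentPipeline(
    segm_mod=lambda r: (f"mask({r})", f"contours({r})"),
    align_mod=lambda c: f"aligned({c})",
    gen_mod=lambda r, m, c: f"generated({r}, {m}, {c})",
)
result = pipeline.infer("R_i")
print(result)
```

Keeping the modules as injected callables mirrors the caption's description of Align-Mod as pre-trained: each stage could be trained or swapped separately while the inference chain stays the same.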