OrthoGAN: High-Precision Image Generation for Teeth Orthodontic Visualization

Feihong Shen, Jingjing Liu, Jianwen Lou, Haizhen Li, Bing Fang, Chenglong Ma, Jin Hao, Yang Feng, Youyi Zheng

TL;DR

OrthoGAN tackles the challenge of visualizing orthodontic treatment outcomes on frontal images by anchoring tooth movement to a patient-specific 3D teeth model. It combines a differentiable pose-fitting stage with a StyleGAN-based conditional generator to produce identity-preserving, natural-looking post-treatment smiles, guided by both geometry and appearance cues. The key contributions include a robust edge-based differentiable pose fitting, a 7-channel multi-modal input design and a 16x16 latent space, plus a post-processing step that blends lips and teeth for realism, validated through qualitative, quantitative, and clinical studies. The approach offers a practical tool for patient communication in digital orthodontics and suggests avenues for extending to dynamic facial changes during treatment.

Abstract

Patients care about what their teeth will look like after orthodontic treatment. Orthodontists usually describe the expected tooth movement using the original smile images, which is unconvincing. The growth of deep-learning generative models changes this situation: such models can visualize the outcome of orthodontic treatment and help patients foresee their future teeth and facial appearance. While previous studies mainly focus on 2D or 3D virtual treatment outcome (VTO) at a profile level, simulating the treatment outcome in a frontal facial image remains poorly explored. In this paper, we build an efficient and accurate system for simulating virtual teeth alignment effects in a frontal facial image. Our system takes as input a frontal face image of a patient with visible malpositioned teeth and the patient's 3D scanned teeth model, and progressively generates visual results of the patient's teeth given the specific orthodontic planning steps from the doctor (i.e., the specified translations and rotations of individual teeth). We design a multi-modal encoder-decoder based generative model to synthesize identity-preserving frontal facial images with aligned teeth. In addition, the color information of the original image is used to optimize the orthodontic outcomes, making the results more natural. We conduct extensive qualitative and clinical experiments, as well as a pilot study, to validate our method.
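
The abstract describes a planning step as translations and rotations of individual teeth. As a rough illustration of what such a specification could mean geometrically, the sketch below applies one rigid motion to the vertices of a single tooth. It is not the authors' code: the function names `rotation_matrix` and `move_tooth`, the axis-angle parameterization, and the choice to rotate about the tooth's centroid are all assumptions made for this example.

```python
import math

def rotation_matrix(axis, angle_deg):
    """3x3 rotation about `axis` by `angle_deg` degrees (Rodrigues' formula)."""
    x, y, z = axis
    norm = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / norm, y / norm, z / norm
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    k = 1.0 - c
    return [
        [c + x * x * k,     x * y * k - z * s, x * z * k + y * s],
        [y * x * k + z * s, c + y * y * k,     y * z * k - x * s],
        [z * x * k - y * s, z * y * k + x * s, c + z * z * k],
    ]

def move_tooth(vertices, axis, angle_deg, translation):
    """Rotate one tooth about its centroid, then translate it.

    Assumption: the rotation pivot is the vertex centroid; a real
    treatment plan may define the pivot differently.
    """
    n = len(vertices)
    centroid = [sum(v[i] for v in vertices) / n for i in range(3)]
    R = rotation_matrix(axis, angle_deg)
    moved = []
    for v in vertices:
        d = [v[i] - centroid[i] for i in range(3)]
        rotated = [sum(R[r][j] * d[j] for j in range(3)) for r in range(3)]
        moved.append(tuple(rotated[i] + centroid[i] + translation[i]
                           for i in range(3)))
    return moved

# Hypothetical toy "tooth": four vertices centred on the origin.
tooth = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
moved = move_tooth(tooth, axis=(0.0, 0.0, 1.0), angle_deg=90.0,
                   translation=(1.0, 0.0, 0.0))
```

In a pipeline like the one described, each tooth would carry its own rotation and translation per planning step, and the moved 3D model would then be rendered to condition the image generator.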

Paper Structure (10 sections, 3 equations, 6 figures, 3 tables)

This paper contains 10 sections, 3 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Results from popular diffusion-based text-to-image tools. We use the same prompt for every tool and provide the same input image when the tool supports inpainting or img2img.
  • Figure 2: Overview of our framework. $T_0$ is the initial scanned model of the patient's teeth. The parameters used to render each subsequent $T_i$ are derived from the differentiable rendering of $T_0$. The orange arrows denote the optimization of camera parameters in the fitting stage.
  • Figure 3: The architecture of our OrthoGAN. (a) The StyleBlock is based on the style-based generation structure with modulation and demodulation layers. The ResBlocks and part of the StyleBlocks constitute a UNet-like structure. All ResBlocks form the encoder $\mathcal{E}$ and all StyleBlocks form the decoder $\mathcal{D}$. (b) The inner structure of the ResBlock and StyleBlock. The middle arrows denote the transfer of feature maps between two corresponding blocks. For simplicity, the learned weights and noise inputs of the style mechanism are omitted.
  • Figure 4: Dynamic inpainting results of OrthoGAN. We select several treatment alignment visualizations at steps from $\mathcal{T}_{0}$ to $\mathcal{T}_{16}$; together, the alignment simulation results form a video that illustrates the movement of the teeth.
  • Figure 5: Comparison of original teeth reconstruction and new alignment visualization. The first column shows the original teeth image. The second, third, and fourth columns show the reconstruction results from StyleEX, TSynNet, and our OrthoGAN. The fifth column shows $\mathcal{T}_{i}$ and the sixth column shows the rendered depth image of the corresponding teeth. The seventh, eighth, and ninth columns show the alignment results from StyleEX, TSynNet, and our OrthoGAN.
  • ...and 1 more figures