3D Structure-guided Network for Tooth Alignment in 2D Photograph
Yulong Dou, Lanzhuju Mei, Dinggang Shen, Zhiming Cui
TL;DR
This work tackles the generation of orthodontic comparison photographs from 2D images by leveraging 3D intra-oral models to learn clinically grounded tooth alignment. It introduces a three-module pipeline (Segm-Mod, Align-Mod, Gen-Mod) that segments the teeth, renders 3D-to-2D guidance, and uses diffusion models to transform tooth contours and synthesize realistic mouth images with consistent texture and lighting. The approach outperforms GAN baselines, with results supported by ablations and a user study, and shows practical potential for improving dentist-patient communication. Because 3D structural knowledge is incorporated during training and no 3D input is required at inference, the method offers a clinically relevant, user-friendly way to visualize post-treatment outcomes directly in 2D photographs.
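The pipeline relies on rendering 3D-to-2D guidance, but the renderer is not described in this summary. The following is only a minimal sketch under stated assumptions: a pinhole camera with known intrinsics K and pose (R, t), tooth geometry given as a dense point sample rather than a mesh, and "contour" taken to mean the one-pixel boundary of the projected silhouette. The function names project_points, rasterize_mask, and contour_from_mask are hypothetical and not the authors' code.

```python
import numpy as np

def project_points(points_3d, K, R, t):
    """Hypothetical helper: pinhole projection of Nx3 world points to Nx2 pixel coords."""
    cam = points_3d @ R.T + t            # world frame -> camera frame
    if np.any(cam[:, 2] <= 0):
        raise ValueError("all points must lie in front of the camera")
    uvw = cam @ K.T                      # apply intrinsics (assumed last row [0, 0, 1])
    return uvw[:, :2] / uvw[:, 2:3]      # perspective divide -> (u, v)

def rasterize_mask(uv, height, width):
    """Hypothetical helper: mark every pixel hit by a projected point (assumes dense sampling)."""
    mask = np.zeros((height, width), dtype=bool)
    cols = np.round(uv[:, 0]).astype(int)
    rows = np.round(uv[:, 1]).astype(int)
    keep = (rows >= 0) & (rows < height) & (cols >= 0) & (cols < width)
    mask[rows[keep], cols[keep]] = True
    return mask

def contour_from_mask(mask):
    """Hypothetical helper: boundary = mask minus its 4-neighbour erosion."""
    p = np.pad(mask, 1, constant_values=False)
    eroded = (p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1]
              & p[1:-1, :-2] & p[1:-1, 2:])
    return mask & ~eroded

# Toy usage: a flat 2x2 square (standing in for a tooth surface) 10 units in front of the camera.
xs, ys = np.meshgrid(np.linspace(-1, 1, 101), np.linspace(-1, 1, 101))
points = np.stack([xs.ravel(), ys.ravel(), np.zeros(xs.size)], axis=1)
K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 32.0], [0.0, 0.0, 1.0]])
uv = project_points(points, K, np.eye(3), np.array([0.0, 0.0, 10.0]))
mask = rasterize_mask(uv, 64, 64)
contour = contour_from_mask(mask)
print(mask.sum(), contour.sum())  # 441 80
```

Projecting sampled points is a simplification; a real renderer would rasterize mesh faces and handle occlusion between neighbouring teeth, and could render one contour per tooth rather than a single silhouette.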
Abstract
Orthodontics focuses on correcting misaligned teeth (i.e., malocclusions), which affect both masticatory function and aesthetics. However, orthodontic treatment often involves complex, lengthy procedures. Generating a 2D photograph that depicts the aligned teeth before treatment begins is therefore crucial for effective dentist-patient communication and, more importantly, for encouraging patients to accept orthodontic intervention. In this paper, we propose a 3D structure-guided tooth alignment network that takes 2D photographs as input (e.g., photos captured with smartphones) and aligns the teeth within the 2D image space to generate an orthodontic comparison photograph featuring aesthetically pleasing, aligned teeth. Notably, although the process operates in 2D image space, our method uses 3D intra-oral scanning models collected in clinics to learn how orthodontic treatment rearranges the teeth: the pre- and post-orthodontic 3D tooth structures are projected onto 2D tooth contours, and a diffusion model then learns the mapping between them. Finally, the aligned tooth contours guide the generation of a 2D photograph with aesthetically pleasing, aligned teeth and realistic textures. We evaluate our network on various facial photographs, demonstrating its exceptional performance and strong applicability within the orthodontic industry.
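The abstract does not specify how the diffusion model learns the mapping from pre- to post-orthodontic contours. Purely as an illustration, the sketch below uses the standard DDPM forward process (linear beta schedule, noise-prediction objective) and conditions on the pre-treatment contour by stacking it as an extra input channel. The schedule values, the conditioning scheme, the [-1, 1] contour scaling, and the names linear_alpha_bar, q_sample, make_training_example, and eps_mse are all assumptions rather than the authors' design, and the denoising network itself is omitted.

```python
import numpy as np

def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear DDPM noise schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, alpha_bar, noise):
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    a = alpha_bar[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * noise

def make_training_example(pre_contour, post_contour, alpha_bar, rng):
    """Hypothetical: noise the post-treatment contour, condition on the pre-treatment one."""
    t = int(rng.integers(0, len(alpha_bar)))
    noise = rng.standard_normal(post_contour.shape)
    x_t = q_sample(post_contour, t, alpha_bar, noise)
    net_input = np.stack([x_t, pre_contour], axis=0)  # channel-concatenation conditioning
    return net_input, t, noise                         # the network would be trained to predict `noise`

def eps_mse(pred_noise, noise):
    """Noise-prediction (epsilon) mean-squared-error loss."""
    return float(np.mean((pred_noise - noise) ** 2))

# Toy usage with binary contour maps rescaled from {0, 1} to {-1, 1}.
rng = np.random.default_rng(0)
pre = np.zeros((64, 64)); pre[20:44, 18:46] = 1.0
post = np.zeros((64, 64)); post[20:44, 20:44] = 1.0
alpha_bar = linear_alpha_bar()
net_input, t, noise = make_training_example(2 * pre - 1, 2 * post - 1, alpha_bar, rng)
print(net_input.shape)  # (2, 64, 64)
```

At inference, a trained denoiser would start from pure noise in the first channel and iteratively denoise it while the second channel holds the contour extracted from the patient's photograph.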
