Semi-supervised classification of dental conditions in panoramic radiographs using large language model and instance segmentation: A real-world dataset evaluation
Bernardo Silva, Jefferson Fontinele, Carolina Letícia Zilli Vieira, João Manuel R. S. Tavares, Patricia Ramos Cury, Luciano Oliveira
TL;DR
This paper tackles the challenge of limited labeled data for automatic dental radiograph analysis by introducing a semi-supervised framework that combines large language model–driven annotation of textual dental reports, masked autoencoder (MAE) pretraining, and Vision Transformer (ViT) classifiers to detect thirteen dental conditions on panoramic radiographs. A large multi-source dataset (TRPR, RPR, O2PR) supports tooth-level cropping via an instance segmentation model, while GPT-4 extracts noun phrases from the reports to form condition labels linked to individual teeth. Pretraining an MAE on tooth crops and fine-tuning ViT binary classifiers yields consistent Matthews correlation coefficient (MCC) gains, with crop-based pretraining outperforming ImageNet baselines; an expert-consensus ground truth further situates system performance relative to dental practitioners. The results underscore the value of semi-supervised and self-supervised approaches and of robust ground-truth labeling for scalable dental diagnostics, while highlighting the need for larger and more consistently annotated datasets to improve generalizability.
Abstract
Dental panoramic radiographs offer vast diagnostic opportunities, but training supervised deep learning networks to analyze these images automatically is hampered by a shortage of labeled data. Here, a different perspective on this problem is introduced: a semi-supervised learning framework is proposed to classify thirteen dental conditions on panoramic radiographs, with particular emphasis on the teeth. Large language models were explored to annotate the most common dental conditions based on dental reports. Additionally, a masked autoencoder was employed to pre-train the classification network, and a Vision Transformer was used to leverage the unlabeled data. The analyses were validated on two of the most extensive datasets in the literature, comprising 8,795 panoramic radiographs and 8,029 paired reports and images. Encouragingly, the results consistently met or surpassed baseline values of the Matthews correlation coefficient. A comparison of the proposed solution with human practitioners, supported by statistical analysis, highlighted both its effectiveness and its performance limitations; based on the degree of agreement among specialists, the solution demonstrated an accuracy level comparable to that of a junior specialist.
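Since each dental condition is evaluated as a binary classification task scored with the Matthews correlation coefficient, a minimal sketch of that metric may be useful. The function below is a standard MCC implementation from first principles (it is not code from the paper; the example labels are illustrative only):

```python
import math

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0/1).

    Computed from the confusion matrix as
    (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Returns 0.0 when the denominator is zero (common convention).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Hypothetical per-tooth labels for one condition:
# 1 = condition present, 0 = absent.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(mcc(y_true, y_pred))  # → 0.5
```

Unlike accuracy, MCC stays informative under the heavy class imbalance typical of dental conditions (most teeth are negative for any given finding), which is presumably why it serves as the paper's primary metric.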
