
Semi-supervised classification of dental conditions in panoramic radiographs using large language model and instance segmentation: A real-world dataset evaluation

Bernardo Silva, Jefferson Fontinele, Carolina Letícia Zilli Vieira, João Manuel R. S. Tavares, Patricia Ramos Cury, Luciano Oliveira

TL;DR

This paper tackles the challenge of limited labeled data for automatic dental radiograph analysis by introducing a semi‑supervised framework that combines large language model–driven annotation from textual dental reports, masked autoencoder pretraining, and Vision Transformer classifiers to detect thirteen dental conditions on panoramic radiographs. A large multi‑source dataset (TRPR, RPR, O2PR) supports tooth‑level cropping via an instance segmentation model, with GPT‑4 extracting noun phrases from reports to form condition labels linked to teeth. Pretraining on tooth crops with MAE and subsequent ViT binary classifiers yields MCC gains, with Crops‑based pretraining outperforming ImageNet baselines; expert consensus ground truth further clarifies system performance relative to dentistry professionals. The results underscore the value of semi‑supervised/self‑supervised approaches and robust ground‑truth labeling for scalable dental diagnostics, while highlighting the need for even larger and more consistently annotated datasets to reach higher generalizability.

Abstract

Dental panoramic radiographs offer vast diagnostic opportunities, but training supervised deep learning networks for automatic analysis of those radiology images is hampered by a shortage of labeled data. Here, a different perspective on this problem is introduced. A semi-supervised learning framework is proposed to classify thirteen dental conditions on panoramic radiographs, with a particular emphasis on teeth. Large language models were explored to annotate the most common dental conditions based on dental reports. Additionally, a masked autoencoder was employed to pre-train the classification neural network, and a Vision Transformer was used to leverage the unlabeled data. The analyses were validated using two of the most extensive datasets in the literature, comprising 8,795 panoramic radiographs and 8,029 paired reports and images. Encouragingly, the results consistently met or surpassed the baseline metrics for the Matthews correlation coefficient. A comparison of the proposed solution with human practitioners, supported by statistical analysis, highlighted its effectiveness and performance limitations; based on the degree of agreement among specialists, the solution demonstrated an accuracy level comparable to that of a junior specialist.


Paper Structure

This paper contains 19 sections, 3 equations, 12 figures, 9 tables.

Figures (12)

  • Figure 1: Sample of a human panoramic radiograph with the oral structures identified.
  • Figure 2: Illustration of FDI notation: A two-digit system that identifies each tooth (a custom color code was added to show qualitative results). Adapted from Silva et al. (2023).
  • Figure 3: Semi-supervised framework for classifying dental conditions. (i) Dataset construction: Combines Textual Report Panoramic Radiographs (TRPR), Raw Panoramic Radiographs (RPR), and O$^2$PR. (ii) Tooth pseudolabeling and crop generation: Uses an instance segmentation neural network to generate tooth pseudolabels on unlabeled radiographs, creating tooth crops. (iii) Classification network pretraining and label extraction: Cropped teeth and text information pretrain a model via MAE, extracting noun phrases with a large language model (LLM). Text-image linkage generates labels for binary classifiers. (iv) Classification of dental conditions: Trains a binary classifier for each dental condition.
  • Figure 4: Two tooth crop variants used in this study: The first, termed the "less context" crop, was taken from a panoramic radiograph of a tooth and measures 224$\times$224 pixels. The second, termed the "more context" crop, was resized to 224$\times$224 pixels from an original crop of size 380$\times$380 pixels. These two sets comprise the Crops dataset.
  • Figure 5: Illustration of a MAE: Selected patches from an input image are obscured, and the remaining visible patches are processed through an encoder. The obscured patches are subsequently reconstructed using a decoder from the latent space representations.
  • ...and 7 more figures
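
The masked-autoencoder pretraining illustrated in Figure 5 (hide most image patches, encode only the visible ones, reconstruct the rest) can be sketched minimally. This is a hypothetical `random_masking` helper under NumPy, not the authors' implementation; the 75% mask ratio is the common MAE default, assumed here rather than taken from the paper.

```python
import numpy as np

def random_masking(patches, mask_ratio=0.75, seed=0):
    """Randomly hide a fraction of patches, as in MAE pretraining.

    Returns the visible patches (encoder input), their indices, and the
    indices of the masked patches the decoder must reconstruct."""
    rng = np.random.default_rng(seed)
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    perm = rng.permutation(n)
    keep_idx = np.sort(perm[:n_keep])   # patches fed to the encoder
    mask_idx = np.sort(perm[n_keep:])   # patches reconstructed by the decoder
    return patches[keep_idx], keep_idx, mask_idx

# A 224x224 tooth crop split into 16x16 patches yields 14*14 = 196 patches.
patches = np.zeros((196, 16 * 16))
visible, keep_idx, mask_idx = random_masking(patches)
# With a 0.75 mask ratio, only 49 of the 196 patches reach the encoder.
```

Processing only the visible quarter of the patches is what makes MAE pretraining cheap enough to run over large unlabeled crop sets such as the one used here.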