OralGPT-Omni: A Versatile Dental Multimodal Large Language Model

Jing Hao, Yuci Liang, Lizhuo Lin, Yuxuan Fan, Wenkai Zhou, Kaixin Guo, Zanting Ye, Yanpeng Sun, Xinyu Zhang, Yanqi Yang, Qiankun Li, Hao Tang, James Kit-Hon Tsoi, Linlin Shen, Kuo Feng Hung

TL;DR

OralGPT-Omni is a dental-specialized multimodal LLM, trained with TRACE-CoT reasoning supervision and a four-stage training paradigm, that delivers transparent, reliable dental image analysis across eight imaging modalities and five tasks. The accompanying MMOral-Uni benchmark, the first of its kind, provides a comprehensive evaluation platform for rigorous comparison against diverse MLLMs. Empirical results show that OralGPT-Omni achieves state-of-the-art performance on MMOral-Uni and MMOral-OPG, and ablations confirm the value of explicit reasoning and reinforcement learning tuning. The work also describes a data pipeline, an evaluation framework, and a clinical validation that together advance intelligent dentistry and set benchmarks for future research.

Abstract

Multimodal Large Language Models (MLLMs) have shown immense potential across numerous medical specialties, yet dentistry remains underexplored, in part because of limited domain-specific data, scarce expert dental annotations, insufficient modality-specific modeling, and reliability challenges. In this paper, we present OralGPT-Omni, the first dental-specialized MLLM designed for comprehensive and trustworthy analysis across diverse dental imaging modalities and clinical tasks. To explicitly capture dentists' diagnostic reasoning, we construct TRACE-CoT, a clinically grounded chain-of-thought dataset that mirrors the decision-making processes of dental radiologists. This reasoning supervision, combined with our proposed four-stage training paradigm, substantially strengthens the model's capacity for dental image understanding and analysis. In parallel, we introduce MMOral-Uni, the first unified multimodal benchmark for dental image analysis. It comprises 2,809 open-ended question-answer pairs spanning five modalities and five tasks, offering a comprehensive evaluation suite for MLLMs in digital dentistry. OralGPT-Omni achieves an overall score of 51.84 on the MMOral-Uni benchmark and 45.31 on the MMOral-OPG benchmark, substantially outperforming GPT-5. Our work promotes intelligent dentistry and paves the way for future advances in dental image analysis. All code, benchmarks, and models will be made publicly available.

Paper Structure

This paper contains 23 sections, 5 equations, 36 figures, 9 tables.

Figures (36)

  • Figure 1: Overview of the diverse dental-specialized corpus. (a) Eight widely used dental imaging modalities. (b) Our proposed TRACE-CoT reasoning pattern, which enhances the reliability of the MLLM's responses. (c) Composition of the training corpus for OralGPT-Omni; the bar chart shows the distribution of dental modalities.
  • Figure 2: (a) The dental imaging data curation and TRACE-CoT data generation pipeline. Diverse imaging modalities are curated from public datasets and dental hospitals, and TRACE-CoT data is then generated using GPT, Wikipedia, and various annotations. Finally, the data is split into a training set and a benchmark: professional dentists assess the training samples, and the benchmark undergoes thorough manual correction. (b) Quality ratings from two dentists for 300 TRACE-CoT samples drawn from the training set.
  • Figure 3: OralGPT-Omni is trained in four stages; only the first stage trains on a single modality.
  • Figure 4: The difficulty-aware data selection strategy for RLT.
  • Figure 5: (a) The distribution of the MMOral-Uni benchmark, spanning five dental imaging modalities and covering five tasks. (b) Performance comparison on the MMOral-Uni benchmark. (c) Performance comparison on the MMOral-OPG benchmark.
  • ...and 31 more figures
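Figure 4 names a difficulty-aware data selection strategy for RLT, but the caption does not state the selection criterion. The sketch below assumes one common choice: estimate each sample's difficulty as the fraction of k sampled responses judged correct, then keep only samples the current model sometimes, but not always, solves. `Sample`, `exact_match`, `pass_rate`, `select_by_difficulty`, the thresholds, and the scripted toy model are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Callable, Dict, Iterator, List, Sequence


@dataclass
class Sample:
    """One hypothetical RL training item: a question and its reference answer."""
    question: str
    answer: str


def exact_match(prediction: str, reference: str) -> bool:
    """Hypothetical correctness check: case- and whitespace-insensitive equality."""
    return prediction.strip().lower() == reference.strip().lower()


def pass_rate(sample: Sample,
              generate: Callable[[str], str],
              is_correct: Callable[[str, str], bool],
              k: int = 8) -> float:
    """Fraction of k sampled responses judged correct (higher means easier)."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(is_correct(generate(sample.question), sample.answer) for _ in range(k))
    return hits / k


def select_by_difficulty(samples: Sequence[Sample],
                         generate: Callable[[str], str],
                         is_correct: Callable[[str, str], bool],
                         k: int = 8,
                         low: float = 0.0,
                         high: float = 1.0) -> List[Sample]:
    """Keep samples whose pass rate lies strictly between low and high.

    With the defaults, items the model always solves (rate 1.0) or never
    solves (rate 0.0) are dropped. Assumed criterion, not the paper's.
    """
    return [s for s in samples
            if low < pass_rate(s, generate, is_correct, k) < high]


# Toy demo: a scripted "model" whose answers cycle deterministically per question.
scripted: Dict[str, Iterator[str]] = {
    "easy": cycle(["A"]),         # always matches reference "A"
    "hard": cycle(["X"]),         # never matches reference "C"
    "medium": cycle(["B", "X"]),  # matches reference "B" every other call
}
pool = [Sample("easy", "A"), Sample("hard", "C"), Sample("medium", "B")]
kept = select_by_difficulty(pool, lambda q: next(scripted[q]), exact_match, k=8)
print([s.question for s in kept])  # ['medium']
```

A band filter like this is a typical motivation in RL fine-tuning: with group-normalized advantages (as in GRPO), a group of responses whose rewards are all equal yields zero advantage and therefore no gradient signal. Whether OralGPT-Omni uses this rule, or these thresholds, is not stated here.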