OralGPT-Plus: Learning to Use Visual Tools via Reinforcement Learning for Panoramic X-ray Analysis

Yuxuan Fan; Jing Hao; Hong Chen; Jiahao Bao; Yihua Shao; Yuci Liang; Kuo Feng Hung; Hao Tang

TL;DR

OralGPT-Plus is an agentic vision-language model that performs iterative, symmetry-aware diagnostic reasoning for panoramic dental radiograph analysis. It shows consistent and reliable improvements over strong baselines on MMOral-X and established panoramic benchmarks, indicating the effectiveness of interactive, symmetry-informed reasoning.

Abstract

Panoramic dental radiographs require fine-grained spatial reasoning, an understanding of bilateral symmetry, and multi-step diagnostic verification, yet existing vision-language models operate under a static, single-pass paradigm that limits their clinical reliability. In this paper, we introduce OralGPT-Plus, an agentic vision-language model designed to perform iterative and symmetry-aware diagnostic reasoning for panoramic dental radiograph analysis. To support this paradigm, we construct DentalProbe, a dataset of 5,000 images with expert-curated diagnostic trajectories that provide structured supervision for localized inspection and contralateral comparison. We further develop a reinspection-driven reinforcement learning framework that encourages clinically meaningful re-examination and stabilizes long-horizon reasoning with a rubric-based reward and a conditioned diagnostic-driven reward. In parallel, we present MMOral-X, the first benchmark for holistic panoramic diagnosis, containing 300 open-ended questions and region-level annotations across multiple difficulty levels. OralGPT-Plus demonstrates consistent and reliable improvements over strong baselines on MMOral-X and established panoramic benchmarks, indicating the effectiveness of interactive and symmetry-informed reasoning. Our work highlights the value of agentic modeling for dental imaging and provides a foundation for future research in clinically aligned panoramic radiograph analysis.
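
The abstract does not say how the localized-inspection (zoom-in) and contralateral-comparison (mirror-in) tools are implemented. The sketch below is a hypothetical illustration only, not the authors' code. It assumes the radiograph is a grayscale pixel grid, that regions are half-open pixel boxes (x0, y0, x1, y1), and that the contralateral side is found by reflecting a box about the image's vertical centre line. The names zoom_in, mirror_box and mirror_in are invented for this example.

```python
from typing import List, Tuple

Image = List[List[int]]          # grayscale image stored as rows of pixel intensities
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open, in pixel coordinates


def crop(img: Image, box: Box) -> Image:
    """Return the pixels inside the box."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in img[y0:y1]]


def zoom_in(img: Image, box: Box, scale: int = 2) -> Image:
    """Hypothetical zoom-in tool: crop a region and upscale it by
    nearest-neighbour repetition (each pixel becomes a scale x scale block)."""
    region = crop(img, box)
    out: Image = []
    for row in region:
        wide = [px for px in row for _ in range(scale)]   # repeat along x
        out.extend([list(wide) for _ in range(scale)])    # repeat along y
    return out


def mirror_box(box: Box, width: int) -> Box:
    """Reflect a box about the vertical centre line of an image of the given width.
    Assumption: the dental midline coincides with the image centre."""
    x0, y0, x1, y1 = box
    return (width - x1, y0, width - x0, y1)


def mirror_in(img: Image, box: Box) -> Tuple[Image, Image]:
    """Hypothetical mirror-in tool: return the queried region and its contralateral
    counterpart, flipped horizontally so both crops share the same orientation."""
    width = len(img[0])
    region = crop(img, box)
    contra = [row[::-1] for row in crop(img, mirror_box(box, width))]
    return region, contra


if __name__ == "__main__":
    img = [[0, 1, 2, 3],
           [4, 5, 6, 7]]
    print(zoom_in(img, (0, 0, 2, 1), scale=2))
    print(mirror_in(img, (0, 0, 2, 1)))
```

Flipping the contralateral crop aligns the two crops pixel by pixel, so offset i in the queried region matches its mirror position at offset i in the counterpart, which keeps a side-by-side comparison simple. On a real panoramic film the dental midline may not sit exactly at the image centre, so a practical tool would likely need an estimated midline rather than width / 2.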

Paper Structure (45 sections, 14 equations, 24 figures, 10 tables)

Introduction
Preliminary
OralGPT-Plus
Dentist-like Instruction Tuning
Data Statistics
Dental-Aware Tool Design
DentalProbe Trajectory Construction
Training Strategy
Reinspection-Driven Reinforcement Learning
Rubrics-based Reward
Conditioned Diagnostic-Driven Reward
Hybrid Reward System
Optimization Objective
MMOral-X Benchmark
Benchmark Composition
...and 30 more sections

Figures (24)

Figure 1: Paradigm shift in panoramic dental radiograph analysis. Traditional detectors provide only category-level boxes, while VLMs generate coarse descriptions without structured reasoning. In contrast, OralGPT-Plus performs dentist-like diagnostic reasoning by invoking zoom-in and mirror-in tools to identify subtle findings and produce clinically coherent interpretations.
Figure 2: Curation process of the DentalProbe dataset with expert trajectories for dentist-like instruction tuning.
Figure 3: Average score under dentist sampling evaluation on the curated DentalProbe dataset.
Figure 4: Our training pipeline from dentist-like instruction tuning to re-inspection reinforcement learning.
Figure 5: Diagnostic trajectory generated by OralGPT-Plus on a sample from MMOral-X. The model produces a final panoramic-level diagnosis and receives a score of 0.9.
...and 19 more figures