OPGAgent: An Agent for Auditable Dental Panoramic X-ray Interpretation

Zhaolin Yu, Litao Yang, Ben Babicka, Ming Hu, Jing Hao, Anthony Huang, James Huang, Yueming Jin, Jiasong Wu, Zongyuan Ge

TL;DR

OPGAgent is a multi-tool agentic system for auditable OPG interpretation that outperforms current dental VLMs and medical agent frameworks on both structured-report and VQA evaluation.

Abstract

Orthopantomograms (OPGs) are the standard panoramic radiograph in dentistry, used for full-arch screening across multiple diagnostic tasks. While Vision Language Models (VLMs) now allow multi-task OPG analysis through natural language, they underperform task-specific models on most individual tasks. Agentic systems that orchestrate specialized tools offer a path to both versatility and accuracy, but this approach remains unexplored in dental imaging. To address this gap, we propose OPGAgent, a multi-tool agentic system for auditable OPG interpretation. OPGAgent coordinates specialized perception modules with a consensus mechanism through three components: (1) a Hierarchical Evidence Gathering module that decomposes OPG analysis into global, quadrant, and tooth-level phases and invokes tools dynamically, (2) a Specialized Toolbox encapsulating spatial, detection, utility, and expert zoos, and (3) a Consensus Subagent that resolves conflicts through anatomical constraints. We further propose OPG-Bench, a structured-report protocol based on (Location, Field, Value) triples derived from real clinical reports, which enables a comprehensive review of findings and hallucinations and thus goes beyond the limitations of VQA metrics. On our OPG-Bench and the public MMOral-OPG benchmark, OPGAgent outperforms current dental VLMs and medical agent frameworks across both structured-report and VQA evaluation. Code will be released upon acceptance.
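
To make the (Location, Field, Value) idea concrete, the following is a minimal sketch of how a predicted report could be compared against a reference report once both are expressed as triples. It is an illustration only: the abstract does not specify OPG-Bench's matching rules or metrics, so the exact-match set comparison, the `Triple` type, the `score_report` function, and the field names and values in the example are all assumptions, not the authors' protocol.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One report finding. All three slots are plain strings in this sketch."""
    location: str  # assumed: a tooth number or region label, e.g. "38"
    field: str     # assumed: the attribute being described, e.g. "status"
    value: str     # assumed: the reported value, e.g. "impacted"


def score_report(predicted: set[Triple], reference: set[Triple]) -> dict[str, object]:
    """Compare predicted and reference triples by exact match (an assumption).

    Triples present only in the prediction are counted as hallucinations;
    triples present only in the reference are counted as missed findings.
    """
    matched = predicted & reference
    hallucinated = predicted - reference
    missed = reference - predicted
    precision = len(matched) / len(predicted) if predicted else 0.0
    recall = len(matched) / len(reference) if reference else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "hallucinated": sorted(hallucinated),
        "missed": sorted(missed),
    }


# Hypothetical example data, not taken from OPG-Bench.
reference = {
    Triple("38", "status", "impacted"),
    Triple("46", "status", "missing"),
}
predicted = {
    Triple("38", "status", "impacted"),
    Triple("11", "caries", "present"),
}
result = score_report(predicted, reference)
```

Because every finding is an explicit triple, an unsupported finding appears directly in `hallucinated` and an overlooked one in `missed`, which is the kind of audit a single VQA answer does not expose.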

Paper Structure (10 sections, 1 equation, 2 figures, 3 tables)

Figures (2)

  • Figure 1: Overview of OPGAgent. The Agent orchestrates three modules: Hierarchical Evidence Gathering, Specialized Toolbox, and Consensus Subagent.
  • Figure 2: Visual comparison. OPGAgent correctly identifies missing teeth, bone loss, and impacted tooth 38 via consensus, while GPT-5.2 and Gemini-3-Flash each miss or hallucinate findings. Blue for correct; orange for wrong; green for all undetected.