TSegAgent: Zero-Shot Tooth Segmentation via Geometry-Aware Vision-Language Agents

Shaojie Zhuang; Lu Yin; Guangshun Wei; Yunpeng Li; Xilu Wang; Yuanfeng Zhou

Abstract

Automatic tooth segmentation and identification from intra-oral scanned 3D models are fundamental problems in digital dentistry. Most existing approaches rely on task-specific 3D neural networks trained on densely annotated datasets, which leads to high annotation cost and limited generalization to scans from unseen sources. We propose TSegAgent, which addresses these challenges by reformulating dental analysis as a zero-shot geometric reasoning problem rather than a purely data-driven recognition task. The key idea is to combine the representational capacity of general-purpose foundation models with explicit geometric inductive biases derived from dental anatomy. Instead of learning dental-specific features, the framework uses multi-view visual abstraction and geometry-grounded reasoning to infer tooth instances and identities without task-specific training. By explicitly encoding structural constraints such as dental arch organization and volumetric relationships, the method reduces uncertainty in ambiguous cases and avoids overfitting to particular shape distributions. Experimental results demonstrate that this reasoning-oriented formulation enables accurate and reliable tooth segmentation and identification with low computational and annotation cost, while generalizing well across diverse and previously unseen dental scans.

Paper Structure (17 sections, 1 equation, 3 figures, 2 tables)

Introduction
Method
Overview
Tooth Instance Segmentation by Multi-View Images
Tooth Identification Agent
Instance ID Reordering.
Multi-round Conversation-based Tooth Identification Agent.
Non-tooth Region Identification.
Central Incisor Identification.
Full-Arch Tooth Classification.
Error Detection and Correction.
Experiments
Dataset and Settings
Metrics
Comparison
...and 2 more sections

Figures (3)

Figure 1: The pipeline of TSegAgent. Given an intra-oral scanned 3D model, we first perform multi-view rendering (with curvature) and apply SAM3 for zero-shot tooth instance segmentation. The resulting masks are merged into face-level instance labels, which are then reordered based on dental arch geometry. Finally, a vision-language agent identifies tooth instances by multi-round conversation and reasoning over visual cues and geometric constraints.
Figure 2: Typical challenging cases for tooth classification, including (a) non-tooth regions, (b) tooth over-segmentation into multiple instances due to occlusion or complex morphology, and (c) small gingival papilla between adjacent teeth that may be misclassified as tooth instances.
Figure 3: Qualitative results of the compared methods on Teeth3DS and a private dataset.
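
The caption of Figure 1 mentions that instance labels are reordered based on dental arch geometry before identification. The paper's exact procedure is not given on this page, so the following is only a minimal sketch of one plausible approach: sort instance centroids by their angle around the arch center. The function name `reorder_along_arch`, the 2D centroid input, and the assumption that the arch opens toward the negative y-axis are all hypothetical and chosen for illustration.

```python
import math


def reorder_along_arch(centroids):
    """Return instance ids ordered from one end of the arch to the other.

    centroids: dict mapping instance id -> (x, y), the instance centroid
    projected onto the occlusal plane (hypothetical input; the projection
    step is not shown). Assumes the arch opens toward the negative y-axis.
    """
    if not centroids:
        return []
    # Reference point: mean of all instance centroids, which lies inside the arch.
    n = len(centroids)
    cx = sum(x for x, _ in centroids.values()) / n
    cy = sum(y for _, y in centroids.values()) / n

    def angle(item):
        _, (x, y) = item
        # Angle measured from the +y axis, so the +/-180 degree wrap-around
        # falls in the arch opening (the -y direction), where no teeth are expected.
        return math.atan2(x - cx, y - cy)

    return [inst_id for inst_id, _ in sorted(centroids.items(), key=angle)]


# Toy arch with arbitrary instance ids, as a segmentation step might produce.
toy = {
    7: (-25.0, -40.0),
    3: (-20.0, -10.0),
    5: (-8.0, 8.0),
    1: (8.0, 8.0),
    9: (20.0, -10.0),
    2: (25.0, -40.0),
}
ordered_ids = reorder_along_arch(toy)
new_labels = {old: new for new, old in enumerate(ordered_ids)}
```

In this toy example the ids come out in order from the negative-x end of the arch, through the anterior teeth, to the positive-x end. A consistent ordering of this kind lets a downstream agent refer to neighboring instances by position, although the actual method may use a different geometric criterion.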
