Search3D: Hierarchical Open-Vocabulary 3D Segmentation
Ayca Takmaz, Alexandros Delitzas, Robert W. Sumner, Francis Engelmann, Johanna Wald, Federico Tombari
TL;DR
Search3D extends open-vocabulary 3D segmentation beyond object-centric queries with a hierarchical scene representation that jointly encodes objects and their parts. From posed RGB-D data it builds a tree-structured scene graph, uses geometric over-segmentation to obtain parts, and embeds open-vocabulary features at multiple levels via Semantic-SAM and SigLIP, so that text queries can search across objects, parts, and attributes. To evaluate part-level open-vocabulary segmentation, the paper introduces a scene-scale benchmark on MultiScan and fine-grained part annotations on ScanNet++. Search3D outperforms baselines on scene-scale 3D part segmentation while remaining strong on 3D object instance and material segmentation, and the paper also reports inference runtime. The approach targets robotics and interactive AI in unknown environments, where user-defined text queries call for part-level and attribute-level reasoning.
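The idea of searching a scene hierarchy at a chosen level of granularity can be illustrated with a toy sketch. This is not the authors' implementation: the `Node` class, the `search` function, and the hand-written 3-dimensional vectors are all hypothetical stand-ins. A real system would hold learned vision-language features (for example from SigLIP) on each node and embed the text query with the matching text encoder.

```python
from dataclasses import dataclass, field
from math import sqrt
from typing import Iterator, List, Optional, Tuple


@dataclass
class Node:
    """One entry in a toy scene hierarchy: an object or one of its parts."""
    name: str
    level: str                  # "object" or "part"
    embedding: List[float]      # hypothetical stand-in for a learned feature
    children: List["Node"] = field(default_factory=list)


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def walk(nodes: List[Node]) -> Iterator[Node]:
    """Depth-first traversal over objects and their parts."""
    for node in nodes:
        yield node
        yield from walk(node.children)


def search(scene: List[Node], query: List[float],
           level: Optional[str] = None) -> List[Tuple[float, str]]:
    """Rank nodes by similarity to the query, optionally at one level only."""
    hits = [(cosine(n.embedding, query), n.name)
            for n in walk(scene)
            if level is None or n.level == level]
    return sorted(hits, reverse=True)


# Toy scene with made-up embeddings (assumption: not real model features).
scene = [
    Node("cabinet", "object", [1.0, 0.0, 0.0], [
        Node("cabinet/handle", "part", [0.0, 1.0, 0.0]),
        Node("cabinet/door", "part", [0.6, 0.0, 0.8]),
    ]),
    Node("chair", "object", [0.0, 0.0, 1.0], [
        Node("chair/leg", "part", [0.0, 0.3, 0.9]),
    ]),
]

handle_query = [0.1, 0.9, 0.0]   # pretend text embedding for "handle"
cabinet_query = [0.9, 0.1, 0.0]  # pretend text embedding for "cabinet"

best_part = search(scene, handle_query, level="part")[0][1]
best_object = search(scene, cabinet_query, level="object")[0][1]
print(best_part, best_object)
```

Keeping parts as children of their object means a part-level hit can always be traced back to the object it belongs to, which is the property a tree-structured representation provides over a flat list of segments.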
Abstract
Open-vocabulary 3D segmentation enables exploration of 3D spaces using free-form text descriptions. Existing methods for open-vocabulary 3D instance segmentation primarily focus on identifying object-level instances but struggle with finer-grained scene entities such as object parts, or regions described by generic attributes. In this work, we introduce Search3D, an approach to construct hierarchical open-vocabulary 3D scene representations, enabling 3D search at multiple levels of granularity: fine-grained object parts, entire objects, or regions described by attributes like materials. Unlike prior methods, Search3D shifts towards a more flexible open-vocabulary 3D search paradigm, moving beyond explicit object-centric queries. For systematic evaluation, we further contribute a scene-scale open-vocabulary 3D part segmentation benchmark based on MultiScan, along with a set of open-vocabulary fine-grained part annotations on ScanNet++. Search3D outperforms baselines in scene-scale open-vocabulary 3D part segmentation, while maintaining strong performance in segmenting 3D objects and materials. Our project page is http://search3d-segmentation.github.io.
