PARIS3D: Reasoning-based 3D Part Segmentation Using Large Multimodal Model
Amrin Kareem, Jean Lahoud, Hisham Cholakkal
TL;DR
This work addresses the inability of 3D perception systems to handle implicit user intents by proposing reasoning-based 3D part segmentation. It introduces the RPSeg3D dataset (2,624 objects, 60k+ instructions) and the PARIS3D architecture, which renders multi-view images of an object, processes them with a multimodal reasoning backbone, and lifts the resulting per-view masks into a coherent 3D segmentation accompanied by explanations. PARIS3D achieves performance competitive with explicit-query baselines and demonstrates the ability to identify part concepts, reason about them, and incorporate world knowledge through its explanations. The dataset and framework advance interactive, language-driven 3D perception, with practical implications for robotics and intelligent visualization; instance-level segmentation is left for future work.
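To make the "lift per-view masks into 3D" step concrete, here is a minimal sketch of one simple way to do it. It is not the PARIS3D lifting procedure. It assumes each rendered view comes with a known 3x4 pinhole projection matrix and a binary 2D part mask, and it labels a 3D point as belonging to the part when at least a `threshold` fraction of the views that see it mark it as positive. The names `project`, `lift_masks`, and `threshold` are hypothetical, and occlusion is ignored for simplicity.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Matrix34 = Sequence[Sequence[float]]  # 3x4 pinhole projection matrix (assumed known per view)
Mask = Sequence[Sequence[int]]        # mask[row][col] in {0, 1}


def project(P: Matrix34, p: Point) -> Tuple[float, float, float]:
    """Project a 3D point to pixel coordinates (u, v) with depth.

    A depth <= 0 means the point lies behind the camera.
    """
    hom = (p[0], p[1], p[2], 1.0)  # homogeneous coordinates
    row = [sum(P[r][c] * hom[c] for c in range(4)) for r in range(3)]
    depth = row[2]
    if depth <= 0:
        return 0.0, 0.0, depth
    return row[0] / depth, row[1] / depth, depth


def lift_masks(points: Sequence[Point],
               views: Sequence[Tuple[Matrix34, Mask]],
               threshold: float = 0.5) -> List[bool]:
    """Hypothetical multi-view voting: a point is labeled as the part if the
    fraction of views that see it AND mark it positive is >= threshold.

    Simplifying assumption: no occlusion test, so any point projecting
    inside the image bounds counts as visible in that view.
    """
    labels = []
    for p in points:
        seen = 0  # views in which the point projects inside the image
        hits = 0  # of those, views whose mask marks the pixel as the part
        for P, mask in views:
            u, v, depth = project(P, p)
            if depth <= 0:
                continue
            col, row = math.floor(u), math.floor(v)
            if 0 <= row < len(mask) and 0 <= col < len(mask[0]):
                seen += 1
                hits += mask[row][col]
        labels.append(seen > 0 and hits / seen >= threshold)
    return labels
```

For example, with a single view whose mask marks only pixel (2, 2), a point projecting onto that pixel is labeled as the part, while points projecting elsewhere or lying behind the camera are not. Adding a second view that does not mark the point lowers its vote to 0.5, so the outcome then depends on `threshold`. Practical systems typically add occlusion handling (e.g., depth tests) and aggregate over point groups rather than single points.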
Abstract
Recent advancements in 3D perception systems have significantly improved their ability to perform visual recognition tasks such as segmentation. However, these systems still rely heavily on explicit human instructions to identify target objects or categories, lacking the capability to actively reason about and comprehend implicit user intentions. We introduce a novel segmentation task, reasoning part segmentation for 3D objects, which aims to output a segmentation mask based on complex and implicit textual queries about specific parts of a 3D object. To facilitate evaluation and benchmarking, we present a large 3D dataset comprising over 60k instructions paired with corresponding ground-truth part segmentation annotations, specifically curated for reasoning-based 3D part segmentation. We propose a model that is capable of segmenting parts of 3D objects based on implicit textual queries and generating natural language explanations corresponding to 3D object segmentation requests. Experiments show that our method achieves performance competitive with models that use explicit queries, with the additional abilities to identify part concepts, reason about them, and complement them with world knowledge. Our source code, dataset, and trained models are available at https://github.com/AmrinKareem/PARIS3D.
