PartSLIP++: Enhancing Low-Shot 3D Part Segmentation via Multi-View Instance Segmentation and Maximum Likelihood Estimation
Yuchen Zhou, Jiayuan Gu, Xuanlin Li, Minghua Liu, Yunhao Fang, Hao Su
TL;DR
PartSLIP++ tackles open-world, low-shot 3D part segmentation by replacing PartSLIP's coarse 2D bounding boxes with pixel-wise SAM segmentations and by reframing 3D lifting as a maximum-likelihood problem solved with a modified EM algorithm, which refines 3D instance masks by alternating multi-view 2D-3D matching with gradient-based optimization. The method achieves consistent gains over PartSLIP in both semantic and instance-based 3D part segmentation on PartNet-E, with ablations validating the contributions of 2D segmentation refinement, EM-based lifting, and post-processing. It also demonstrates practical utility in semi-automatic 3D part annotation and class-agnostic 3D instance proposal generation, highlighting the approach's applicability to real-world robotics and AR/VR tasks.
Abstract
Open-world 3D part segmentation is pivotal in diverse applications such as robotics and AR/VR. Traditional supervised methods often grapple with limited 3D data availability and struggle to generalize to unseen object categories. PartSLIP, a recent advancement, has made significant strides in zero- and few-shot 3D part segmentation. It does so by harnessing the 2D open-vocabulary detection module GLIP and introducing a heuristic method for lifting multi-view 2D bounding box predictions into 3D segmentation masks. In this paper, we introduce PartSLIP++, an enhanced version designed to overcome the limitations of its predecessor. Our approach incorporates two major improvements. First, we utilize a pre-trained 2D segmentation model, SAM, to produce pixel-wise 2D segmentations, which are more precise than the 2D bounding boxes used in PartSLIP. Second, PartSLIP++ replaces the heuristic 3D conversion process with a modified Expectation-Maximization algorithm. This algorithm treats the 3D instance segmentation as unobserved latent variables and iteratively refines it by alternating between 2D-3D matching and optimization with gradient descent. Through extensive evaluations, we show that PartSLIP++ outperforms PartSLIP in both low-shot 3D semantic and instance-based object part segmentation tasks. Code is released at https://github.com/zyc00/PartSLIP2.
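To make the alternating scheme concrete, the following is a toy sketch of the idea and not the authors' implementation. It assumes the rendering and SAM stages are already done, so each 2D mask is given directly as a boolean vector over the 3D points it covers. The names `match_masks`, `refine`, and `lift` are hypothetical, and so are the choices to match by soft IoU, to seed the 3D instances from the first view, and to use hard (argmax) assignments.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def match_masks(probs, masks):
    """Matching step: pair each 2D mask with the 3D instance of highest soft IoU."""
    size = probs.sum(axis=0)                    # soft size of each 3D instance
    targets = []
    for m in masks:
        inter = probs[m].sum(axis=0)            # soft overlap with each instance
        union = size + m.sum() - inter
        targets.append(int(np.argmax(inter / (union + 1e-8))))
    return targets

def refine(logits, masks, targets, lr=0.5, steps=20):
    """Optimization step: gradient descent on the negative log-likelihood
    of the matched instance label for every point covered by a mask."""
    for _ in range(steps):
        probs = softmax(logits)
        grad = np.zeros_like(logits)
        for m, k in zip(masks, targets):
            g = probs[m]                        # boolean indexing returns a copy
            g[:, k] -= 1.0                      # d(-log p_k)/d(logits) = p - onehot(k)
            grad[m] += g
        logits = logits - lr * grad
    return logits

def lift(views, num_points, rounds=3):
    """views: list of views, each a list of boolean masks over the points.
    Assumption of this sketch: the first view shows every instance once."""
    num_instances = len(views[0])
    logits = np.zeros((num_points, num_instances))
    for k, m in enumerate(views[0]):            # seed instance k from mask k of view 0
        logits[m, k] = 2.0
    masks = [m for view in views for m in view]
    for _ in range(rounds):
        targets = match_masks(softmax(logits), masks)
        logits = refine(logits, masks, targets)
    return softmax(logits).argmax(axis=1)       # per-point 3D instance label
```

In this sketch, a point that is hidden in the first view still receives a label once a mask from another view overlaps both that point and points that are already confidently assigned. The seeding step matters: starting from uniform logits, the hard matching can send every mask to the same instance and stay there.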
