FreePoint: Unsupervised Point Cloud Instance Segmentation
Zhikai Zhang, Jian Ding, Li Jiang, Dengxin Dai, Gui-Song Xia
TL;DR
FreePoint addresses unsupervised class-agnostic instance segmentation on indoor 3D point clouds. It combines plane-based background removal, multi-feature point representations (coordinates, colors, and self-supervised embeddings), and a bottom-up multicut strategy based on RAMA to generate pseudo masks. It stabilizes the pseudo labels with an id-as-feature ensemble across multiple RAMA runs, then trains a 3D instance segmenter with a weakly-supervised two-step loss that adds center and bounding-box cues to the Dice/BCE terms. The method achieves state-of-the-art results among unsupervised approaches, outperforming traditional clustering by over 18.2% AP and surpassing UnScene3D by 5.5% AP on ScanNet. It also serves as an effective unsupervised pretraining task for downstream semantic instance segmentation with limited annotations (e.g., +6.0% AP with 10% of the masks on S3DIS). The results indicate that a purely 3D, self-supervised pipeline with a bottom-up segmentation strategy tailored to 3D data can deliver both strong segmentation performance and practical pretraining benefits for robotics and 3D vision tasks.
Abstract
Instance segmentation of point clouds is a crucial task in the 3D field, with numerous applications that involve localizing and segmenting objects in a scene. However, achieving satisfactory results requires a large number of manual annotations, which are time-consuming and expensive to produce. To alleviate the dependency on annotations, we propose a novel framework, FreePoint, for the underexplored task of unsupervised class-agnostic instance segmentation on point clouds. In detail, we represent point features by combining coordinates, colors, and self-supervised deep features. Based on these point features, we apply a bottom-up multicut algorithm to segment point clouds into coarse instance masks, which serve as pseudo labels for training a point cloud instance segmentation model. At this stage, we propose an id-as-feature strategy to reduce the randomness of the multicut algorithm and improve the quality of the pseudo labels. During training, we propose a weakly-supervised two-step training strategy and corresponding losses to overcome the inaccuracy of the coarse masks. FreePoint achieves breakthroughs in unsupervised class-agnostic instance segmentation on point clouds, outperforming previous traditional methods by over 18.2% AP and the competitive concurrent work UnScene3D by 5.5% AP. Additionally, when used as a pretext task and fine-tuned on S3DIS, FreePoint performs significantly better than existing self-supervised pre-training methods with limited annotations, surpassing CSC by 6.0% AP with 10% of the annotation masks.
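
The point representation and the id-as-feature step described above can be illustrated with a short sketch. This is not the authors' implementation: the function names, the per-modality weights, the normalization choices, and the one-hot encoding of instance IDs are all assumptions made for illustration, and the multicut solver itself is omitted.

```python
import numpy as np


def build_point_features(xyz, rgb, emb, w_xyz=1.0, w_rgb=1.0, w_emb=1.0):
    """Concatenate coordinates, colors, and deep embeddings per point.

    The weights and normalizations are hypothetical choices, not values
    taken from the paper.
    """
    # Rescale coordinates to [0, 1] along each axis.
    lo = xyz.min(axis=0)
    span = np.maximum(xyz.max(axis=0) - lo, 1e-8)
    xyz_n = (xyz - lo) / span
    # Colors are assumed to be given in the 0..255 range.
    rgb_n = rgb / 255.0
    # L2-normalize each embedding vector.
    emb_n = emb / (np.linalg.norm(emb, axis=1, keepdims=True) + 1e-8)
    return np.concatenate([w_xyz * xyz_n, w_rgb * rgb_n, w_emb * emb_n], axis=1)


def ids_as_features(id_runs):
    """Turn instance IDs from several segmentation runs into point features.

    Each run is one-hot encoded and the encodings are concatenated, so two
    points share identical features only if every run grouped them together.
    This encoding is an assumption about how IDs could be used as features.
    """
    blocks = []
    for ids in id_runs:
        # Map arbitrary ID values to 0..K-1 before one-hot encoding.
        _, inverse = np.unique(np.asarray(ids), return_inverse=True)
        inverse = inverse.reshape(-1)
        blocks.append(np.eye(inverse.max() + 1)[inverse])
    return np.concatenate(blocks, axis=1)


# Toy usage with random data standing in for a real scene.
rng = np.random.default_rng(0)
n_points = 6
features = build_point_features(
    xyz=rng.uniform(0.0, 5.0, size=(n_points, 3)),
    rgb=rng.integers(0, 256, size=(n_points, 3)).astype(float),
    emb=rng.normal(size=(n_points, 16)),
)
# Two hypothetical runs of a randomized segmenter over the same points.
id_features = ids_as_features([[0, 0, 1, 1, 2, 2], [4, 4, 4, 9, 9, 9]])
print(features.shape, id_features.shape)
```

In this sketch, points that the runs disagree on end up with different ID features, so a second clustering pass over `id_features` would tend to keep only the groupings that are consistent across runs.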
