FreePoint: Unsupervised Point Cloud Instance Segmentation

Zhikai Zhang, Jian Ding, Li Jiang, Dengxin Dai, Gui-Song Xia

TL;DR

FreePoint addresses unsupervised class-agnostic instance segmentation on indoor 3D point clouds by combining plane-based background removal, multi-feature point representations (coordinates, colors, and self-supervised embeddings), and a bottom-up RAMA multicut strategy to generate pseudo masks. It stabilizes the pseudo-labels with an id-as-feature ensemble across multiple RAMA runs, then trains a 3D instance segmenter with a weakly-supervised two-step loss that adds center and bounding-box cues to the Dice/BCE terms. The method achieves state-of-the-art results among unsupervised approaches, outperforming traditional clustering by over 18.2% AP and UnScene3D by 5.5% AP on ScanNet. Used as unsupervised pretraining, it also improves downstream instance segmentation with limited annotations (e.g., +6.0% AP over CSC with 10% of the masks on S3DIS). The results suggest that a purely 3D, self-supervised pipeline with a 3D-tailored bottom-up segmentation strategy can deliver both strong performance and practical pretraining benefits for robotics and 3D vision tasks.
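
The Dice/BCE mask terms mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the loss weights `w_bce` and `w_dice`, and the `smooth` constant are assumptions chosen for illustration, and the center and bounding-box terms of the two-step loss are not shown.

```python
import math

def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy between per-point probabilities and 0/1 targets."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)

def dice_loss(probs, targets, smooth=1.0):
    """Soft Dice loss: 1 - (2 * overlap + smooth) / (sum(probs) + sum(targets) + smooth)."""
    overlap = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * overlap + smooth) / (denom + smooth)

def mask_loss(probs, targets, w_bce=1.0, w_dice=1.0):
    """Weighted sum of BCE and Dice; the weights are hypothetical, not from the paper."""
    return w_bce * bce_loss(probs, targets) + w_dice * dice_loss(probs, targets)

# A predicted mask that matches the pseudo mask scores lower than an inverted one.
pseudo_mask = [1, 0, 1, 0]
good = mask_loss([0.9, 0.1, 0.8, 0.2], pseudo_mask)
bad = mask_loss([0.1, 0.9, 0.2, 0.8], pseudo_mask)
```

Dice measures overlap relative to mask size, so it is less sensitive than BCE to the imbalance between the few points of an instance and the many points outside it; combining the two is a common choice for mask supervision.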

Abstract

Instance segmentation of point clouds is a crucial task in the 3D field, with numerous applications that involve localizing and segmenting objects in a scene. However, achieving satisfactory results requires a large number of manual annotations, which are time-consuming and expensive to produce. To reduce the dependency on annotations, we propose a novel framework, FreePoint, for the underexplored task of unsupervised class-agnostic instance segmentation on point clouds. In detail, we represent the point features by combining coordinates, colors, and self-supervised deep features. Based on the point features, we apply a bottom-up multicut algorithm to segment point clouds into coarse instance masks, which serve as pseudo labels for training a point cloud instance segmentation model. At this stage, we propose an id-as-feature strategy to alleviate the randomness of the multicut algorithm and improve the quality of the pseudo labels. During training, we propose a weakly-supervised two-step training strategy and corresponding losses to overcome the inaccuracy of the coarse masks. FreePoint achieves breakthroughs in unsupervised class-agnostic instance segmentation on point clouds, outperforming previous traditional methods by over 18.2% in AP and the competitive concurrent work UnScene3D by 5.5% in AP. Additionally, when used as a pretext task and fine-tuned on S3DIS, FreePoint performs significantly better than existing self-supervised pre-training methods with limited annotations and surpasses CSC by 6.0% in AP with 10% annotation masks.
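
As a rough illustration of how combined point features can feed a multicut solver, the sketch below concatenates coordinates, colors, and deep features, builds a k-nearest-neighbor graph, and assigns each edge a signed cost. Everything specific here is an assumption rather than the paper's formulation: the group weights, the Gaussian similarity, the `threshold` that splits attractive from repulsive edges, and all function names are hypothetical. The multicut solver itself is not shown.

```python
import math

def combine_features(xyz, rgb, deep, w_xyz=1.0, w_rgb=1.0, w_deep=1.0):
    """Concatenate per-point coordinates, colors, and deep features.
    The per-group weights are hypothetical and would need tuning."""
    feats = []
    for p, c, d in zip(xyz, rgb, deep):
        feats.append([w_xyz * v for v in p] + [w_rgb * v for v in c] + [w_deep * v for v in d])
    return feats

def knn_edges(xyz, k):
    """Brute-force k-nearest-neighbor graph on coordinates, as undirected (i, j) pairs with i < j."""
    edges = set()
    for i, p in enumerate(xyz):
        neighbors = sorted((math.dist(p, q), j) for j, q in enumerate(xyz) if j != i)
        for _, j in neighbors[:k]:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)

def edge_costs(feats, edges, threshold=0.5):
    """Signed edge costs: positive means 'keep together', negative means 'cut'.
    The Gaussian similarity and the threshold are assumptions for illustration."""
    costs = {}
    for i, j in edges:
        similarity = math.exp(-math.dist(feats[i], feats[j]) ** 2)
        costs[(i, j)] = similarity - threshold
    return costs

# Two small, well-separated groups of points with distinct colors and deep features.
xyz = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 0.0, 0.0), (5.1, 0.0, 0.0)]
rgb = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
deep = [(1.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.0, 1.0)]

feats = combine_features(xyz, rgb, deep)
edges = knn_edges(xyz, k=2)
costs = edge_costs(feats, edges)
```

In this toy scene the edges inside each group receive positive costs and the edges between groups receive negative costs, which is the kind of input a multicut solver partitions into instances.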

Paper Structure (29 sections, 8 equations, 5 figures, 7 tables)

Figures (5)

  • Figure 1: We propose a novel framework for unsupervised point cloud instance segmentation. In detail, we cluster points based on coordinates, colors, and self-supervised deep features. We then use the clustered pseudo masks for step-training, which further improves the unsupervised segmentation quality.
  • Figure 2: Overview. For input point clouds, we first use plane segmentation to filter out backgrounds. We then represent each point by combining self-supervised deep features with traditional features. After that, we construct a graph and compute the edge affinity costs between points. Based on the graph, we apply a multicut algorithm to segment point clouds into coarse instance masks. These masks are adopted as pseudo labels to train a 3D instance segmentation model with our proposed weakly-supervised loss and step-training strategy.
  • Figure 3: Pseudo-label Generation. In this figure, we show the complete pipeline of pseudo-label generation. For simplicity, we set $k_{1} = k_{2} = 2$.
  • Figure 4: Qualitative results on ScanNet. FreePoint shows surprisingly good performance without any annotations.
  • Figure 5: Comparison with segmentation methods originally designed for 2D unsupervised instance segmentation. Recent methods [wang2022freesolo, melas2022deep] for 2D unsupervised instance segmentation fail to deal with crowded and cluttered point cloud scenes due to their top-down mechanism.
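
The plane-based background filtering described in the Figure 2 caption can be sketched with a simple RANSAC plane fit. The captions do not say which plane segmentation algorithm is used, so RANSAC is an assumption here, as are the function names, the distance threshold, and the iteration count.

```python
import random

def fit_plane(p1, p2, p3):
    """Return (unit normal, offset d) of the plane n.x + d = 0 through three points,
    or None if the points are (nearly) collinear."""
    u = [b - a for a, b in zip(p1, p2)]
    v = [b - a for a, b in zip(p1, p3)]
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    norm = sum(c * c for c in n) ** 0.5
    if norm < 1e-12:
        return None
    n = [c / norm for c in n]
    d = -sum(a * b for a, b in zip(n, p1))
    return n, d

def ransac_plane(points, dist_thresh=0.02, iters=200, seed=0):
    """Return indices of the points on the plane with the most inliers found by RANSAC."""
    rng = random.Random(seed)
    best = []
    for _ in range(iters):
        plane = fit_plane(*rng.sample(points, 3))
        if plane is None:
            continue
        n, d = plane
        inliers = [i for i, p in enumerate(points)
                   if abs(sum(a * b for a, b in zip(n, p)) + d) < dist_thresh]
        if len(inliers) > len(best):
            best = inliers
    return best

def remove_plane(points, **kwargs):
    """Drop the points that lie on the dominant plane (e.g., a floor)."""
    inliers = set(ransac_plane(points, **kwargs))
    return [p for i, p in enumerate(points) if i not in inliers]

# Toy scene: a 5 x 5 floor grid at z = 0 plus five object points above it.
floor = [(float(x), float(y), 0.0) for x in range(5) for y in range(5)]
objects = [(0.5, 0.5, 0.5), (1.5, 2.5, 0.6), (2.5, 1.5, 0.7), (3.5, 3.5, 0.8), (0.5, 3.5, 0.9)]
foreground = remove_plane(floor + objects)
```

Real indoor scenes contain several large planes (floor, walls, ceiling), so in practice the removal would be repeated until no sufficiently large plane remains; that loop is omitted here for brevity.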