Joint prototype and coefficient prediction for 3D instance segmentation

Remco Royen, Leon Denis, Adrian Munteanu

TL;DR

This work tackles 3D instance segmentation in point clouds by introducing a prototype-based approach that jointly learns coefficients and prototypes through overcomplete sampling, producing an exhaustive set of candidate masks. Instance predictions are obtained via a linear combination of coefficients and prototypes, then refined with an efficient NMS, avoiding precise proposal generation. Evaluations on S3DIS-blocks show state-of-the-art performance with significant speed improvements (35.7 ms per block) and substantially reduced timing variance, making the method attractive for online and embedded deployment. The approach also yields interpretable prototypes that highlight meaningful regions in the input, supporting practical scene understanding tasks.

Abstract

3D instance segmentation is crucial for applications demanding comprehensive 3D scene understanding. In this paper, we introduce a novel method that simultaneously learns coefficients and prototypes. Employing an overcomplete sampling strategy, our method produces an overcomplete set of instance predictions, from which the optimal ones are selected through a Non-Maximum Suppression (NMS) algorithm during inference. The obtained prototypes are visualizable and interpretable. Our method demonstrates superior performance on S3DIS-blocks, consistently outperforming existing methods in mRec and mPrec. Moreover, it operates 32.9% faster than the state-of-the-art. Notably, with only 0.8% of the total inference time, our method exhibits an over 20-fold reduction in the variance of inference time compared to existing methods. These attributes render our method well-suited for practical applications requiring both rapid inference and high reliability.
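The inference path described above (linearly combining coefficients with prototypes, then selecting among the overcomplete candidates with NMS) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the array shapes, the sigmoid activation, the per-candidate `scores`, and both thresholds are assumptions introduced for the example, and the greedy mask-IoU NMS stands in for whatever efficient NMS variant the paper actually uses.

```python
import numpy as np

def assemble_masks(prototypes, coefficients):
    """Linearly combine prototypes into one soft mask per sampled point.

    prototypes:   (N, P) array, one activation per input point and prototype.
    coefficients: (K, P) array, one coefficient vector per sampled point.
    Returns a (K, N) array of mask probabilities.
    The sigmoid is an assumption; the source only states "linear combination".
    """
    logits = coefficients @ prototypes.T
    return 1.0 / (1.0 + np.exp(-logits))

def mask_nms(masks, scores, iou_threshold=0.5, mask_threshold=0.5):
    """Greedy NMS on binarized masks; returns indices of the kept candidates.

    Both thresholds are hypothetical defaults chosen for illustration.
    """
    binary = masks > mask_threshold
    order = np.argsort(-np.asarray(scores))  # highest score first
    keep = []
    for idx in order:
        candidate = binary[idx]
        duplicate = False
        for kept in keep:
            inter = np.logical_and(candidate, binary[kept]).sum()
            union = np.logical_or(candidate, binary[kept]).sum()
            if union > 0 and inter / union > iou_threshold:
                duplicate = True
                break
        if not duplicate:
            keep.append(int(idx))
    return keep

# Toy example: 6 points, 2 prototypes, 3 candidates (two of them redundant).
prototypes = np.array([[1, 0]] * 3 + [[0, 1]] * 3, dtype=float)
coefficients = np.array([[10, -10], [9, -10], [-10, 10]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])

masks = assemble_masks(prototypes, coefficients)
kept = mask_nms(masks, scores)
```

In this toy case the first two candidates describe the same three points, so only the higher-scoring one survives alongside the third candidate. Because the candidate set is overcomplete by design, the suppression step works on a fixed number of candidates, which is consistent with the low variance in inference time reported above.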

Paper Structure

This paper contains 8 sections, 2 equations, 5 figures, and 1 table.

Figures (5)

  • Figure 1: Speed-performance comparison on S3DIS-blocks Area-5. The proposed method outperforms the state-of-the-art in terms of accuracy, speed and variance in inference time.
  • Figure 2: The proposed architecture consists of four main parts: (1) a feature extractor that retrieves per-point features; (2) the sampling of a diverse set of $K$ points; (3) a PointNet++ network that generates prototypes from the per-point features; and (4) in parallel, a PointConv network that computes coefficients for each sampled point. Instance predictions are obtained by linearly combining the coefficients and prototypes, and the optimal predictions are selected during inference.
  • Figure 3: Prototype $p_i$ activation for different PartNet samples $x_i$.
  • Figure 4: Peak memory usage of the proposed method (block-based) and PointGroup [jiang2020pointgroup] (full-scene) for different numbers of input points and instances (the number of instances is encoded by marker intensity). The experiment is conducted on the scenes of S3DIS Area-5 [armeni20163d].
  • Figure 5: Visualization of S3DIS results. Different colors represent different instances; the same instance may have a different color in the ground truth and in the prediction.
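
The Figure 2 caption mentions sampling a diverse set of $K$ points but does not say how. One common way to obtain a spatially diverse subset is farthest point sampling, sketched below. Its use here is an assumption made for illustration, not a confirmed detail of the paper, and the function name and `start` parameter are hypothetical.

```python
import numpy as np

def farthest_point_sampling(points, k, start=0):
    """Pick up to k indices from an (N, 3) array so the chosen points are spread out.

    Assumed stand-in for the paper's "diverse" sampling step.
    """
    n = points.shape[0]
    k = min(k, n)
    if k <= 0:
        return np.empty(0, dtype=np.int64)
    selected = np.empty(k, dtype=np.int64)
    selected[0] = start
    # Distance from every point to its nearest already-selected point.
    min_dist = np.linalg.norm(points - points[start], axis=1)
    for i in range(1, k):
        # The next sample is the point farthest from everything chosen so far.
        selected[i] = int(np.argmax(min_dist))
        new_dist = np.linalg.norm(points - points[selected[i]], axis=1)
        min_dist = np.minimum(min_dist, new_dist)
    return selected

# Toy example: four collinear points; the far outlier is picked second.
points = np.array([[0.0, 0, 0], [1.0, 0, 0], [10.0, 0, 0], [0.5, 0, 0]])
sampled = farthest_point_sampling(points, k=3)
```

Each sampled index would then receive its own coefficient vector, giving $K$ candidate masks per block.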