
Spherical Mask: Coarse-to-Fine 3D Point Cloud Instance Segmentation with Spherical Representation

Sangyun Shin, Kaichen Zhou, Madhu Vankadari, Andrew Markham, Niki Trigoni

TL;DR

The paper addresses two limitations of coarse-to-fine 3D instance segmentation: overestimation of instance size by axis-aligned bounding boxes (AABBs) and error propagation into the refinement phase. It proposes Spherical Mask, which represents each instance as a 3D polygon in spherical coordinates, defined by a center and sector rays. Instances are detected coarsely via Radial Instance Detection (RID) and refined via Radial Point Migration (RPM), which predicts radial deltas along the rays. The approach introduces two margin-based losses that correct misclassifications and promote sector cohesion, enabling robust refinement across all foreground points. Experiments on ScanNetV2, S3DIS, and STPLS3D demonstrate state-of-the-art performance and strong generalization. These results highlight the effectiveness of the spherical representation for precise 3D instance masks and its resilience to false positives and negatives.
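The sector-ray representation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code. It assumes equal-angle sectors, with azimuth θ in [0, 2π) and polar angle φ in [0, π]. It follows the target definition in Figure 3, where each ray is the distance from the center to the farthest point in its sector. The function names `to_spherical`, `sector_index`, `ray_targets`, and `coarse_mask` are hypothetical.

```python
import math

# Hypothetical sketch of the sector-ray representation (not the authors' code).
# Assumed convention: theta = azimuth in [0, 2*pi), phi = polar angle in [0, pi].

def to_spherical(point, center):
    """Return (r, theta, phi) of `point` relative to `center`."""
    dx, dy, dz = (p - c for p, c in zip(point, center))
    r = math.sqrt(dx * dx + dy * dy + dz * dz)
    theta = math.atan2(dy, dx) % (2 * math.pi)
    # Clamp to guard acos against rounding just outside [-1, 1].
    phi = math.acos(max(-1.0, min(1.0, dz / r))) if r > 0 else 0.0
    return r, theta, phi

def sector_index(theta, phi, n_theta, n_phi):
    """Map angles to one of n_theta * n_phi equal-angle sectors."""
    i = min(int(theta / (2 * math.pi) * n_theta), n_theta - 1)
    j = min(int(phi / math.pi * n_phi), n_phi - 1)
    return i * n_phi + j

def ray_targets(points, center, n_theta=3, n_phi=3):
    """Training target per sector: distance from the center to the farthest
    point in that sector (0.0 for empty sectors)."""
    rays = [0.0] * (n_theta * n_phi)
    for p in points:
        r, theta, phi = to_spherical(p, center)
        k = sector_index(theta, phi, n_theta, n_phi)
        rays[k] = max(rays[k], r)
    return rays

def coarse_mask(points, center, rays, n_theta=3, n_phi=3):
    """Coarse foreground test: a point is inside the polygon if it is no
    farther from the center than the ray of its sector."""
    mask = []
    for p in points:
        r, theta, phi = to_spherical(p, center)
        mask.append(r <= rays[sector_index(theta, phi, n_theta, n_phi)])
    return mask

center = (0.0, 0.0, 0.0)
obj = [(1.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 2.0)]
rays = ray_targets(obj, center)
print(coarse_mask([(0.9, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 0.0, 1.9)], center, rays))
# prints [True, False, True]
```

The sketch uses `<=` rather than a strict comparison, so the point that defines a ray during training counts as foreground. At inference the rays are network predictions, so exact ties are unlikely to matter.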

Abstract

Coarse-to-fine 3D instance segmentation methods perform worse than recent grouping-based, kernel-based, and Transformer-based methods. We argue that this is due to two limitations: 1) overestimation of instance size by axis-aligned bounding boxes (AABBs), and 2) accumulation of false-negative errors as inaccurate boxes are passed to the refinement phase. In this work, we introduce Spherical Mask, a novel coarse-to-fine approach based on a spherical representation that overcomes both limitations. Specifically, our coarse detection estimates each instance as a 3D polygon defined by a center and radial distance predictions, which avoids the excessive size estimates of AABBs. To cut the error propagation found in existing coarse-to-fine approaches, we virtually migrate points based on the polygon, allowing all foreground points, including false negatives, to be refined. During inference, the proposal and point migration modules run in parallel, and their outputs are assembled to form binary instance masks. We also introduce two margin-based losses for point migration that enforce corrections of false positives/negatives and cohesion of foreground points, significantly improving performance. Experimental results on three datasets, ScanNetV2, S3DIS, and STPLS3D, show that our method outperforms existing works, demonstrating the effectiveness of the new instance representation in spherical coordinates. The code is available at: https://github.com/yunshin/SphericalMask
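The virtual point migration and mask assembly described in the abstract can be illustrated for a single proposal. This is a hypothetical sketch, not the released implementation. The list `rays` stands in for the RID output, and `deltas` holds one scalar radial delta per point as a stand-in for the RPM output. Each delta moves its point along its ray without changing the stored coordinates. Because every foreground point is tested, a false negative lying outside the coarse polygon can still be recovered by a negative delta. The function names, the 3×3 sector split, and the sign convention (a positive delta moves a point outward) are assumptions.

```python
import math

# Hypothetical sketch of virtual point migration + mask assembly for ONE
# proposal. `rays` stands in for the RID output and `deltas` for the RPM output.

def radius_and_sector(point, center, n_theta, n_phi):
    """Distance from `center` and equal-angle sector index of `point`
    (assumed: azimuth in [0, 2*pi), polar angle in [0, pi])."""
    dx, dy, dz = (p - c for p, c in zip(point, center))
    r = math.sqrt(dx * dx + dy * dy + dz * dz)
    theta = math.atan2(dy, dx) % (2 * math.pi)
    phi = math.acos(max(-1.0, min(1.0, dz / r))) if r > 0 else 0.0
    i = min(int(theta / (2 * math.pi) * n_theta), n_theta - 1)
    j = min(int(phi / math.pi * n_phi), n_phi - 1)
    return r, i * n_phi + j

def assemble_mask(fg_points, deltas, center, rays, n_theta=3, n_phi=3):
    """Binary mask over ALL foreground points, not only those inside the
    coarse polygon. Each point is moved along its ray by its radial delta
    only virtually: the input coordinates are never modified."""
    mask = []
    for p, d in zip(fg_points, deltas):
        r, k = radius_and_sector(p, center, n_theta, n_phi)
        migrated_r = max(r + d, 0.0)  # assumed: positive delta = outward
        mask.append(migrated_r <= rays[k])
    return mask

rays = [2.0, 1.0] + [0.0] * 7  # example coarse rays for a 3x3 sector split
pts = [(1.2, 0.0, 0.0), (0.9, 0.0, 0.0), (0.0, 0.0, 1.5)]
deltas = [-0.4, 0.3, 0.0]  # pull in a false negative, push out a false positive
print(assemble_mask(pts, deltas, (0.0, 0.0, 0.0), rays))  # prints [True, False, True]
```

The rays and the deltas are computed independently of each other, so the two branches can run in parallel, which matches the inference description. In the full pipeline, this step would be repeated for each of the $K$ proposals before 3D NMS.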

Paper Structure

This paper contains 22 sections, 15 equations, 6 figures, and 7 tables.

Figures (6)

  • Figure 1: Pipeline of Spherical Mask within a coarse-to-fine framework. Given a point cloud, instances are detected as 3D polygons defined in spherical coordinates. In the refinement phase, points virtually migrate based on the polygon to estimate fine instance masks.
  • Figure 2: Overall pipeline of our proposed coarse-to-fine method. Given the point cloud, the backbone produces base features with a 3D UNet and a voting module. From these features, RID performs coarse detection while RPM produces the virtual point offsets used to refine it. In Mask Assembly, $K$ local binary masks are generated, each a proposal for a single instance. 3D NMS is then applied to the local binary masks, classifications, and confidence scores to obtain the final instance masks.
  • Figure 3: Process of RID. (a) Object points in Cartesian coordinates. (b) Points converted into a spherical coordinate system using $f_{\text{center}}$ and preset angles $\theta$ and $\varphi$. (c) Points assigned to the sectors defined by $\theta$ and $\varphi$; the example uses 3/3 for $\theta/\varphi$. (d) For each sector, the distance between the center and the farthest point becomes the target of $f_{\text{ray}}$. During inference, points whose distance is smaller than $f_{\text{ray}}$ are considered foreground.
  • Figure 4: Conceptual diagram of per-point migration under (a) $L_{\text{mc}}$ and (b) $L_{\text{sc}}$. $\Delta_{\text{FP}}$ and $\Delta_{\text{FN}}$ are the distances, with a margin, that $L_{\text{mc}}$ penalizes for misclassified points. $\Delta_{\text{TP}}$ is the distance that $L_{\text{sc}}$ penalizes to encourage learning general features of an instance by pulling samples close to one another around the center.
  • Figure 5: Qualitative comparison of ISBNet [ngo2023isbnet], MAFT [Lai_2023_ICCV], and ours on the ScanNetV2 validation set.
  • ...and 1 more figure
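Figure 4 describes $L_{\text{mc}}$ and $L_{\text{sc}}$ only conceptually, and their exact equations are not reproduced in this summary. The functions below are therefore hypothetical hinge-style stand-ins that mimic the described behavior, not the paper's formulas. In `misclassification_loss`, a true foreground point that ends up too close to or beyond its ray, or a background point that ends up too close to or inside it, is penalized by its distance past a margin. These penalties play the roles of $\Delta_{\text{FN}}$ and $\Delta_{\text{FP}}$. In `cohesion_loss`, true-positive points are pulled toward the center, which plays the role of $\Delta_{\text{TP}}$. The margin values and names are assumptions.

```python
# Hypothetical hinge-style stand-ins for L_mc and L_sc (not the paper's
# equations). Inputs are per point: migrated radius, the ray of the point's
# sector, and whether the point truly belongs to the instance.

def hinge(x):
    return max(x, 0.0)

def misclassification_loss(migrated_r, rays, is_fg, margin=0.1):
    """L_mc stand-in: a true foreground point must end up at least `margin`
    inside its ray (else a Delta_FN-style penalty); a background point at
    least `margin` outside it (else a Delta_FP-style penalty)."""
    total = 0.0
    for r, ray, fg in zip(migrated_r, rays, is_fg):
        if fg:
            total += hinge(r - (ray - margin))
        else:
            total += hinge((ray + margin) - r)
    return total / max(len(migrated_r), 1)

def cohesion_loss(migrated_r, rays, is_fg, margin=0.0):
    """L_sc stand-in: pull true-positive points (foreground and inside their
    ray) toward the center, and hence toward each other, by penalizing the
    migrated radius beyond `margin` (a Delta_TP-style penalty)."""
    tp = [r for r, ray, fg in zip(migrated_r, rays, is_fg) if fg and r <= ray]
    return sum(hinge(r - margin) for r in tp) / len(tp) if tp else 0.0

migrated_r = [0.8, 1.05, 1.5, 0.95]
rays = [1.0, 1.0, 1.0, 1.0]
is_fg = [True, True, False, False]
print(misclassification_loss(migrated_r, rays, is_fg))  # about 0.075
print(cohesion_loss(migrated_r, rays, is_fg))  # 0.8
```

In this toy input, only the second point (a false negative just past its ray) and the fourth point (a false positive just inside its ray) contribute to the misclassification term. The cohesion term is driven solely by the one true positive.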