Sketchy Bounding-box Supervision for 3D Instance Segmentation

Qian Deng, Le Hui, Jin Xie, Jian Yang

TL;DR

This work tackles weakly supervised 3D instance segmentation with imperfect ("sketchy") bounding boxes. It introduces Sketchy-3DIS, a framework that jointly learns an adaptive box-to-point pseudo labeler and a coarse-to-fine instance segmentator, converting noisy box annotations into high-quality pseudo labels and refined instance masks. The method uses bilateral Hungarian matching to align pseudo ground truth with predicted instances and employs multi-level attention to progressively refine the segmentation. Experiments on ScanNetV2 and S3DIS show state-of-the-art performance under sketchy bounding-box supervision, with results that even surpass some fully supervised baselines, highlighting the practical viability of annotation-efficient 3D scene understanding.
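To make the matching step concrete, here is a minimal sketch of one-to-one matching between pseudo labels and predicted instances. It rests on several assumptions: masks are represented as sets of point indices, the matching score is mask IoU, and the optimum is found by brute force over permutations rather than by the Hungarian algorithm itself (both return the same assignment on small inputs). The "bilateral" variant is not detailed in this summary, so only a plain one-directional matching is shown; `mask_iou` and `match_one_to_one` are hypothetical names, not functions from the authors' code.

```python
from itertools import permutations


def mask_iou(a, b):
    """IoU of two instance masks given as sets of point indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def match_one_to_one(pseudo_masks, pred_masks):
    """Return the pseudo-to-prediction pairing with the highest total IoU.

    Brute-force stand-in for Hungarian matching (assumption: small inputs
    and len(pseudo_masks) <= len(pred_masks)).
    """
    best_score, best_pairs = -1.0, []
    for perm in permutations(range(len(pred_masks)), len(pseudo_masks)):
        score = sum(
            mask_iou(pseudo_masks[i], pred_masks[j]) for i, j in enumerate(perm)
        )
        if score > best_score:
            best_score, best_pairs = score, list(enumerate(perm))
    return best_pairs, best_score


# Two pseudo labels, three predicted instances (toy point indices).
pseudo = [{0, 1, 2}, {5, 6}]
pred = [{5, 6, 7}, {0, 1}, {9}]
pairs, score = match_one_to_one(pseudo, pred)
print(pairs)  # each entry is (pseudo index, prediction index)
```

In this toy case, pseudo label 0 pairs with prediction 1 and pseudo label 1 with prediction 0, while the third prediction stays unmatched. For realistic instance counts a polynomial-time solver would replace the permutation search.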

Abstract

Bounding box supervision has gained considerable attention in weakly supervised 3D instance segmentation. While this approach alleviates the need for extensive point-level annotations, obtaining accurate bounding boxes in practical applications remains challenging. To this end, we explore inaccurate bounding boxes, named sketchy bounding boxes, which are simulated by perturbing ground-truth bounding boxes with scaling, translation, and rotation. In this paper, we propose Sketchy-3DIS, a novel weakly supervised 3D instance segmentation framework that jointly learns a pseudo labeler and a segmentator to improve performance under sketchy bounding-box supervision. Specifically, we first propose an adaptive box-to-point pseudo labeler that learns to assign points located in the overlapping region of two sketchy bounding boxes to the correct instance, resulting in compact and pure pseudo instance labels. Then, we present a coarse-to-fine instance segmentator that first predicts coarse instances from the entire point cloud and then learns fine instances based on the regions of the coarse instances. Finally, by using the pseudo instance labels to supervise the instance segmentator, we can gradually generate high-quality instances through joint training. Extensive experiments show that our method achieves state-of-the-art performance on both the ScanNetV2 and S3DIS benchmarks and, using only sketchy bounding boxes, even outperforms several fully supervised methods. Code is available at https://github.com/dengq7/Sketchy-3DIS.
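The perturbation described in the abstract can be illustrated with a small sketch. Everything below is an assumption made for illustration: the box parameterization (center, full extents, yaw about the z axis), the names `Box3D`, `make_sketchy`, and `contains`, and the perturbation magnitudes in the example. None of it is taken from the authors' code.

```python
import math
from dataclasses import dataclass


@dataclass
class Box3D:
    center: tuple      # (x, y, z)
    size: tuple        # full extents (dx, dy, dz)
    yaw: float = 0.0   # rotation about the z axis, in radians


def make_sketchy(box, scale=1.0, shift=(0.0, 0.0, 0.0), yaw_delta=0.0):
    """Perturb a ground-truth box by scaling, translation, and rotation."""
    center = tuple(c + s for c, s in zip(box.center, shift))
    size = tuple(d * scale for d in box.size)
    return Box3D(center, size, box.yaw + yaw_delta)


def contains(box, point):
    """Check whether a point lies inside a (possibly rotated) box."""
    # Move the point into the box frame: translate, then rotate by -yaw.
    px, py, pz = (p - c for p, c in zip(point, box.center))
    cos_y, sin_y = math.cos(-box.yaw), math.sin(-box.yaw)
    lx = cos_y * px - sin_y * py
    ly = sin_y * px + cos_y * py
    hx, hy, hz = (d / 2 for d in box.size)
    return abs(lx) <= hx and abs(ly) <= hy and abs(pz) <= hz


gt = Box3D(center=(0.0, 0.0, 0.0), size=(2.0, 2.0, 2.0))
sketchy = make_sketchy(gt, scale=1.2, shift=(0.1, 0.0, 0.0))
outside_point = (1.15, 0.0, 0.0)  # just outside the ground-truth box

inside_gt = contains(gt, outside_point)
inside_sketchy = contains(sketchy, outside_point)
print(inside_gt, inside_sketchy)
```

The example point lies outside the ground-truth box but inside the enlarged, shifted one. A box-derived label under sketchy supervision can therefore include points that do not belong to the instance, which is the kind of noise a pseudo labeler has to correct.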

Paper Structure

This paper contains 15 sections, 10 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: (a) illustrates the inputs of bounding-box-supervised 3D instance segmentation. (b) compares the performance of GaPro [ngo2023gapro] and our Sketchy-3DIS under both accurate and scaled sketchy box supervision on the ScanNetV2 validation set.
  • Figure 2: Various sketchy bounding boxes under scaling, translation, and rotation perturbations.
  • Figure 3: The framework of Sketchy-3DIS. Given a point cloud with sketchy bounding-box annotations, we first extract features using a 3D U-Net backbone and then feed them into the adaptive box-to-point pseudo labeler and the coarse-to-fine instance segmentator. Finally, we periodically use the generated high-quality pseudo labels to supervise the predicted instances.
  • Figure 4: Details of the Multi-level Attention Block. The instance queries interact hierarchically with the features of the whole scene, the coarse instance regions, and the instance core regions. $B_{mask}$ denotes the boxes obtained from the predicted masks.
  • Figure 5: Visual comparison of pseudo labels on the ScanNetV2 training set. Black denotes background points and other colors denote different objects. The green and red circles highlight the key regions.
  • ...and 1 more figure
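The Figure 4 caption mentions boxes $B_{mask}$ obtained from predicted masks. A minimal sketch of one way to derive such a box follows, assuming an axis-aligned box spanned by the minimum and maximum coordinates of the masked points. The caption does not state how the boxes are computed, so this choice and the name `box_from_mask` are hypothetical.

```python
def box_from_mask(points, mask):
    """Axis-aligned box (mins, maxs) enclosing the points selected by a mask.

    `points` is a list of (x, y, z) tuples and `mask` a list of booleans of
    the same length. Returns None for an empty mask.
    """
    selected = [p for p, keep in zip(points, mask) if keep]
    if not selected:
        return None
    mins = tuple(min(p[i] for p in selected) for i in range(3))
    maxs = tuple(max(p[i] for p in selected) for i in range(3))
    return mins, maxs


points = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.5), (3.0, 1.0, 1.0), (9.0, 9.0, 9.0)]
mask = [True, True, True, False]  # the last point belongs to another instance
b_mask = box_from_mask(points, mask)
print(b_mask)
```

Here the box spans from (0, 0, 0) to (3, 2, 1) and ignores the unselected point.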