Beyond the Label Itself: Latent Labels Enhance Semi-supervised Point Cloud Panoptic Segmentation

Yujun Chen, Xin Tan, Zhizhong Zhang, Yanyun Qu, Yuan Xie

TL;DR

The paper tackles semi-supervised multi-modal point cloud panoptic segmentation by proposing latent-label strategies that go beyond the displayed annotations. It introduces Cylinder-Mix, an augmentation that generates reliable, diverse LiDAR samples, and the Instance Position-Scale Learning (IPSL) module, which injects instance position and scale cues derived from 3D-to-2D projections into the image branch. These latent labels are fused in a three-component framework (LiDAR Branch, Image Branch, and Multi-modal Segmentation Network) and reinforced by self-training, outperforming the previous state-of-the-art method, LaserMix, on SemanticKITTI and nuScenes. The approach is robust across detectors and vision models, yields significant gains at low labeling ratios, and offers a practical, adaptable path for leveraging unlabeled data in real-world autonomous systems.

Abstract

Given the exorbitant expense of labeling autonomous driving datasets and the growing trend of utilizing unlabeled data, semi-supervised segmentation on point clouds is becoming increasingly imperative. Intuitively, uncovering more "unspoken words" (i.e., latent instance information) beyond the label itself should help improve performance. In this paper, we discover two types of latent labels, beyond the displayed label, embedded in LiDAR and image data. First, in the LiDAR Branch, we propose a novel augmentation, Cylinder-Mix, which generates additional yet reliable samples for training. Second, in the Image Branch, we propose the Instance Position-Scale Learning (IPSL) Module to learn and fuse instance position and scale information, which comes from a 2D pre-trained detector and from a type of latent label obtained via 3D-to-2D projection. Finally, the two latent labels are embedded into the multi-modal panoptic segmentation network. Ablations of the IPSL module demonstrate its robust adaptability, and experiments on SemanticKITTI and nuScenes show that our model outperforms the state-of-the-art method, LaserMix.
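
To make the idea of a latent label obtained from 3D-to-2D projection more concrete, the following is a minimal sketch and not the authors' implementation. It assumes points are already in the camera frame, that a 3x4 projection matrix `P` is available, and that instance id `0` means "no instance"; the function names `project_point` and `instance_boxes` are hypothetical. For each instance it projects the points into the image and summarizes them as a box center (position) and width/height (scale).

```python
from collections import defaultdict


def project_point(P, xyz):
    """Project one 3D point (camera frame) with a 3x4 matrix P.

    Returns pixel coordinates (u, v), or None if the point lies
    behind the camera (non-positive depth).
    """
    x, y, z = xyz
    h = [P[r][0] * x + P[r][1] * y + P[r][2] * z + P[r][3] for r in range(3)]
    if h[2] <= 0:
        return None
    return h[0] / h[2], h[1] / h[2]


def instance_boxes(points, instance_ids, P, width, height):
    """Return {instance_id: (cx, cy, w, h)} in pixel units.

    (cx, cy) is the box center (position cue) and (w, h) the box
    extent (scale cue). Instance id 0 is treated as "no instance",
    which is an assumption of this sketch.
    """
    pixels = defaultdict(list)
    for xyz, inst in zip(points, instance_ids):
        if inst == 0:
            continue
        uv = project_point(P, xyz)
        if uv is None:
            continue
        u, v = uv
        # Keep only projections that land inside the image.
        if 0 <= u < width and 0 <= v < height:
            pixels[inst].append((u, v))

    boxes = {}
    for inst, uvs in pixels.items():
        us = [u for u, _ in uvs]
        vs = [v for _, v in uvs]
        u0, u1 = min(us), max(us)
        v0, v1 = min(vs), max(vs)
        boxes[inst] = ((u0 + u1) / 2, (v0 + v1) / 2, u1 - u0, v1 - v0)
    return boxes
```

In a real pipeline the LiDAR points would first be transformed into the camera frame with the dataset's calibration, and how the resulting boxes are combined with the 2D detector's output is specific to the IPSL module and not covered by this sketch.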

Paper Structure

This paper contains 38 sections, 12 equations, 8 figures, and 11 tables.

Figures (8)

  • Figure 1: Segmentation quality at various semi-supervised ratios. Our model outperforms all other methods in mIoU.
  • Figure 1: Ground truth and high-quality GSA mask.
  • Figure 2: The framework of our model, which is composed of three parts. The LiDAR Branch, operating on the 3D point cloud, obtains better 3D features through a self-supervised augmentation called Cylinder-Mix, while the Image Branch improves the 2D backbone by fusing instance position and scale information. The features from both modalities are then fused into the BEV feature, which the Multi-modal Segmentation Network processes to extract features and predict point-wise labels.
  • Figure 2: Visualization on SemanticKITTI.
  • Figure 3: Sketch map of Cylinder-Mix.
  • ...and 3 more figures