Learning from Spatio-temporal Correlation for Semi-Supervised LiDAR Semantic Segmentation

Seungho Lee, Hwijeong Lee, Hyunjung Shim

TL;DR

The paper addresses semi-supervised LiDAR segmentation under low-label regimes, where pseudo-label quality and labeled/unlabeled imbalance impair learning. It introduces Proximity-based Label Estimation (PLE) to exploit spatio-temporal overlap between LiDAR scans and a progressive extension to handle dynamic objects, together with a dual-branch MeanTeacher architecture that separates clean and noisy supervision to break the vicious cycle. Empirically, PLE and the dual-branch design achieve state-of-the-art results on SemanticKITTI and nuScenes, even with only $5\%$ labeled data, and can surpass fully supervised performance with as few as $20\%$ labels on nuScenes, demonstrating strong practicality and robustness across representations and backbones. The approach offers scalable, data-efficient LiDAR perception suitable for real-world autonomous driving scenarios.

Abstract

We address the challenges of the semi-supervised LiDAR segmentation (SSLS) problem, particularly in low-budget scenarios. The two main issues in low-budget SSLS are poor-quality pseudo-labels for unlabeled data and performance drops caused by the significant imbalance between ground-truth labels and pseudo-labels. This imbalance leads to a vicious training cycle. To overcome these challenges, we leverage the spatio-temporal prior by recognizing the substantial overlap between temporally adjacent LiDAR scans. We propose proximity-based label estimation, which generates highly accurate pseudo-labels for unlabeled data by utilizing semantic consistency with adjacent labeled data. We further enhance this method by progressively expanding the pseudo-labels from the nearest unlabeled scans, which significantly reduces errors linked to dynamic classes. We also employ a dual-branch structure to mitigate the performance degradation caused by data imbalance. Experimental results demonstrate remarkable performance in low-budget settings (i.e., ≤ 5%) and meaningful improvements in normal-budget settings (i.e., 5–50%). Our method achieves new state-of-the-art results on SemanticKITTI and nuScenes in semi-supervised LiDAR segmentation. With only 5% labeled data, it offers competitive results against fully-supervised counterparts. Moreover, on nuScenes it surpasses the performance of the previous state-of-the-art at 100% labeled data (75.2%) using only 20% of labeled data (76.0%). The code is available at https://github.com/halbielee/PLE.
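
The proximity idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes all scans have already been registered into a common world frame (for example, using ego poses), and the function names, the `IGNORE` marker, and the `max_dist` threshold are hypothetical choices made for illustration. Each unlabeled point copies the label of its nearest labeled point if that point is close enough. The progressive variant then reuses the newly labeled points as the source for the next scan, moving outward in time.

```python
import numpy as np

IGNORE = -1  # hypothetical marker for "no label assigned"


def proximity_label_estimation(labeled_xyz, labeled_sem, unlabeled_xyz, max_dist=0.2):
    """Copy each unlabeled point's label from its nearest labeled point.

    Assumes both point sets are expressed in the same world frame.
    Points with no labeled neighbour within max_dist stay IGNORE.
    """
    # Brute-force pairwise squared distances, shape (N_unlabeled, N_labeled).
    # Fine for a small demo; a spatial index would be needed for real scans.
    diff = unlabeled_xyz[:, None, :] - labeled_xyz[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)
    nn = d2.argmin(axis=1)
    nn_d2 = d2[np.arange(len(unlabeled_xyz)), nn]

    out = np.full(len(unlabeled_xyz), IGNORE, dtype=np.int64)
    close = nn_d2 <= max_dist ** 2
    out[close] = labeled_sem[nn[close]]
    return out


def progressive_ple(scans, labels0, max_dist=0.2):
    """Propagate labels scan by scan, starting from the labeled scan scans[0].

    scans is ordered by temporal distance from the labeled scan. After each
    step, the points that received a label become the source for the next scan.
    """
    src_xyz, src_lab = scans[0], labels0
    results = [labels0]
    for xyz in scans[1:]:
        est = proximity_label_estimation(src_xyz, src_lab, xyz, max_dist)
        results.append(est)
        keep = est != IGNORE
        if keep.any():
            src_xyz, src_lab = xyz[keep], est[keep]
    return results


labeled_xyz = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
labeled_sem = np.array([3, 7])
scan_t1 = np.array([[0.15, 0.0, 0.0]])
scan_t2 = np.array([[0.30, 0.0, 0.0]])

direct = proximity_label_estimation(labeled_xyz, labeled_sem, scan_t2)
chained = progressive_ple([labeled_xyz, scan_t1, scan_t2], labeled_sem)
print(direct, chained[2])
```

In this toy case the point in `scan_t2` is too far from the labeled scan to be labeled directly, but it is reached through the intermediate scan. That is the intuition behind expanding labels progressively from the nearest scans.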

Paper Structure

This paper contains 15 sections, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Segmentation performance (mIoU) across various labeled ratios. We outperform the state-of-the-art method, LaserMix (Kong et al., 2023), across all labeled ratios on nuScenes (Caesar et al., 2020), with particularly large margins in low-budget settings. Notably, our method with 5$\times$ fewer labels already achieves the performance of full supervision.
  • Figure 2: Overall framework. We generate pseudo-labels for unlabeled scans by leveraging the spatio-temporal prior in LiDAR. Unlabeled scans with PLE labels are treated as a labeled set during training. We adopt the MeanTeacher model, where the teacher network generates pseudo-labels for the remaining unlabeled data. To mitigate the performance degradation caused by noisy pseudo-labels, we introduce a dual-branch structure in which the labeled and unlabeled data are processed separately.
  • Figure 3: Qualitative comparisons between our method and LaserMix. All samples are visualized from the LiDAR bird's-eye view on the val set of SemanticKITTI (Behley et al., 2019). Correct and incorrect predictions are painted in gray and red, respectively, to highlight the difference. Dashed circles mark the mispredictions of LaserMix. Best viewed in color.
  • Figure 4: Ablation studies. (a) Accuracy of pseudo-labels from the Teacher network and PLE. (b) Accuracy of PLE labels over time intervals. (c) Accuracy of pseudo-labels during training. (d) Training results for different confidence thresholds. All results use the 1% labeled ratio of SemanticKITTI.
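
The teacher-side mechanics mentioned in the Figure 2 and Figure 4(d) captions can be sketched in their generic MeanTeacher form. This is a hedged illustration only: the helper names, the momentum value, and the threshold are assumptions, parameters are plain NumPy arrays rather than a real network, and the paper's dual-branch structure is not modelled here.

```python
import numpy as np


def ema_update(teacher, student, momentum=0.99):
    """Generic MeanTeacher update: teacher <- m * teacher + (1 - m) * student."""
    return {k: momentum * teacher[k] + (1.0 - momentum) * student[k] for k in teacher}


def confident_pseudo_labels(probs, threshold=0.9, ignore_index=-1):
    """Keep the teacher's argmax class only where its confidence passes the threshold.

    probs: (N, C) array of per-point class probabilities from the teacher.
    """
    conf = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    labels[conf < threshold] = ignore_index
    return labels


teacher = {"w": np.array([1.0, 1.0])}
student = {"w": np.array([0.0, 2.0])}
new_teacher = ema_update(teacher, student, momentum=0.9)

probs = np.array([[0.95, 0.05], [0.60, 0.40], [0.08, 0.92]])
pseudo = confident_pseudo_labels(probs, threshold=0.9)
print(new_teacher["w"], pseudo)
```

Raising `threshold` discards more uncertain points, trading pseudo-label coverage for accuracy. That trade-off is what a confidence-threshold ablation such as Figure 4(d) explores.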