Exploring Scene Affinity for Semi-Supervised LiDAR Semantic Segmentation

Chuandong Liu; Xingxing Weng; Shuguo Jiang; Pengcheng Li; Lei Yu; Gui-Song Xia

TL;DR

This work tackles semi-supervised LiDAR semantic segmentation for driving scenes by exploiting two forms of scene affinity: intra-scene consistency and inter-scene correlation. It introduces AIScene, which uses a point erasure strategy to enforce intra-scene consistency and a patch-based augmentation pipeline (MixPatch and InsFill) to exploit inter-scene correlations across multiple scenes. The method operates within a teacher-student framework with EMA updates and a pseudo-label threshold, achieving state-of-the-art results on SemanticKITTI and nuScenes, with the largest gains when only 1% of the data is labeled. The results indicate that the erasure mechanism and the multi-scene augmentation each improve learning from unlabeled data and that combining them helps further, which offers a practical way to reduce labeling demands in autonomous-driving perception systems.
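As a rough illustration of the EMA update mentioned above, the following minimal sketch treats model weights as a plain dictionary of floats. The function name `ema_update` and the momentum value are assumptions made for illustration; the summary does not state the momentum AIScene uses, and a real implementation would operate on network tensors.

```python
def ema_update(teacher, student, momentum=0.99):
    """Return teacher weights moved a small step toward the student weights.

    `teacher` and `student` map parameter names to floats. The momentum
    default is an illustrative assumption, not a value from the paper.
    """
    return {
        name: momentum * teacher[name] + (1.0 - momentum) * student[name]
        for name in teacher
    }


# Toy weights: the teacher drifts toward the student after one update.
teacher = {"w": 1.0, "b": 0.0}
student = {"w": 0.0, "b": 2.0}
teacher = ema_update(teacher, student, momentum=0.9)
```

With a momentum close to 1, the teacher changes slowly, which is the usual reason an EMA teacher is expected to give more stable pseudo-labels than the student itself.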

Abstract

This paper explores scene affinity (AIScene), namely intra-scene consistency and inter-scene correlation, for semi-supervised LiDAR semantic segmentation in driving scenes. Adopting teacher-student training, AIScene employs a teacher network to generate pseudo-labeled scenes from unlabeled data, which then supervise the student network's learning. Unlike most methods that include all points in pseudo-labeled scenes for forward propagation but only pseudo-labeled points for backpropagation, AIScene removes points without pseudo-labels, ensuring consistency in both forward and backward propagation within the scene. This simple point erasure strategy effectively prevents unsupervised, semantically ambiguous points (excluded in backpropagation) from affecting the learning of pseudo-labeled points. Moreover, AIScene incorporates patch-based data augmentation, mixing multiple scenes at both scene and instance levels. Compared to existing augmentation techniques that typically perform scene-level mixing between two scenes, our method enhances the semantic diversity of labeled (or pseudo-labeled) scenes, thereby improving the semi-supervised performance of segmentation models. Experiments show that AIScene outperforms previous methods on two popular benchmarks across four settings, achieving notable improvements of 1.9% and 2.1% in the most challenging 1% labeled data.
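The point erasure idea described in the abstract can be sketched in a few lines of plain Python. This is only a sketch under stated assumptions: the function name `erase_unlabeled_points`, the per-point probability lists, and the threshold value of 0.9 are hypothetical, and the abstract does not say how confidence is measured beyond the use of pseudo-labels.

```python
def erase_unlabeled_points(points, class_probs, threshold=0.9):
    """Keep only points whose teacher confidence reaches the threshold.

    points:      list of (x, y, z) tuples from one unlabeled scene.
    class_probs: list of per-class probability lists from the teacher,
                 one list per point (assumed format).
    Returns the erased scene and the pseudo-label of each kept point.
    """
    kept_points, pseudo_labels = [], []
    for point, probs in zip(points, class_probs):
        confidence = max(probs)
        if confidence >= threshold:
            # Confident point: keep it and use the arg-max class as its label.
            kept_points.append(point)
            pseudo_labels.append(probs.index(confidence))
        # Otherwise the point is dropped, so it takes part in neither the
        # student's forward pass nor its backward pass.
    return kept_points, pseudo_labels


points = [(1.0, 2.0, 0.1), (4.0, -1.0, 0.3), (0.5, 0.5, 1.2)]
class_probs = [[0.95, 0.03, 0.02], [0.40, 0.35, 0.25], [0.05, 0.92, 0.03]]
erased_points, pseudo_labels = erase_unlabeled_points(points, class_probs)
```

In this toy scene the second point has no confident class and is removed, so the student only ever sees points that also receive supervision.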

Paper Structure (13 sections, 8 equations, 7 figures, 4 tables)

Introduction
Related Work
Method
Preliminary
Point Erasure Strategy
Patch-based Data Augmentation
Model Training
Experiments
Datasets and Evaluation Metrics
Implementation Details
Main Results
Ablation Study
Conclusion

Figures (7)

Figure 1: Top: Pseudo-labeled scenes generated from unlabeled scenes typically contain points with pseudo labels (colored) and without them (gray). Previous methods include all points in forward propagation but only pseudo-labeled points in backward propagation, leading to intra-scene inconsistency. This inconsistency may allow unsupervised, semantically ambiguous points to affect the learning of pseudo-labeled points (see \ref{fig:probability}). Bottom: Existing data augmentation techniques suitable for semi-supervised segmentation usually mix two scenes through scene-level operations like concatenation or swapping. Their resulting scenes may be constrained in terms of semantic diversity, potentially lacking objects like bicyclists and buses, which are important in driving scenes.
Figure 2: Overview of the proposed AIScene training pipeline. The teacher network generates pseudo-labels $y_i^u$ from the unlabeled scene $x_i^u$; an erasure operation then filters out low-confidence points to ensure training consistency. For the labeled scene $x_i^l$ and the erased scene $\hat{x}_i^{u}$, patch-based data augmentation produces the augmented scenes $x_i^{lpf}$ and $\hat{x}_i^{upf}$, respectively. Both augmented scenes are then fed into the student network to compute the losses $\mathcal{L}_l$ and $\mathcal{L}_u$.
Figure 3: Illustration of the pool initialization process. Each labeled point cloud scene is split into multiple patches from the BEV perspective, and different instance categories are assigned to each patch, which enables the extraction of instances belonging to different classes. These patches and instances are then stored separately in the patch pool and the instance pool.
Figure 4: (a) Illustration of the MixPatch process. (b) Illustration of selecting instances from the instance pool for filling. An instance will not be filled if there are no other points around the filling location or if the filled instance overlaps with existing instances.
Figure 5: Qualitative comparison from the LiDAR bird's-eye view on the SemanticKITTI val set. Points with correct and incorrect predicted semantic classes are shown in blue and orange, respectively. Best viewed in color.
...and 2 more figures
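
The captions for Figures 3 and 4 describe two steps that a short sketch may make more concrete: splitting a scene into BEV patches, and deciding whether an instance may be filled at a location. The code below is a hedged illustration only. The square-grid patch layout, the function names `split_into_bev_patches` and `can_fill_instance`, and the parameters `patch_size`, `support_radius`, and `min_gap` are all assumptions; the captions do not specify how patches are shaped or how "around" and "overlaps" are measured.

```python
import math
from collections import defaultdict


def split_into_bev_patches(points, patch_size=10.0):
    """Group (x, y, z) points by the BEV grid cell that contains them.

    A square grid over the x-y plane is an assumed patch layout.
    """
    patches = defaultdict(list)
    for x, y, z in points:
        cell = (math.floor(x / patch_size), math.floor(y / patch_size))
        patches[cell].append((x, y, z))
    return dict(patches)


def can_fill_instance(instance_xy, scene_xy, existing_instance_xy,
                      support_radius=2.0, min_gap=0.5):
    """Apply the two rejection rules from the Figure 4(b) caption in BEV.

    Rule 1: reject if no scene point lies near the filling location.
    Rule 2: reject if the new instance overlaps an existing instance.
    Both distance thresholds are illustrative assumptions.
    """
    n = len(instance_xy)
    center = (sum(x for x, _ in instance_xy) / n,
              sum(y for _, y in instance_xy) / n)
    has_support = any(math.dist(center, p) <= support_radius
                      for p in scene_xy)
    overlaps = any(math.dist(p, q) < min_gap
                   for p in instance_xy for q in existing_instance_xy)
    return has_support and not overlaps


scene = [(1.0, 1.0, 0.0), (12.0, 1.0, 0.0), (-1.0, 1.0, 0.0)]
patches = split_into_bev_patches(scene)
ok = can_fill_instance([(0.0, 0.0), (1.0, 0.0)], [(1.0, 1.0)], [(5.0, 5.0)])
```

The brute-force distance checks keep the sketch short; a practical pipeline would more likely use a spatial index or voxel occupancy for the same tests.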
