Class-Imbalanced Semi-Supervised Learning for Large-Scale Point Cloud Semantic Segmentation via Decoupling Optimization

Mengtian Li, Shaohui Lin, Zihan Wang, Yunhang Shen, Baochang Zhang, Lizhuang Ma

TL;DR

The paper tackles class-imbalanced semi-supervised semantic segmentation of large-scale 3D point clouds with a decoupled optimization framework that learns the backbone representation and the classifier separately. It employs a two-round pseudo-label generation strategy with moving thresholds and class-aware sampling, combined with a multi-class imbalanced focus loss that rebalances feature learning across head-to-tail classes, followed by a dedicated classifier fine-tuning step. Key contributions include the decoupled optimization paradigm, the two-round pseudo-label mechanism, and the $\mathcal{L}_{m\mathcal{I}-\mathrm{FL}}$ loss. The method is validated on indoor and outdoor datasets (S3DIS, ScanNet-V2, Semantic3D, SemanticKITTI), where it achieves state-of-the-art results under the $1\%$ and 1pt labeling settings, sometimes surpassing fully supervised baselines. The approach mitigates the bias against tail classes while preserving feature generalization, which makes it practical for scalable 3D scene understanding with limited annotations.
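The summary above mentions pseudo-label selection with moving thresholds and class-aware sampling but does not give the exact rules. The following is a minimal sketch of one plausible reading, not the authors' procedure: it covers a single thresholded selection pass rather than the full two-round scheme. The exponential-moving-average threshold update, the per-class cap, and the names `update_thresholds`, `select_pseudo_labels`, `momentum`, and `per_class_cap` are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def argmax(p):
    """Index of the largest entry in a list of class probabilities."""
    return max(range(len(p)), key=p.__getitem__)


def update_thresholds(thresholds, probs, momentum=0.9):
    """Move each class threshold toward the mean confidence of the points
    currently predicted as that class (hypothetical EMA rule)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for p in probs:
        c = argmax(p)
        sums[c] += p[c]
        counts[c] += 1
    new = list(thresholds)
    for c in counts:
        mean_conf = sums[c] / counts[c]
        new[c] = momentum * thresholds[c] + (1.0 - momentum) * mean_conf
    return new


def select_pseudo_labels(probs, thresholds, per_class_cap, seed=0):
    """Return {point_index: pseudo_label}.

    A point is a candidate if its confidence reaches the threshold of its
    predicted class; at most `per_class_cap` candidates are kept per class
    so that head classes cannot dominate the selection (assumed sampling rule).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, p in enumerate(probs):
        c = argmax(p)
        if p[c] >= thresholds[c]:
            by_class[c].append(i)
    selected = {}
    for c, idxs in by_class.items():
        rng.shuffle(idxs)
        for i in idxs[:per_class_cap]:
            selected[i] = c
    return selected
```

Giving the tail class a lower threshold than the head class lets low-confidence tail points through, while the cap keeps the selected set from mirroring the long-tail distribution of the raw predictions.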

Abstract

Semi-supervised learning (SSL) has been an active research topic for large-scale 3D scene understanding because it significantly reduces data annotation costs. However, existing SSL-based methods suffer from severe training bias, mainly due to class imbalance and the long-tail distribution of point cloud data, which leads to biased predictions for tail-class segmentation. In this paper, we introduce a new decoupling optimization framework, which disentangles feature representation learning and classifier learning in an alternating optimization manner to effectively shift the biased decision boundary. In particular, we first employ two-round pseudo-label generation to select unlabeled points across head-to-tail classes. We further introduce a multi-class imbalanced focus loss that adaptively pays more attention to feature learning across head-to-tail classes. After feature learning, we fix the backbone parameters and retrain the classifier using ground-truth points. Extensive experiments demonstrate that our method outperforms previous state-of-the-art methods on both indoor and outdoor 3D point cloud datasets (i.e., S3DIS, ScanNet-V2, Semantic3D, and SemanticKITTI) under the 1% and 1pt evaluation settings.
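The alternating scheme described in the abstract can be illustrated on a deliberately tiny model. The sketch below is an assumption-laden toy, not the paper's network or training recipe: the "backbone" is a single scalar weight `a`, the "classifier" is a logistic unit with parameters `w` and `b`, and the names `step`, `alternate`, `trainable`, `lr`, and `rounds` are hypothetical. It only shows the control flow: one step updates the backbone on labeled plus pseudo-labeled points with the classifier frozen, and the next updates the classifier on ground-truth points with the backbone frozen.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    h = params["a"] * x                            # toy "backbone" feature
    return sigmoid(params["w"] * h + params["b"])  # toy "classifier"


def grads(params, data):
    """Mean gradient of binary cross-entropy with respect to a, w, and b."""
    g = {"a": 0.0, "w": 0.0, "b": 0.0}
    for x, y in data:
        err = forward(params, x) - y               # d(loss) / d(logit)
        g["a"] += err * params["w"] * x
        g["w"] += err * params["a"] * x
        g["b"] += err
    return {k: v / len(data) for k, v in g.items()}


def step(params, data, trainable, lr=0.5):
    """One gradient step that changes only the parameters named in `trainable`."""
    g = grads(params, data)
    return {k: (v - lr * g[k] if k in trainable else v) for k, v in params.items()}


def alternate(params, labeled, pseudo, rounds=20):
    for _ in range(rounds):
        # I-step: update the backbone on labeled + pseudo-labeled points.
        params = step(params, labeled + pseudo, trainable={"a"})
        # II-step: backbone fixed, retrain the classifier on ground truth only.
        params = step(params, labeled, trainable={"w", "b"})
    return params
```

In a deep-learning framework the same effect is usually obtained by freezing one parameter group while optimizing the other; the dictionary filter in `step` plays that role here.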

Paper Structure (16 sections, 9 equations, 11 figures, 9 tables, 1 algorithm)

Figures (11)

  • Figure 1: Training and test settings of the widely used S3DIS dataset for class-imbalanced semi-supervised point cloud semantic segmentation. (a) Distribution of annotated data in the training set: long-tailed under the $1\%$ setting and uniform under the 1pt setting. For easier viewing, the number of labeled points per class is multiplied by the same factor within each setting, e.g., $2{,}000$ for $1\%$ and $70{,}000$ for 1pt. (b) Long-tail distribution in the test set. (c) IoU of PSD (Zhang et al., 2021) and ours on head {wall, ceiling}, waist {chair, table}, and tail {board, sofa} classes.
  • Figure 2: Illustration of the decoupling optimization framework. We first pre-train the network with a small number of given labeled points. Then, we conduct alternating optimization to iteratively update the backbone's parameters in the I-step and the classifier's (MLPs-c) parameters in the II-step. In particular, two-round pseudo-label generation is introduced in the I-step to sample relatively rebalanced points across head-to-tail classes. These points form the multi-class imbalanced focus loss $\mathcal{L}_{m\mathcal{I}-\mathrm{FL}}$, which, together with the loss $\mathcal{L}_{\mathrm{seg-I}}$ on ground-truth labeled points, enables better adaptive feature learning. After feature learning, we fine-tune the classifier using the traditional softmax cross-entropy loss $\mathcal{L}_{\mathrm{seg-II}}$ on the labeled points.
  • Figure 3: Visualization results on the validation set of S3DIS. From left to right: raw point cloud, semantic labels, our results, and PSD results.
  • Figure 4: Visualization results on ScanNet-V2.
  • Figure 5: Visualization results on the validation set of SemanticKITTI. From left to right: semantic labels, our results, and PSD results.
  • ...and 6 more figures
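
The Figure 2 caption names the loss $\mathcal{L}_{m\mathcal{I}-\mathrm{FL}}$ without defining it in this excerpt. As background only, the sketch below implements the standard multi-class focal loss of Lin et al. (2017), which scales cross-entropy by $(1 - p_t)^{\gamma}$ so that confidently classified points contribute less. It should not be read as the paper's loss; the optional per-class weights `alpha` and the function names are assumptions for illustration.

```python
import math


def log_softmax_at(logits, target):
    """Numerically stable log-probability of the target class."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return logits[target] - lse


def focal_loss(logits, target, gamma=2.0, alpha=None):
    """Standard focal loss for one point: -alpha_t * (1 - p_t)^gamma * log(p_t).

    `alpha` is an optional list of per-class weights (assumed here to be
    something like inverse class frequency); gamma = 0 recovers cross-entropy.
    """
    log_p = log_softmax_at(logits, target)
    p_t = math.exp(log_p)
    weight = 1.0 if alpha is None else alpha[target]
    return -weight * (1.0 - p_t) ** gamma * log_p


def cross_entropy(logits, target):
    """Softmax cross-entropy, expressed as the gamma = 0 special case."""
    return focal_loss(logits, target, gamma=0.0)
```

For a point the model already classifies confidently, `focal_loss` is far smaller than `cross_entropy`, whereas for a misclassified point the two stay on the same order of magnitude, which is the property that shifts training attention toward hard, often tail-class, points.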