Class-Imbalanced Semi-Supervised Learning for Large-Scale Point Cloud Semantic Segmentation via Decoupling Optimization
Mengtian Li, Shaohui Lin, Zihan Wang, Yunhang Shen, Baochang Zhang, Lizhuang Ma
TL;DR
The paper tackles class-imbalanced semi-supervised semantic segmentation of large-scale 3D point clouds by introducing a decoupled optimization framework that learns the backbone representation and the classifier separately. It uses a two-round pseudo-label generation strategy with moving thresholds and class-aware sampling, combined with a multi-class imbalanced focal loss ($\mathcal{L}_{m\mathcal{I}-\mathrm{FL}}$) that rebalances feature learning across head-to-tail classes. This is followed by a classifier fine-tuning step on ground-truth points with the backbone frozen. Key contributions are the decoupled optimization paradigm, the two-round pseudo-label mechanism, and the $\mathcal{L}_{m\mathcal{I}-\mathrm{FL}}$ loss. These are validated on indoor and outdoor datasets (S3DIS, ScanNet-V2, Semantic3D, SemanticKITTI), where the method achieves state-of-the-art results under the $1\%$ and $1$pt labeling settings and in some cases surpasses fully supervised baselines. The approach mitigates tail-class bias while preserving the generalization of the learned features, which makes it practically useful for scalable 3D scene understanding with limited annotations.
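The summary names moving thresholds and class-aware sampling but does not define them. The following is a minimal, hypothetical sketch of how such a two-round selection could work. It assumes (i) each class keeps its own confidence threshold, updated as an exponential moving average (EMA) of the mean confidence of points predicted as that class, and (ii) the second round caps the number of pseudo-labels per class so head classes cannot crowd out tail classes. The names `update_thresholds`, `two_round_pseudo_labels`, `momentum`, and `per_class_cap` are illustrative and do not come from the paper.

```python
from collections import defaultdict

# Hypothetical sketch of two-round pseudo-label selection; not the authors' code.


def argmax(vec):
    """Index of the largest entry (first one on ties)."""
    return max(range(len(vec)), key=vec.__getitem__)


def update_thresholds(thresholds, probs, momentum=0.9):
    """Assumed moving threshold: per-class EMA of the mean confidence of
    points currently predicted as that class. Classes with no predicted
    points keep their previous threshold."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for p in probs:
        c = argmax(p)
        sums[c] += p[c]
        counts[c] += 1
    new = list(thresholds)
    for c in counts:
        new[c] = momentum * thresholds[c] + (1 - momentum) * sums[c] / counts[c]
    return new


def two_round_pseudo_labels(probs, thresholds, per_class_cap):
    """Round 1: keep points whose top confidence reaches its class threshold.
    Round 2 (assumed class-aware sampling): keep at most `per_class_cap`
    of the most confident points per class so head classes cannot dominate.
    Returns a dict mapping point index -> pseudo-label."""
    candidates = defaultdict(list)  # class -> [(confidence, point_index)]
    for i, p in enumerate(probs):
        c = argmax(p)
        if p[c] >= thresholds[c]:
            candidates[c].append((p[c], i))
    selected = {}
    for c, items in candidates.items():
        items.sort(reverse=True)  # most confident first
        for _conf, i in items[:per_class_cap]:
            selected[i] = c
    return selected


# Toy softmax outputs for 6 unlabeled points over 3 classes (class 0 = head).
example_probs = [
    [0.9, 0.05, 0.05],
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.2, 0.3, 0.5],
    [0.5, 0.4, 0.1],
]
print(two_round_pseudo_labels(example_probs, [0.6, 0.5, 0.4], per_class_cap=2))
# → {0: 0, 1: 0, 3: 1, 4: 2}
```

Point 2 clears the class-0 threshold in the first round but is dropped by the per-class cap in the second. Point 5 never reaches its class threshold. The single tail-class candidates (points 3 and 4) both survive.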
Abstract
Semi-supervised learning (SSL) has become an active research topic for large-scale 3D scene understanding because it significantly reduces data annotation costs. However, existing SSL-based methods suffer from severe training bias, caused mainly by the class imbalance and long-tail distribution of point cloud data, which leads to biased predictions for tail classes. In this paper, we introduce a new decoupling optimization framework that disentangles feature representation learning from classifier learning and optimizes them alternately, effectively shifting the biased decision boundary. Specifically, we first employ two-round pseudo-label generation to select unlabeled points across head-to-tail classes. We further introduce a multi-class imbalanced focal loss that adaptively pays more attention to feature learning across head-to-tail classes. After feature learning, we fix the backbone parameters and retrain the classifier on ground-truth points to update its parameters. Extensive experiments on both indoor and outdoor 3D point cloud datasets (i.e., S3DIS, ScanNet-V2, Semantic3D, and SemanticKITTI) show that our method outperforms previous state-of-the-art methods under both the 1% and 1pt labeling settings.
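The abstract does not spell out the form of the multi-class imbalanced focal loss. As a hedged illustration only, the sketch below uses the standard alpha-balanced focal loss, $-\alpha_y (1 - p_y)^{\gamma} \log p_y$, averaged over points, with per-class weights $\alpha$ set from inverse class frequency. The weighting scheme, the default $\gamma = 2$, and the helper names `class_weights` and `multiclass_focal_loss` are our assumptions, not details from the paper.

```python
import math

# Hypothetical stand-in for a multi-class imbalanced focal loss:
# the standard alpha-balanced focal loss with inverse-frequency weights.


def class_weights(labels, num_classes):
    """Inverse-frequency class weights, rescaled so present classes average 1.
    Classes absent from `labels` get weight 0."""
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    inv = [1.0 / n if n > 0 else 0.0 for n in counts]
    present = [w for w in inv if w > 0]
    scale = len(present) / sum(present)
    return [w * scale for w in inv]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multiclass_focal_loss(logits_batch, labels, alpha, gamma=2.0, eps=1e-12):
    """Mean over points of -alpha[y] * (1 - p_y)^gamma * log(p_y)."""
    loss = 0.0
    for logits, y in zip(logits_batch, labels):
        p_y = softmax(logits)[y]
        loss += -alpha[y] * (1.0 - p_y) ** gamma * math.log(max(p_y, eps))
    return loss / len(labels)


labels = [0, 0, 0, 1]  # class 0 is the head, class 1 the tail
alpha = class_weights(labels, num_classes=2)  # approximately [0.5, 1.5]
logits = [[2.0, 0.0], [3.0, 0.0], [1.5, 0.5], [0.2, 0.1]]
print(multiclass_focal_loss(logits, labels, alpha))
```

With $\gamma = 0$ and uniform weights, this reduces to ordinary cross-entropy. Increasing $\gamma$ shrinks the contribution of confidently classified (typically head-class) points, and the larger $\alpha$ for rare classes raises the weight of tail-class points. This is the intuition behind using a focal-style loss to rebalance feature learning.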
