Towards Modality-agnostic Label-efficient Segmentation with Entropy-Regularized Distribution Alignment
Liyao Tang, Zhe Chen, Shanshan Zhao, Chaoyue Wang, Dacheng Tao
TL;DR
This work tackles label-efficient semantic segmentation in both 2D and 3D by addressing noisy pseudo-labels and the distributional gap between pseudo-labels and model predictions. It introduces ERDA, a simple framework that combines Entropy Regularization and Distribution Alignment and reduces to a cross-entropy-like objective that jointly optimizes the pseudo-label generator and the segmentation model. To extend applicability across modalities, the authors add modality-agnostic pseudo-labeling strategies: prototypical labeling, and query-based labeling with cross-attention to cope with diverse augmentations. Experiments show that ERDA outperforms state-of-the-art label-efficient methods on 3D point clouds (S3DIS, ScanNet, SensatUrban) and 2D images (Pascal, Cityscapes), even surpassing some fully supervised baselines with as little as 1% labeled data, and that it generalizes to medical and unsupervised segmentation. The main limitation is the assumption of complete semantic coverage; integration with large foundation models for open-world tasks is left as a potential direction.
Abstract
Label-efficient segmentation aims to perform effective segmentation on input data using only sparse and limited ground-truth labels for training. This topic is widely studied in 3D point cloud segmentation because dense annotation of point clouds is difficult, and it is also essential for cost-effective segmentation of 2D images. Recently, pseudo-labels have been widely employed to facilitate training with limited ground-truth labels, and promising progress has been made in both 2D and 3D segmentation. However, existing pseudo-labeling approaches can suffer heavily from noise and variations in unlabeled data, which result in significant discrepancies between the generated pseudo-labels and the current model predictions during training. Our analysis shows that these discrepancies can further confuse and impair the model learning process, a problem shared by label-efficient learning across both 2D and 3D modalities. To address this issue, we propose a novel learning strategy that regularizes the pseudo-labels generated for training, thus effectively narrowing the gap between pseudo-labels and model predictions. More specifically, our method introduces an Entropy Regularization loss and a Distribution Alignment loss for label-efficient learning, resulting in the ERDA learning strategy. Interestingly, by using the KL distance to formulate the distribution alignment loss, ERDA reduces to a deceptively simple cross-entropy-based loss that optimizes both the pseudo-label generation module and the segmentation model simultaneously. In addition, we innovate in pseudo-label generation to make ERDA consistently effective across both 2D and 3D data modalities for segmentation. With its simplicity and more modality-agnostic pseudo-label generation, our method has shown outstanding performance in fully utilizing all unlabeled data points for training across ...
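
The reduction to a cross-entropy-style loss mentioned in the abstract can be illustrated with a small numerical check. The sketch below is not the authors' implementation. It assumes that the entropy term is the Shannon entropy H(p) of the pseudo-label distribution p, that the alignment term is KL(p || q) against the model prediction q, and that the two terms are weighted equally. The weight `lam`, the function names, and the example logits are all hypothetical. Under those assumptions, H(p) + KL(p || q) equals the cross-entropy H(p, q), which the code confirms on one pair of distributions.

```python
import math


def softmax(logits):
    """Turn raw scores into a strictly positive probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p):
    """Shannon entropy H(p); assumes every entry of p is > 0."""
    return -sum(pi * math.log(pi) for pi in p)


def kl_divergence(p, q):
    """KL(p || q); assumes every entry of p and q is > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def cross_entropy(p, q):
    """Cross-entropy H(p, q) with soft targets p."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))


def erda_style_loss(p, q, lam=1.0):
    """Assumed form of the combined loss: lam * H(p) + KL(p || q)."""
    return lam * entropy(p) + kl_divergence(p, q)


# Hypothetical pseudo-label distribution p and model prediction q
# for a single point or pixel with three classes.
p = softmax([2.0, 0.5, -1.0])
q = softmax([1.0, 1.2, -0.3])

combined = erda_style_loss(p, q, lam=1.0)
ce = cross_entropy(p, q)

# With equal weighting, the combined loss matches the cross-entropy.
print(abs(combined - ce) < 1e-9)
```

With a weight other than 1 the identity no longer holds exactly: under the same assumptions the loss becomes the cross-entropy plus (lam - 1) times the entropy of p.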
