
Towards Modality-agnostic Label-efficient Segmentation with Entropy-Regularized Distribution Alignment

Liyao Tang, Zhe Chen, Shanshan Zhao, Chaoyue Wang, Dacheng Tao

TL;DR

This work tackles label-efficient semantic segmentation in both 2D and 3D by addressing noisy pseudo-labels and the distributional gap between pseudo-labels and model predictions. It introduces ERDA, a framework combining Entropy Regularization and Distribution Alignment, which reduces to a cross-entropy-like objective that jointly optimizes the pseudo-labels and the segmentation model. To extend applicability across modalities, the authors add modality-agnostic pseudo-labeling strategies, including prototypical labeling and query-based labeling with cross-attention to cope with diverse augmentations. Experiments show that ERDA outperforms state-of-the-art label-efficient methods on 3D point clouds (S3DIS, ScanNet, SensatUrban) and 2D images (Pascal, Cityscapes), even surpassing some fully supervised baselines with as little as 1% labeled data, and that it generalizes to medical and unsupervised segmentation. The main limitation is the assumption of complete semantic coverage; integration with large foundation models for open-world tasks is noted as a potential direction.

Abstract

Label-efficient segmentation aims to perform effective segmentation on input data using only sparse and limited ground-truth labels for training. This topic is widely studied in 3D point cloud segmentation due to the difficulty of annotating point clouds densely, while it is also essential for cost-effective segmentation on 2D images. Recently, pseudo-labels have been widely employed to facilitate training with limited ground-truth labels, and promising progress has been made in both 2D and 3D segmentation. However, existing pseudo-labeling approaches can suffer heavily from the noise and variations in unlabeled data, which results in significant discrepancies between the generated pseudo-labels and the current model predictions during training. Our analysis shows that this can further confuse and hinder model learning, and that it is a shared problem in label-efficient learning across both 2D and 3D modalities. To address this issue, we propose a novel learning strategy to regularize the pseudo-labels generated for training, thus effectively narrowing the gap between pseudo-labels and model predictions. More specifically, our method introduces an Entropy Regularization loss and a Distribution Alignment loss for label-efficient learning, resulting in the ERDA learning strategy. Interestingly, by using the KL divergence to formulate the distribution alignment loss, ERDA reduces to a deceptively simple cross-entropy-based loss which optimizes both the pseudo-label generation module and the segmentation model simultaneously. In addition, we innovate in the pseudo-label generation to make ERDA consistently effective across both 2D and 3D data modalities for segmentation. Enjoying simplicity and more modality-agnostic pseudo-label generation, our method has shown outstanding performance in fully utilizing all unlabeled data points for training across ...
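
The reduction to cross-entropy mentioned in the abstract can be checked numerically. The sketch below is illustrative rather than the authors' implementation: it assumes the entropy regularization term is the entropy $H(\mathbf p)$ of the pseudo-label distribution, the alignment term is $\mathrm{KL}(\mathbf p \,\|\, \mathbf q)$, and the two are summed with a weight of 1 (the weight `lam`, the function names, and the toy logits are all hypothetical). Under those assumptions, $H(\mathbf p) + \mathrm{KL}(\mathbf p \,\|\, \mathbf q) = H(\mathbf p, \mathbf q)$, the cross-entropy between pseudo-labels and predictions.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p):
    """H(p) = -sum_i p_i log p_i."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(p, q):
    """H(p, q) = -sum_i p_i log q_i."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def er_plus_da(p, q, lam=1.0):
    """Assumed combined objective: lam * H(p) + KL(p || q)."""
    return lam * entropy(p) + kl_divergence(p, q)


# Toy pseudo-label distribution p and model prediction q over three classes.
p = softmax([2.0, 0.5, -1.0])
q = softmax([0.3, 1.2, 0.1])

combined = er_plus_da(p, q, lam=1.0)
reference = cross_entropy(p, q)
print(abs(combined - reference) < 1e-12)  # the two objectives coincide when lam = 1
```

With any other weight the entropy terms no longer cancel, so the identity is specific to the unit-weight case assumed here.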

Paper Structure (54 sections, 9 equations, 11 figures, 20 tables)

Figures (11)

  • Figure 1: While existing pseudo-labels (a) are limited in their exploitation of unlabeled points, ERDA (b) simultaneously optimizes the pseudo-labels $\mathbf p$ and predictions $\mathbf q$, with both objectives taking the same simple cross-entropy form. By reducing noise via entropy regularization and bridging the distributional discrepancy between the two, ERDA produces informative pseudo-labels that remove the need for label selection. As exemplified in (c) on 3D data, it thus enables the model to consistently benefit from more pseudo-labels, surpassing other methods and its fully-supervised baseline.
  • Figure 2: Detailed illustration of our ERDA with the prototypical pseudo-label generation process, which is prevalently used for 3D point clouds.
  • Figure 3: Illustration of our ERDA with our query-based pseudo-label generation process under the weak-to-strong framework, which is widely adopted in 2D label-efficient segmentation. The teacher model could be either shared with the student [weak_2d_fixmatch, weak_2d_unimatch] or an EMA-updated version of it [weak_2d_recodino].
  • Figure 4: Improvement of our ERDA over the baseline (RandLA-Net) on different scenes from S3DIS Area 5. In the office and hallway (top 2), ERDA produces more detailed and complete segmentation for windows and doors, and avoids over-expansion of the board and bookcase on the wall, thanks to the informative pseudo-labels. In more cluttered scenes (bottom 2), ERDA tends to make cleaner predictions by avoiding implausible results such as a desk inside clutter and by preserving important semantic classes such as columns.
  • Figure 5: We show a clear benefit of our ERDA with query-based pseudo-labeling over baseline (FixMatch) on Pascal validation. Similar to 3D cases, ERDA provides cleaner predictions with better separations between different semantic groups, in both outdoor and indoor scenes.
  • ...and 6 more figures