Bayesian Self-Training for Semi-Supervised 3D Segmentation

Ozan Unal, Christos Sakaridis, Luc Van Gool

TL;DR

This paper introduces Bayesian self-training for semi-supervised 3D perception. It uses dropout-based Monte Carlo inference to estimate predictive uncertainty and filters pseudo-labels by their entropy. The same building blocks cover semantic segmentation, instance segmentation, and dense 3D visual grounding, with a heuristic $n$-partite matching strategy (Hungarian) aligning instance predictions across stochastic passes. The approach achieves state-of-the-art results on SemanticKITTI and ScribbleKITTI for semantic segmentation and on ScanNet and S3DIS for instance segmentation, and it substantially improves over supervised-only baselines on ScanRefer for dense visual grounding, including gains when verbal prompts are available for unlabeled data. Because it needs only slight per-task adjustments, the method applies broadly to dense 3D tasks with partial labels.

Abstract

3D segmentation is a core problem in computer vision and, similarly to many other dense prediction tasks, it requires large amounts of annotated data for adequate training. However, densely labeling 3D point clouds to employ fully-supervised training remains too labor intensive and expensive. Semi-supervised training provides a more practical alternative, where only a small set of labeled data is given, accompanied by a larger unlabeled set. This area thus studies the effective use of unlabeled data to reduce the performance gap that arises due to the lack of annotations. In this work, inspired by Bayesian deep learning, we first propose a Bayesian self-training framework for semi-supervised 3D semantic segmentation. Employing stochastic inference, we generate an initial set of pseudo-labels and then filter these based on estimated point-wise uncertainty. By constructing a heuristic $n$-partite matching algorithm, we extend the method to semi-supervised 3D instance segmentation, and finally, with the same building blocks, to dense 3D visual grounding. We demonstrate state-of-the-art results for our semi-supervised method on SemanticKITTI and ScribbleKITTI for 3D semantic segmentation and on ScanNet and S3DIS for 3D instance segmentation. We further achieve substantial improvements in dense 3D visual grounding over supervised-only baselines on ScanRefer. Our project page is available at ouenal.github.io/bst/.
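
The pseudo-labeling step described above (stochastic inference followed by uncertainty-based filtering) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `mc_pseudo_labels`, the `max_entropy` threshold, and the use of `-1` as an "ignore" label are assumptions made for illustration, and the softmax outputs of the $K$ stochastic passes are taken as given rather than produced by a real network.

```python
import numpy as np

def mc_pseudo_labels(probs, max_entropy):
    """Sketch of entropy-filtered pseudo-labeling (hypothetical helper).

    probs: array of shape (K, N, C) holding softmax outputs for N points
           and C classes from K stochastic (e.g. dropout) forward passes.
    max_entropy: assumed threshold; points above it are left unlabeled.
    """
    probs = np.asarray(probs, dtype=float)
    # Monte Carlo estimate of the predictive distribution per point.
    mean_probs = probs.mean(axis=0)                                   # (N, C)
    # Shannon entropy of the averaged distribution, in nats.
    entropy = -(mean_probs * np.log(mean_probs + 1e-12)).sum(axis=1)  # (N,)
    labels = mean_probs.argmax(axis=1)
    # Mark uncertain points with -1 so a loss can ignore them (assumed convention).
    labels = np.where(entropy <= max_entropy, labels, -1)
    return labels, entropy

# Two passes, two points, two classes: the passes agree on point 0
# and disagree on point 1, so only point 0 keeps its pseudo-label.
demo_probs = np.array([
    [[0.9, 0.1], [0.9, 0.1]],
    [[0.8, 0.2], [0.1, 0.9]],
])
demo_labels, demo_entropy = mc_pseudo_labels(demo_probs, max_entropy=0.5)
print(demo_labels, demo_entropy)
```

In this toy input, a single pass would report 0.9 softmax confidence for point 1, while the averaged distribution is close to uniform. That gap is the reason for filtering on the Monte Carlo entropy rather than on per-pass confidence.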

Paper Structure (18 sections, 8 equations, 4 figures, 9 tables, 1 algorithm)

Figures (4)

  • Figure 1: Illustration of our Bayesian pseudo-labeling pipeline for semi-supervised a) 3D semantic segmentation; b) 3D instance segmentation; and c) dense 3D visual grounding. With only slight adjustments, using the same building blocks, our method can be adapted to each of these tasks to achieve SOTA results.
  • Figure 2: Illustration of Bayesian pseudo-labeling for semi-supervised instance segmentation. We initialize a seed prediction via a forward pass using a non-augmented input. Then our heuristic $n$-partite matching algorithm is employed to pair each seed instance with the best matching predicted instance from each of the $K$ stochastic forward passes. For each aligned object, we compute the aggregated label through unanimous voting and filter it based on uncertainty to obtain the final pseudo mask.
  • Figure 3: Qualitative results from the SemanticKITTI ($10\%$) val-set, comparing a) the ground truth, b) Sup-only and c) ours; on ScanNet ($10\%$), comparing d) the ground-truth instance masks, e) Sup-only and f) ours; and finally g) ScanRefer ($10\%$).
  • Figure 4: Qualitative analysis of the uncertainty estimation. We compare softmax confidence to our MC-derived Shannon entropy.
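
The instance pipeline summarized in the Figure 2 caption (seed prediction, matching against each of the $K$ stochastic passes, unanimous voting) might be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's algorithm: each seed instance is greedily paired with its highest-IoU mask in every pass, the `min_iou` cutoff and the function names are hypothetical, and the final uncertainty filter is omitted.

```python
import numpy as np

def mask_iou(a, b):
    """Intersection over union of two boolean point masks."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union > 0 else 0.0

def aggregate_instances(seed_masks, pass_masks, min_iou=0.5):
    """Align seed instances across passes and vote unanimously (sketch).

    seed_masks: (M, N) boolean masks from the non-augmented forward pass.
    pass_masks: list of K arrays, each (M_k, N), from the stochastic passes.
    min_iou: assumed cutoff below which a seed counts as unmatched.
    """
    seed_masks = np.asarray(seed_masks, dtype=bool)
    num_points = seed_masks.shape[1]
    pseudo = []
    for seed in seed_masks:
        agreed = seed.copy()
        matched_in_all = True
        for masks in pass_masks:
            masks = np.asarray(masks, dtype=bool)
            ious = [mask_iou(seed, m) for m in masks]
            if not ious or max(ious) < min_iou:
                matched_in_all = False  # no counterpart in this pass
                break
            best = int(np.argmax(ious))
            # Unanimous voting: keep a point only if every pass includes it.
            agreed &= masks[best]
        if matched_in_all:
            pseudo.append(agreed)
    if not pseudo:
        return np.zeros((0, num_points), dtype=bool)
    return np.stack(pseudo)

seed = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]]
passes = [
    [[0, 0, 0, 1, 1, 1], [1, 1, 0, 0, 0, 0]],  # same objects, different order
    [[1, 1, 1, 1, 0, 0], [0, 0, 0, 0, 1, 1]],
]
print(aggregate_instances(seed, passes).astype(int))
```

Greedy per-seed matching keeps the sketch dependency-free. A one-to-one assignment such as the Hungarian algorithm could replace the `argmax` step when several seeds compete for the same predicted instance.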