Late-decoupled 3D Hierarchical Semantic Segmentation with Semantic Prototype Discrimination based Bi-branch Supervision

Shuyu Cao; Chongshou Li; Jie Xu; Tianrui Li; Na Zhao

TL;DR

This work addresses 3D hierarchical semantic segmentation (3DHS) by tackling two persistent challenges: cross-hierarchy conflicts that arise when all hierarchies share one parameter-sharing backbone, and the inherent class imbalance across hierarchy levels. It proposes a Late-decoupled 3DHS (Ld-3DHS) framework in which a shared encoder feeds a separate decoder per hierarchy, augmented with a coarse-to-fine guidance mechanism and a cross-hierarchical consistency loss. An auxiliary discrimination branch learns class-wise discriminative features via supervised contrastive learning and mutual semantic-prototype supervision. The resulting total objective is $\mathcal{L}_{\text{total}}=\mathcal{L}_{\text{late-3DHS}}+\lambda\sum_{h=1}^{H}\mathcal{L}_{\text{aux}}^{(h)}$, where $H$ is the number of hierarchy levels and $\lambda$ weights the auxiliary losses; this objective is designed to improve minority-class segmentation. Experiments on Campus3D, S3DIS-H, and SensatUrban-H report state-of-the-art performance across backbones and datasets, and the core components can serve as a plug-and-play enhancement to existing 3DHS methods, which is relevant to embodied intelligence applications.
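To make the structure of the total objective concrete, the following is a minimal sketch of how a primary loss and $H$ per-hierarchy auxiliary losses could be combined. The function name `total_objective`, the scalar stand-ins for the losses, and the value of `lam` are all assumptions for illustration; the summary above does not give the value of $\lambda$ or the form of the individual loss terms.

```python
def total_objective(primary_loss, aux_losses, lam):
    """Combine a primary loss with per-hierarchy auxiliary losses.

    primary_loss: scalar standing in for L_late-3DHS (hypothetical value)
    aux_losses:   list of H scalars, one auxiliary loss per hierarchy level
    lam:          weight lambda on the summed auxiliary losses (assumed value)
    """
    return primary_loss + lam * sum(aux_losses)


# Toy numbers for H = 3 hierarchy levels; these are illustrative only.
loss = total_objective(primary_loss=1.0, aux_losses=[0.5, 1.0, 1.5], lam=0.5)
print(loss)
```

In this sketch the auxiliary losses are summed without per-level weights, matching the single $\lambda$ in the formula; a larger `lam` shifts emphasis toward the discrimination branch.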

Abstract

3D hierarchical semantic segmentation (3DHS) is crucial for embodied intelligence applications that demand a multi-grained and multi-hierarchy understanding of 3D scenes. Despite recent progress, previous 3DHS methods have overlooked the following two challenges: I) multi-label learning with a parameter-sharing model can lead to multi-hierarchy conflicts in cross-hierarchy optimization, and II) class imbalance is inevitable across the multiple hierarchies of 3D scenes, which causes model performance to be dominated by the majority classes. To address these issues, we propose a novel framework with a primary 3DHS branch and an auxiliary discrimination branch. Specifically, to alleviate the multi-hierarchy conflicts, we propose a late-decoupled 3DHS framework that employs multiple decoders with coarse-to-fine hierarchical guidance and consistency. The late-decoupled architecture can mitigate the underfitting and overfitting conflicts among multiple hierarchies and can also confine the class imbalance problem to each individual hierarchy. Moreover, we introduce a 3DHS-oriented, semantic-prototype-based bi-branch supervision mechanism, which additionally learns class-wise discriminative point cloud features and performs mutual supervision between the auxiliary and 3DHS branches, to improve segmentation under class imbalance. Extensive experiments on multiple datasets and backbones demonstrate that our approach achieves state-of-the-art 3DHS performance, and its core components can also be used as a plug-and-play enhancement to improve previous methods.
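As an illustration of what consistency across hierarchies means for per-point labels, the sketch below measures how often a fine-level prediction falls under the predicted coarse-level class. This is a simple agreement check under assumed inputs, not the consistency loss used in the paper; the class names, the `PARENT` mapping, and the function `consistency_rate` are hypothetical.

```python
# Hypothetical two-level hierarchy: each fine class has exactly one coarse parent.
PARENT = {
    "wall": "building",
    "roof": "building",
    "trunk": "vegetation",
    "leaf": "vegetation",
}


def consistency_rate(coarse_pred, fine_pred, parent=PARENT):
    """Fraction of points whose fine prediction agrees with the coarse one."""
    if len(coarse_pred) != len(fine_pred):
        raise ValueError("both prediction lists must cover the same points")
    if not coarse_pred:
        return 1.0  # no points, so nothing is inconsistent
    agree = sum(1 for c, f in zip(coarse_pred, fine_pred) if parent[f] == c)
    return agree / len(coarse_pred)


# Four toy points: the second is predicted "building" at the coarse level
# but "leaf" at the fine level, so it is inconsistent.
coarse = ["building", "building", "vegetation", "vegetation"]
fine = ["wall", "leaf", "trunk", "leaf"]
rate = consistency_rate(coarse, fine)
print(rate)
```

A model whose decoders are trained independently per hierarchy can produce such mismatches, which is the kind of disagreement that coarse-to-fine guidance and a consistency term are meant to reduce.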
