BFANet: Revisiting 3D Semantic Segmentation with Boundary Feature Analysis

Weiguang Zhao, Rui Zhang, Qiufeng Wang, Guangliang Cheng, Kaizhu Huang

TL;DR

BFANet revisits 3D semantic segmentation by categorizing segmentation errors into four types, each with its own evaluation metric, and by introducing a boundary-aware architecture. A boundary-semantic block decouples point cloud features into semantic and boundary features and fuses their query queues through attention, while a parallel boundary pseudo-label calculation (PBPLC) computes boundary pseudo-labels in real time during training. The method achieves state-of-the-art results on ScanNet200 and competitive performance on ScanNetv2, and its boundary labeling component runs in about $46.3$ ms per scene. This boundary-centric approach highlights the importance of edge-aware representations for challenging regions and provides new metrics to guide future 3D segmentation research.

Abstract

3D semantic segmentation plays a fundamental role in understanding 3D scenes. While contemporary state-of-the-art techniques predominantly concentrate on improving overall performance on general metrics (e.g., mIoU, mAcc, and oAcc), they leave the regions that are hardest to segment largely unexplored. In this paper, we revisit 3D semantic segmentation through a more granular lens, shedding light on subtle complexities that are typically overshadowed by broader performance metrics. Concretely, we divide 3D semantic segmentation errors into four comprehensive categories and define an evaluation metric tailored to each. Building upon this categorical framework, we introduce a 3D semantic segmentation network called BFANet that incorporates detailed analysis of semantic boundary features. First, we design the boundary-semantic module to decouple point cloud features into semantic and boundary features, and fuse their query queue to enhance semantic features with attention. Second, we introduce a more concise and accelerated boundary pseudo-label calculation algorithm, which is 3.9 times faster than the state-of-the-art, is compatible with data augmentation, and enables efficient computation during training. Extensive experiments on benchmark data indicate the superiority of our BFANet model and confirm the significance of emphasizing the four uniquely designed metrics. Code is available at https://github.com/weiguangzhao/BFANet.
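
For readers unfamiliar with the general metrics named in the abstract, the following is a minimal sketch of how mIoU, mAcc, and oAcc are commonly computed from per-point labels. It uses the standard textbook definitions, not code from the BFANet repository, and it does not cover the paper's four error-specific metrics. The function name `segmentation_metrics` and the convention of skipping classes that never occur are assumptions made for illustration.

```python
def segmentation_metrics(gt, pred, num_classes):
    """Return (mIoU, mAcc, oAcc) for two equal-length lists of integer labels.

    Assumes at least one point and labels in range(num_classes).
    Classes that occur in neither list are skipped when averaging
    (an illustrative convention; benchmarks may handle this differently).
    """
    # conf[g][p] counts points with ground-truth class g predicted as class p.
    conf = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        conf[g][p] += 1

    ious, accs = [], []
    for c in range(num_classes):
        tp = conf[c][c]
        gt_count = sum(conf[c])                    # points labeled c
        pred_count = sum(row[c] for row in conf)   # points predicted as c
        union = gt_count + pred_count - tp
        if union > 0:
            ious.append(tp / union)                # per-class IoU
        if gt_count > 0:
            accs.append(tp / gt_count)             # per-class accuracy

    correct = sum(conf[c][c] for c in range(num_classes))
    miou = sum(ious) / len(ious)   # mean of per-class IoU
    macc = sum(accs) / len(accs)   # mean of per-class accuracy
    oacc = correct / len(gt)       # fraction of all points labeled correctly
    return miou, macc, oacc


# Toy example: six points, three classes, one misclassified point.
miou, macc, oacc = segmentation_metrics([0, 0, 0, 1, 1, 2], [0, 0, 1, 1, 1, 2], 3)
print(f"mIoU={miou:.4f} mAcc={macc:.4f} oAcc={oacc:.4f}")
```

Because all three numbers average over whole classes or whole scenes, a small band of errors near object boundaries barely moves them, which is the motivation the abstract gives for finer-grained metrics.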

Paper Structure

This paper contains 20 sections, 9 equations, 5 figures, 7 tables.

Figures (5)

  • Figure 1: Four types of 3D semantic segmentation errors and proposed metrics. We revisit 3D point cloud semantic segmentation, categorize four types of semantic segmentation errors, and define the corresponding evaluation metrics. See more details in Section \ref{sec:specify}. The visualized results are based on OctFormer on the ScanNet200 dataset.
  • Figure 2: Visualization Definitions. The purple and red lines are the contour lines of P and G. The number of points within the shaded region represents the value of the corresponding variable.
  • Figure 3: Network Architecture. To maintain clarity and conciseness, we use several abbreviations within the figure. Specifically, seg. means segmentation and concat indicates concatenation. TTA denotes Test Time Augmentation. PBPLC stands for the proposed parallel boundary pseudo-label calculation. Additionally, at the bottom of the overall network architecture, we provide a visualization of the Octree construction and the real-time boundary pseudo-label calculation process.
  • Figure 4: Comparison to PTv3 with the Proposed Metric
  • Figure 5: Qualitative Comparison. Pred. stands for Prediction. The red rectangular boxes indicate areas of particular interest.
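
Figure 3 names a parallel boundary pseudo-label calculation (PBPLC), but this listing does not describe the algorithm itself. As a rough illustration of what a boundary pseudo-label can look like, the sketch below marks a point as a boundary point when any neighbor within a fixed radius carries a different semantic label. This neighbor-disagreement rule, the function name `boundary_pseudo_labels`, and the `radius` parameter are all assumptions for illustration. The brute-force loop is neither parallel nor fast and should not be read as the authors' method.

```python
import math


def boundary_pseudo_labels(points, labels, radius):
    """Return a 0/1 list: 1 if a point has a differently labeled neighbor.

    points: list of (x, y, z) tuples; labels: list of integer class ids.
    Hypothetical rule for illustration only; O(n^2) brute-force search.
    """
    n = len(points)
    boundary = [0] * n
    for i in range(n):
        for j in range(n):
            # Only a neighbor of a different class can make i a boundary point.
            if labels[i] != labels[j] and math.dist(points[i], points[j]) <= radius:
                boundary[i] = 1
                break
    return boundary


# Toy example: six points on a line, with the class changing between x=0.2 and x=0.3.
pts = [(0.0, 0, 0), (0.1, 0, 0), (0.2, 0, 0), (0.3, 0, 0), (0.4, 0, 0), (0.5, 0, 0)]
lbl = [0, 0, 0, 1, 1, 1]
print(boundary_pseudo_labels(pts, lbl, radius=0.15))
```

In this toy case only the two points adjacent to the class change are flagged. A practical implementation would replace the double loop with a spatial index or a batched GPU neighbor query, which is the kind of cost the reported 46.3 ms per scene refers to.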