Multilateral Cascading Network for Semantic Segmentation of Large-Scale Outdoor Point Clouds

Haoran Gong, Haodong Wang, Di Wang

TL;DR

This work tackles semantic segmentation of large-scale outdoor point clouds, where object diversity and severe class imbalance challenge learning. It introduces MCNet, a network that couples a Multilateral Cascading Attention Enhancement (MCAE) encoder with a Point Cross Stage Partial (P-CSP) decoder, augmented by semantic-weighted sampling and a neighborhood voting post-processing step. Across Toronto3D and SensatUrban benchmarks, MCNet delivers state-of-the-art performance, with notable gains on underrepresented small-object categories and a reported $2.1\%$ improvement in overall mIoU on SensatUrban. By decoupling coordinate and feature processing, and employing multiscale fusion and local-context aggregation, the approach offers robust, scalable segmentation for large outdoor scenes, with practical implications for environment perception and navigation.
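The neighborhood voting step mentioned above can be pictured as a label-smoothing pass over the predictions. The summary does not describe how the authors implement it, so the following is only a minimal sketch under stated assumptions: each point takes the majority label among its `k` nearest neighbors, neighbors are found by brute-force Euclidean distance, and the function name `neighborhood_vote` and the default `k=5` are hypothetical.

```python
import numpy as np

def neighborhood_vote(points, labels, num_classes, k=5):
    """Relabel each point with the majority label of its k nearest neighbors.

    Generic k-NN majority vote; an illustrative assumption, not the
    paper's actual post-processing procedure.
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    n = len(points)
    # Brute-force pairwise squared distances, shape (N, N).
    # Only suitable for small demos; large scenes need a spatial index.
    diff = points[:, None, :] - points[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)
    # Indices of the k closest points; a point is at distance 0 from
    # itself, so its own prediction normally counts among the votes.
    nn = np.argsort(dist2, axis=1)[:, :k]
    # Accumulate one vote per neighbor into an (N, num_classes) table.
    votes = np.zeros((n, num_classes), dtype=int)
    np.add.at(votes, (np.arange(n)[:, None], labels[nn]), 1)
    # Ties resolve to the lowest class index (argmax behavior).
    return votes.argmax(axis=1)

# Two well-separated clusters, each with one mislabeled point.
pts = np.array([[0.0], [1.0], [2.0], [3.0], [4.0],
                [100.0], [101.0], [102.0], [103.0], [104.0]])
pred = np.array([0, 0, 1, 0, 0, 1, 1, 1, 0, 1])
smoothed = neighborhood_vote(pts, pred, num_classes=2, k=3)
```

In this toy input the isolated labels at indices 2 and 8 are outvoted by their neighbors, so each cluster ends up with a single label.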

Abstract

Semantic segmentation of large-scale outdoor point clouds is important for environment perception and scene understanding. The task remains challenging, however, because of the inherent complexity of outdoor objects and their diverse distributions in real-world environments. In this study, we propose the Multilateral Cascading Network (MCNet) to address this challenge. The model comprises two key components: a Multilateral Cascading Attention Enhancement (MCAE) module, which facilitates the learning of complex local features through multilateral cascading operations; and a Point Cross Stage Partial (P-CSP) module, which fuses global and local features, thereby optimizing the integration of valuable feature information across multiple scales. The proposed method outperforms state-of-the-art approaches on two widely recognized benchmark datasets: Toronto3D and SensatUrban. On the city-scale SensatUrban dataset in particular, our results surpass the current best result by 2.1\% in overall mIoU and, relative to the baseline method, improve by 15.9\% on average on small-sample object categories that comprise less than 2\% of the total samples.
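Because the headline numbers are reported in mIoU, it may help to recall how that metric is usually computed. The sketch below uses the standard definition (per-class intersection over union, averaged over classes); the function name `mean_iou` and the choice to skip classes that appear in neither labels nor predictions are assumptions, and the benchmarks' official evaluation scripts may differ in such details.

```python
import numpy as np

def mean_iou(y_true, y_pred, num_classes):
    """Return (per-class IoU array, mIoU) for integer label arrays."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    # Confusion matrix: rows are ground-truth classes, columns predictions.
    conf = np.bincount(y_true * num_classes + y_pred,
                       minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes)
    tp = np.diag(conf).astype(float)
    # Union = predicted-as-class + labeled-as-class - intersection.
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    # Classes absent from both labels and predictions become NaN
    # and are left out of the average (an assumption of this sketch).
    iou = np.where(union > 0, tp / np.maximum(union, 1), np.nan)
    return iou, float(np.nanmean(iou))

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
per_class, miou = mean_iou(y_true, y_pred, num_classes=3)
# per_class is [1/3, 2/3, 1/2], so miou is 0.5
```

Every class contributes equally to the average regardless of how many points it has, which is why gains on small-sample categories can move overall mIoU noticeably.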

Paper Structure (19 sections, 4 equations, 4 figures, 5 tables)

Figures (4)

  • Figure 1: Overall architecture. "CBL" denotes a 1$\times$1 convolution followed by batch normalization and Leaky ReLU.
  • Figure 2: Label distributions of the Toronto3D and SensatUrban datasets.
  • Figure 3: Visual comparison of segmentation results between MCNet and the baseline RandLA-Net on the Toronto3D dataset.
  • Figure 4: Visual comparison of segmentation results between MCNet and the baseline RandLA-Net on the SensatUrban dataset.
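
The "CBL" unit named in the Figure 1 caption can be sketched in a few lines. This is an illustrative NumPy version rather than the authors' code: it treats the 1$\times$1 convolution as a linear map shared across points, uses training-mode batch statistics, and assumes a Leaky ReLU slope of 0.2. The function name `cbl`, its parameter names, and that slope value are all assumptions.

```python
import numpy as np

def cbl(features, weight, bias, gamma, beta, negative_slope=0.2, eps=1e-5):
    """Apply 1x1 convolution, batch normalization, then Leaky ReLU.

    features: (N, C_in) per-point features; weight: (C_in, C_out).
    Returns an (N, C_out) array.
    """
    # A 1x1 convolution over points is the same linear map applied
    # independently to every point's feature vector.
    x = features @ weight + bias
    # Batch normalization with statistics taken over the point axis
    # (training-mode behavior; inference would use running averages).
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    x = gamma * (x - mean) / np.sqrt(var + eps) + beta
    # Leaky ReLU: keep positives, scale negatives by negative_slope.
    return np.where(x >= 0, x, negative_slope * x)

rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 8))    # 100 points, 8 input channels
w = rng.normal(size=(8, 16))         # maps 8 channels to 16
out = cbl(feats, w, bias=np.zeros(16), gamma=np.ones(16), beta=np.zeros(16))
```

Here `out` has one 16-channel feature vector per input point, so the block changes channel width without mixing information between points.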