Multilateral Cascading Network for Semantic Segmentation of Large-Scale Outdoor Point Clouds

Haoran Gong; Haodong Wang; Di Wang

TL;DR

This work tackles semantic segmentation of large-scale outdoor point clouds, where object diversity and severe class imbalance challenge learning. It introduces MCNet, a network that couples a Multilateral Cascading Attention Enhancement (MCAE) encoder with a Point Cross Stage Partial (P-CSP) decoder, augmented by semantic-weighted sampling and a neighborhood voting post-processing step. Across Toronto3D and SensatUrban benchmarks, MCNet delivers state-of-the-art performance, with notable gains on underrepresented small-object categories and a reported $2.1\%$ improvement in overall $mIoU$ on SensatUrban. By decoupling coordinate and feature processing, and employing multiscale fusion and local-context aggregation, the approach offers robust, scalable segmentation for large outdoor scenes, with practical implications for environment perception and navigation.
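The neighborhood voting step mentioned above can be illustrated with a small sketch. The summary does not specify how MCNet's voting module works, so everything here is an assumption made for illustration: the function name `neighborhood_vote`, the choice of a k-nearest-neighbor majority vote, the default `k`, and the brute-force distance search are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def neighborhood_vote(points: Sequence[Point],
                      labels: Sequence[int],
                      k: int = 3) -> List[int]:
    """Relabel each point with the majority label among its k nearest
    points (the point itself included).

    Hypothetical post-processing sketch, not the paper's exact module.
    """
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")
    refined = []
    for p in points:
        # Brute-force squared Euclidean distances to every point.
        # A real pipeline would use a spatial index such as a KD-tree.
        order = sorted(
            range(len(points)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(p, points[j])),
        )
        votes = Counter(labels[j] for j in order[:k])
        # Ties resolve in favor of the label seen first, i.e. the nearest one.
        refined.append(votes.most_common(1)[0][0])
    return refined


points = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (10, 0, 0), (11, 0, 0), (12, 0, 0)]
noisy = [0, 1, 0, 2, 2, 2]  # the second point is an isolated outlier label
smoothed = neighborhood_vote(points, noisy, k=3)
```

In this toy input the isolated label in the first cluster is outvoted by its two neighbors, which is the kind of local smoothing a voting-based post-processing step is meant to provide.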

Abstract

Semantic segmentation of large-scale outdoor point clouds is important for environment perception and scene understanding. The task remains challenging, however, because outdoor objects are inherently complex and diversely distributed in real-world environments. In this study, we propose the Multilateral Cascading Network (MCNet) to address this challenge. The model comprises two key components: a Multilateral Cascading Attention Enhancement (MCAE) module, which learns complex local features through multilateral cascading operations; and a Point Cross Stage Partial (P-CSP) module, which fuses global and local features to better integrate feature information across multiple scales. Our method outperforms state-of-the-art approaches on two widely used benchmark datasets, Toronto3D and SensatUrban. On the city-scale SensatUrban dataset in particular, our results surpass the current best result by 2.1\% in overall mIoU and, compared with the baseline method, improve by 15.9\% on average for small-sample object categories comprising less than 2\% of the total samples.
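Since the headline numbers are reported in mIoU, a short sketch of that metric may help. Per-class IoU is TP / (TP + FP + FN), and mIoU is the mean over classes. The function name `mean_iou` and the decision to skip classes absent from both prediction and ground truth are assumptions for illustration; the benchmarks' official evaluation scripts may handle such classes differently.

```python
from typing import Sequence


def mean_iou(y_true: Sequence[int], y_pred: Sequence[int],
             num_classes: int) -> float:
    """Mean intersection-over-union across classes.

    Classes that appear in neither y_true nor y_pred are skipped
    (an assumption of this sketch).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    ious = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = tp + fp + fn
        if denom == 0:
            continue  # class absent from both label sets
        ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes: per-class IoUs are 1/3, 2/3 and 1/2.
score = mean_iou([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3)
```

Because every class contributes equally to the mean regardless of how many points it has, mIoU is sensitive to performance on rare categories, which is why gains on small-sample classes matter for the overall score.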

Paper Structure (19 sections, 4 equations, 4 figures, 5 tables)


Introduction
Methodology
Overall Network Architecture
Semantic-weighted Sampling Module
Encoding Layer
Decoding Layer
Neighborhood Voting Module
MCAE Module
P-CSP Module
Experiments
Experiment on the Toronto3D Dataset
Experiment on the SensatUrban Dataset
Ablation Study
Effectiveness of semantic-based weighted point sampling
Effectiveness of MCAE
...and 4 more sections

Figures (4)

Figure 1: Overall Architecture. "CBL" means features going through 1$\times$1 convolution, batch normalization and Leaky ReLU.
Figure 2: Label distributions of the Toronto3D and SensatUrban datasets.
Figure 3: Visual comparison of segmentation results between MCNet and the baseline RandLA-Net on the Toronto3D dataset.
Figure 4: Visual comparison of segmentation results between MCNet and the baseline RandLA-Net on the SensatUrban dataset.
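The "CBL" block named in the Figure 1 caption (1$\times$1 convolution, batch normalization, Leaky ReLU) can be sketched in plain Python for per-point features. This is a minimal illustration, not the authors' implementation: the function name `cbl`, the negative slope of 0.01, the epsilon, the omission of a bias and of learned scale and shift parameters, and the use of batch statistics (training-mode normalization) are all assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def cbl(features: Matrix, weight: Matrix,
        eps: float = 1e-5, slope: float = 0.01) -> Matrix:
    """Apply 1x1 convolution, batch normalization and Leaky ReLU.

    features: N points x C_in channels; weight: C_in x C_out.
    Returns an N x C_out matrix. Hypothetical sketch only.
    """
    c_in, c_out = len(weight), len(weight[0])
    # A 1x1 convolution is the same linear map applied to each point.
    conv = [[sum(f[i] * weight[i][o] for i in range(c_in))
             for o in range(c_out)] for f in features]
    n = len(conv)
    out = [row[:] for row in conv]
    for o in range(c_out):
        # Batch normalization: per-channel statistics over all points.
        col = [row[o] for row in conv]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        for r in range(n):
            x = (conv[r][o] - mean) / math.sqrt(var + eps)
            # Leaky ReLU: keep positives, scale negatives by `slope`.
            out[r][o] = x if x > 0 else slope * x
    return out


# Two points with one input channel, mapped to one output channel.
activated = cbl([[1.0], [3.0]], [[1.0]])
```

In a deep learning framework the same block would typically be three stacked layers; the explicit loops here only make the order of operations visible.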
