Cylinder3D: An Effective 3D Framework for Driving-scene LiDAR Semantic Segmentation

Hui Zhou; Xinge Zhu; Xiao Song; Yuexin Ma; Zhe Wang; Hongsheng Li; Dahua Lin

TL;DR

Cylinder3D keeps driving-scene LiDAR semantic segmentation in 3D rather than projecting the point cloud to 2D. It combines a cylinder-based voxelization (Cylinder Partition) with a 3D U-Net backbone augmented with Asymmetric Residual Blocks and Dimension-decomposition based Context Modeling. Because it operates on the 3D representation, the approach preserves 3D topology and geometric relations that 2D projection methods alter or discard. On SemanticKITTI it achieves state-of-the-art performance, outperforming existing methods by 6% in mIoU. The work shows that a cylindrical grid, which balances the number of points per cell across distances, together with efficient modeling of high-rank context in 3D, improves segmentation accuracy on sparse outdoor LiDAR data of the kind used in autonomous driving perception.
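To make the cylinder partition concrete, here is a minimal sketch of how points could be assigned to cylindrical voxels. It is an illustration, not the authors' implementation: the function name `cylinder_partition`, the grid size, and the radius and height ranges are all assumptions chosen for the example and do not come from the text above.

```python
import numpy as np

def cylinder_partition(points, grid_size=(480, 360, 32),
                       rho_range=(0.0, 50.0), z_range=(-4.0, 2.0)):
    """Map (x, y, z) points to integer (rho, phi, z) voxel indices.

    grid_size, rho_range and z_range are illustrative assumptions.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rho = np.hypot(x, y)        # horizontal distance from the sensor
    phi = np.arctan2(y, x)      # azimuth angle in [-pi, pi]
    coords = np.stack([rho, phi, z], axis=1)

    lo = np.array([rho_range[0], -np.pi, z_range[0]])
    hi = np.array([rho_range[1], np.pi, z_range[1]])
    size = np.asarray(grid_size)
    cell = (hi - lo) / size     # extent of one cell along each axis

    # Clip out-of-range points to the grid boundary, then quantize.
    idx = np.floor((np.clip(coords, lo, hi) - lo) / cell).astype(np.int64)
    # Points exactly on the upper boundary fall into the last cell.
    return np.minimum(idx, size - 1)
```

Each cell spans a fixed angle, so its footprint grows with distance from the sensor. Far-away regions, where LiDAR returns are sparse, are therefore covered by larger cells than nearby regions, which is how a cylindrical grid evens out the number of points per cell compared with a uniform Cartesian grid.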

Abstract

State-of-the-art methods for large-scale driving-scene LiDAR semantic segmentation often project the point clouds into 2D space and process them there. The projection methods include spherical projection, bird's-eye-view projection, etc. Although this makes the point cloud suitable for 2D CNN-based networks, it inevitably alters and abandons the 3D topology and geometric relations. A straightforward way to avoid the problems of 3D-to-2D projection is to keep the 3D representation and process the points in 3D space. In this work, we first perform an in-depth analysis of different representations and backbones in 2D and 3D spaces, and reveal the effectiveness of 3D representations and networks for LiDAR segmentation. Then, we develop a framework based on a 3D cylinder partition and 3D cylinder convolution, termed Cylinder3D, which exploits the 3D topology relations and structures of driving-scene point clouds. Moreover, a dimension-decomposition based context modeling module is introduced to explore the high-rank context information in point clouds in a progressive manner. We evaluate the proposed model on a large-scale driving-scene dataset, i.e., SemanticKITTI. Our method achieves state-of-the-art performance and outperforms existing methods by 6% in terms of mIoU.
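For readers unfamiliar with the reported metric, the sketch below computes mean intersection-over-union (mIoU) from per-point predictions using the standard definition: per class, IoU = TP / (TP + FP + FN), averaged over classes. The function name `mean_iou` is hypothetical, and benchmark-specific details such as ignored labels are not handled, so this is not the SemanticKITTI evaluation script.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that occur in pred or target."""
    pred = np.asarray(pred, dtype=np.int64)
    target = np.asarray(target, dtype=np.int64)

    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(target * num_classes + pred,
                       minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes)

    tp = np.diag(conf)
    # Union = predicted as class c + labeled as class c - counted twice (TP).
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp

    valid = union > 0  # skip classes absent from both pred and target
    return float((tp[valid] / union[valid]).mean())
```

For example, with predictions `[0, 0, 1, 1]` and labels `[0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, giving an mIoU of 7/12.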
