Cylinder3D: An Effective 3D Framework for Driving-scene LiDAR Semantic Segmentation

Hui Zhou, Xinge Zhu, Xiao Song, Yuexin Ma, Zhe Wang, Hongsheng Li, Dahua Lin

TL;DR

Cylinder3D keeps driving-scene LiDAR semantic segmentation in 3D by combining a cylinder-based voxelization (Cylinder Partition) with a 3D U-Net backbone augmented with Asymmetric Residual Blocks and Dimension-decomposition based Context Modeling. This approach preserves 3D topology better than 2D projection methods and achieves state-of-the-art performance on SemanticKITTI, outperforming existing methods by 6% in mean IoU (mIoU). The work shows that a cylindrical grid, which balances point density across distances, together with efficient modeling of high-rank context in 3D, improves segmentation accuracy on sparse outdoor LiDAR data, with practical implications for autonomous driving perception.

Abstract

State-of-the-art methods for large-scale driving-scene LiDAR semantic segmentation often project and process the point clouds in the 2D space. The projection methods include spherical projection, bird's-eye-view projection, etc. Although this process makes the point cloud suitable for 2D CNN-based networks, it inevitably alters and abandons the 3D topology and geometric relations. A straightforward solution to the issue of 3D-to-2D projection is to keep the 3D representation and process the points in the 3D space. In this work, we first perform an in-depth analysis of different representations and backbones in 2D and 3D spaces, and reveal the effectiveness of 3D representations and networks on LiDAR segmentation. Then, we develop a framework based on 3D cylinder partition and 3D cylinder convolution, termed Cylinder3D, which exploits the 3D topology relations and structures of driving-scene point clouds. Moreover, a dimension-decomposition based context modeling module is introduced to explore the high-rank context information in point clouds in a progressive manner. We evaluate the proposed model on a large-scale driving-scene dataset, i.e., SemanticKITTI. Our method achieves state-of-the-art performance and outperforms existing methods by 6% in terms of mIoU.
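
The abstract reports results in mIoU without defining it. As a rough illustration, the sketch below computes mean intersection-over-union from per-point predictions using the standard definition: per class, true positives divided by the union of predicted and ground-truth points, averaged over classes. The function name `mean_iou` and the choice to skip classes that never occur are assumptions made for this example; benchmark-specific details such as ignored labels are left out.

```python
import numpy as np


def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that appear in pred or target.

    pred, target: integer class ids per point, same length.
    """
    pred = np.asarray(pred, dtype=np.int64).ravel()
    target = np.asarray(target, dtype=np.int64).ravel()

    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    flat = target * num_classes + pred
    conf = np.bincount(flat, minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes)

    true_pos = np.diag(conf)
    # Union = ground-truth count + predicted count - intersection.
    union = conf.sum(axis=1) + conf.sum(axis=0) - true_pos

    # Skip classes absent from both pred and target (assumption of this sketch).
    present = union > 0
    return float((true_pos[present] / union[present]).mean())


if __name__ == "__main__":
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

Accumulating a confusion matrix first makes it straightforward to sum counts over many scans before taking the per-class ratio.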

Paper Structure

This paper contains 16 sections, 5 figures, 3 tables.

Figures (5)

  • Figure 1: (Left) The detailed road map for network architecture search on SemanticKITTI, from 2D and 2.5D to 3D (here 2.5D means a 3D grid representation with a 2D backbone). (Right) The limitation of spherical projection: it abandons certain valuable 3D structures, since neighboring regions in the projection can correspond to significantly different locations in 3D space. Spherical projection therefore cannot maintain the 3D geometric structure.
  • Figure 2: The overall architecture. The top part is the full workflow of the proposed 3D LiDAR segmentation network, Cylinder3D. The bottom parts show the details of the DownSample and UpSample blocks.
  • Figure 3: The pipeline of Cylinder Partition. It first transforms points from Cartesian coordinates to cylindrical coordinates. A cylinder partition then performs the voxelization. Finally, cylinder features are produced by a simplified PointNet.
  • Figure 4: The detailed framework of the asymmetric residual block and dimension-decomposition based context modeling.
  • Figure 5: Visualization on the validation set. The left is the ground truth and the right is our prediction.
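
The Cylinder Partition pipeline in the Figure 3 caption (Cartesian to cylindrical coordinates, then voxelization, then per-cell features) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, grid size, and coordinate ranges are hypothetical values chosen for the example, and a plain per-voxel max pooling stands in for the simplified PointNet mentioned in the caption.

```python
import numpy as np


def cartesian_to_cylindrical(points_xyz):
    """Map (x, y, z) to (rho, phi, z): radius, azimuth in [-pi, pi], height."""
    x, y, z = points_xyz[:, 0], points_xyz[:, 1], points_xyz[:, 2]
    rho = np.sqrt(x ** 2 + y ** 2)
    phi = np.arctan2(y, x)
    return np.stack([rho, phi, z], axis=1)


def cylinder_voxel_indices(points_xyz, grid_size=(48, 36, 8),
                           rho_range=(0.0, 50.0), z_range=(-3.0, 1.5)):
    """Assign each point to a (rho, phi, z) cell of a cylindrical grid.

    grid_size and the ranges are illustrative assumptions, not paper values.
    """
    cyl = cartesian_to_cylindrical(points_xyz)
    lower = np.array([rho_range[0], -np.pi, z_range[0]])
    upper = np.array([rho_range[1], np.pi, z_range[1]])
    grid = np.asarray(grid_size)
    cell = (upper - lower) / grid

    # Points outside the ranges are clamped to the boundary cells.
    clipped = np.clip(cyl, lower, upper)
    idx = np.floor((clipped - lower) / cell).astype(np.int64)
    return np.minimum(idx, grid - 1)


def max_pool_per_voxel(features, voxel_idx):
    """Stand-in for the simplified PointNet: max over the points in each cell."""
    uniq, inverse = np.unique(voxel_idx, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)
    pooled = np.full((len(uniq), features.shape[1]), -np.inf)
    np.maximum.at(pooled, inverse, features)
    return uniq, pooled


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.uniform(-40.0, 40.0, size=(1000, 3)) * np.array([1.0, 1.0, 0.05])
    idx = cylinder_voxel_indices(pts)
    cells, feats = max_pool_per_voxel(cartesian_to_cylindrical(pts), idx)
    print(cells.shape, feats.shape)
```

Because the azimuth bins span a fixed angle, a cell's physical width grows with its radius, so distant cells cover more area and collect a share of the sparser far-range points. This is the density-balancing property highlighted in the TL;DR.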