Spherical Frustum Sparse Convolution Network for LiDAR Point Cloud Semantic Segmentation

Yu Zheng, Guangming Wang, Jiuming Liu, Marc Pollefeys, Hesheng Wang

TL;DR

The Spherical Frustum sparse Convolution Network (SFCNet) is presented, and extensive experiments on the SemanticKITTI and nuScenes datasets demonstrate that the SFCNet outperforms the 2D image-based semantic segmentation methods based on conventional spherical projection.

Abstract

LiDAR point cloud semantic segmentation enables robots to obtain fine-grained semantic information about the surrounding environment. Recently, many works project the point cloud onto a 2D image and adopt 2D Convolutional Neural Networks (CNNs) or vision transformers for LiDAR point cloud semantic segmentation. However, since more than one point can be projected onto the same 2D position but only one point can be preserved, previous 2D image-based segmentation methods suffer from inevitable quantized information loss. To avoid quantized information loss, in this paper, we propose a novel spherical frustum structure. The points projected onto the same 2D position are preserved in the spherical frustums. Moreover, we propose a memory-efficient hash-based representation of spherical frustums. Through the hash-based representation, we propose the Spherical Frustum sparse Convolution (SFC) and Frustum Farthest Point Sampling (F2PS) to convolve and sample the points stored in spherical frustums, respectively. Finally, we present the Spherical Frustum sparse Convolution Network (SFCNet) to adopt 2D CNNs for LiDAR point cloud semantic segmentation without quantized information loss. Extensive experiments on the SemanticKITTI and nuScenes datasets demonstrate that our SFCNet outperforms the 2D image-based semantic segmentation methods based on conventional spherical projection. Code will be available at https://github.com/IRMVLab/SFCNet.
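To make the quantized information loss concrete, the following is a minimal sketch, not the authors' implementation. It assumes a 64 x 2048 range image with a vertical field of view of +3 to -25 degrees (typical values for a 64-beam sensor, not taken from the paper), and it uses a plain Python dict as a stand-in for the hash-based representation. The function names `spherical_pixel`, `conventional_projection`, and `build_frustums` are hypothetical, and keeping the nearest point per pixel in the conventional projection is likewise an assumption.

```python
import math
from collections import defaultdict


def spherical_pixel(point, height=64, width=2048,
                    fov_up_deg=3.0, fov_down_deg=-25.0):
    """Map a 3D point (x, y, z) to an integer (row, col) of the range image.

    The image size and field of view are illustrative assumptions.
    """
    x, y, z = point
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        raise ValueError("a point at the sensor origin has no direction")
    yaw = math.atan2(y, x)
    pitch = math.asin(z / r)
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    col = 0.5 * (1.0 - yaw / math.pi) * width
    row = (1.0 - (pitch - fov_down) / (fov_up - fov_down)) * height
    # Clamp to the image bounds after flooring to integer pixel indices.
    col = min(width - 1, max(0, int(math.floor(col))))
    row = min(height - 1, max(0, int(math.floor(row))))
    return row, col


def conventional_projection(points):
    """Keep one point index per pixel (here: the nearest), dropping the rest."""
    image = {}
    for idx, p in enumerate(points):
        key = spherical_pixel(p)
        rng = math.sqrt(sum(c * c for c in p))
        if key not in image or rng < image[key][1]:
            image[key] = (idx, rng)
    return {key: idx for key, (idx, _) in image.items()}


def build_frustums(points):
    """Keep every point index, grouped by the pixel it projects onto."""
    frustums = defaultdict(list)
    for idx, p in enumerate(points):
        frustums[spherical_pixel(p)].append(idx)
    return dict(frustums)


# Points 0 and 1 lie on the same ray, so they share one pixel.
points = [(10.0, 0.0, 0.0), (20.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
kept = conventional_projection(points)   # point 1 is lost
frustums = build_frustums(points)        # all three points survive
```

In this toy input the conventional projection retains two of the three points, while the frustum grouping retains all three and stores memory only for occupied pixels.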

Paper Structure (25 sections, 5 equations, 10 figures, 13 tables)

This paper contains 25 sections, 5 equations, 10 figures, 13 tables.

Figures (10)

  • Figure 1: Difference between our spherical frustum and conventional spherical projection. In conventional spherical projection, when several points are projected onto the same 2D grid cell, all but one are dropped. This leads to quantized information loss, e.g., dropping the boundary between the person (a small object) and the road, and causes the 2D projection-based method RangeViT (Ando et al., 2023) to predict the person incorrectly. In contrast, our spherical frustum preserves all points in the frustum, which eliminates quantized information loss and lets SFCNet segment the person correctly.
  • Figure 2: Pipeline of Spherical Frustum sparse Convolution. The spherical frustums in the convolution kernel and the points in these spherical frustums are first selected through the hash table. Then, the nearest point in each spherical frustum is determined by the 3D geometric information. Finally, the sparse convolution is performed on the selected point features.
  • Figure 3: Pipeline of Frustum Farthest Point Sampling. According to the downsampling strides, the spherical frustums in each stride window are downsampled. Then, through the hash table, the points in each downsampled spherical frustum are queried. The queried points are sampled by Farthest Point Sampling (FPS) based on the 3D geometric information. Finally, the uniformly sampled spherical frustums and point cloud are obtained.
  • Figure 4: Qualitative results on the SemanticKITTI validation set. The first column presents the ground truths, while the following three columns show the error maps of the predictions from the three methods. The mapping from point color to semantic class in the ground truths is shown at the bottom. Incorrectly segmented points are marked in red in the error maps. Circles of the same color mark the same objects in the ground truth and the three error maps. The corresponding RGB image of each scene, with the colored point cloud projected onto it, is also shown, along with a zoomed RGB view of the circled objects when they are visible in the RGB images.
  • Figure 5: The Detailed Architecture of SFCNet. (a) presents the detailed pipeline of SFCNet. In addition, (b), (c), and (d) show the detailed module structures of the SFC layer, SFC block, and downsampling SFC block respectively, where SFC means spherical frustum sparse convolution, and F2PS means the frustum farthest point sampling.
  • ...and 5 more figures
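
The pipelines described in the captions of Figures 2 and 3 can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' code: frustums are a dict from (row, col) to lists of point indices, the helper names are hypothetical, the number of points kept per downsampled frustum (`k`) is a free parameter here, and the greedy farthest point sampling starts from the first candidate.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two 3D points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def gather_kernel_points(center_idx, center_px, frustums, points, kernel=3):
    """Figure 2 sketch: for each kernel offset, look up the neighboring
    frustum and select its point nearest in 3D to the center point."""
    half = kernel // 2
    row, col = center_px
    center = points[center_idx]
    selected = {}
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            frustum = frustums.get((row + dr, col + dc))
            if frustum:  # empty frustums contribute nothing (sparse)
                selected[(dr, dc)] = min(
                    frustum, key=lambda i: sq_dist(points[i], center))
    return selected


def farthest_point_sampling(candidates, points, k):
    """Greedy FPS over candidate indices, keeping at most k of them."""
    if not candidates or k <= 0:
        return []
    chosen = [candidates[0]]
    # Distance from each candidate to its closest already-chosen point.
    min_d = {i: sq_dist(points[i], points[chosen[0]]) for i in candidates}
    while len(chosen) < min(k, len(candidates)):
        nxt = max(candidates, key=lambda i: min_d[i])
        if min_d[nxt] == 0.0:
            break  # only duplicates of chosen points remain
        chosen.append(nxt)
        for i in candidates:
            min_d[i] = min(min_d[i], sq_dist(points[i], points[nxt]))
    return chosen


def downsample_frustums(frustums, points, stride=(2, 2), k=2):
    """Figure 3 sketch: merge the frustums of each stride window, then
    sample the merged points with FPS."""
    merged = {}
    for (row, col), idxs in frustums.items():
        key = (row // stride[0], col // stride[1])
        merged.setdefault(key, []).extend(idxs)
    return {key: farthest_point_sampling(idxs, points, k)
            for key, idxs in merged.items()}


# Toy input: pixel coordinates are made up for illustration.
points = [(10.0, 0.0, 0.0), (10.5, 0.0, 0.0), (30.0, 0.0, 0.0), (10.0, 1.0, 0.0)]
frustums = {(6, 1024): [0, 1, 2], (6, 1023): [3]}
neighbors = gather_kernel_points(3, (6, 1023), frustums, points)
coarse = downsample_frustums(frustums, points, stride=(2, 2), k=2)
```

The indices in `neighbors` would then be used to gather point features for the convolution weights at each kernel offset. Selecting by 3D distance rather than by pixel adjacency alone is what lets the convolution use the geometric information preserved in the frustums.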