PointSlice: Accurate and Efficient Slice-Based Representation for 3D Object Detection from Point Clouds

Liu Qifeng, Zhao Dawei, Dong Yabo, Xiao Liang, Wang Juan, Min Chen, Li Fuyang, Jiang Weizhong, Lu Dongming, Nie Yiming

TL;DR

This work proposes PointSlice, a novel point cloud processing method that slices point clouds along the horizontal plane and pairs the slices with a dedicated detection network. A Slice Interaction Network (SIN) is incorporated into the 2D backbone network to preserve vertical relationships across slices, thereby improving the model's 3D perception capability.

Abstract

3D object detection from point clouds plays a critical role in autonomous driving. Currently, the primary methods for point cloud processing are voxel-based and pillar-based approaches. Voxel-based methods offer high accuracy through fine-grained spatial segmentation but suffer from slower inference speeds. Pillar-based methods enhance inference speed but typically lag behind voxel-based methods in detection accuracy. To address this trade-off, we propose a novel point cloud processing method, PointSlice, which slices point clouds along the horizontal plane and incorporates a dedicated detection network. The main contributions of PointSlice are: (1) A novel slice-based representation that converts 3D point clouds into multiple sets of 2D (x-y) data slices. The model explicitly learns 2D data distributions by treating the 3D point cloud as separate batches of 2D data, which significantly reduces the parameter count and enhances inference speed; (2) The introduction of a Slice Interaction Network (SIN). To preserve vertical geometric relationships across slices, we incorporate SIN into the 2D backbone network, thereby improving the model's 3D perception capability. Extensive experiments demonstrate that PointSlice achieves a superior balance between detection accuracy and efficiency. On the Waymo Open Dataset, PointSlice achieves a 1.13$\times$ speedup and uses 0.79$\times$ the parameters of the state-of-the-art voxel-based method (SAFDNet), with a marginal 1.2 mAPH accuracy reduction. On the nuScenes dataset, we achieve a state-of-the-art 66.7 mAP. On the Argoverse 2 dataset, PointSlice is 1.10$\times$ faster with 0.66$\times$ the parameters, while showing a negligible accuracy drop of 1.0 mAP. The source code is available at https://github.com/qifeng22/PointSlice2.
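The slice-based representation in contribution (1) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `points_to_slices`, the `pc_range` and `voxel_size` parameters, and the rule `sample_id * nz + z_index` for folding the vertical index into the batch dimension are all assumptions made for illustration. It only shows how occupied 3D cells could be re-indexed as separate batches of 2D (x-y) data.

```python
import numpy as np


def points_to_slices(points, sample_ids, pc_range, voxel_size):
    """Voxelize points and fold the z index into the batch index (hypothetical helper).

    points:     (N, 3) array of x, y, z coordinates
    sample_ids: (N,) array with the index of the point cloud each point belongs to
    pc_range:   (x_min, y_min, z_min, x_max, y_max, z_max)
    voxel_size: (dx, dy, dz)

    Returns the unique occupied 2D cells as rows of (slice_batch_id, y_idx, x_idx)
    and the grid shape (nz, ny, nx).
    """
    points = np.asarray(points, dtype=np.float64)
    sample_ids = np.asarray(sample_ids, dtype=np.int64)
    lo = np.asarray(pc_range[:3], dtype=np.float64)
    hi = np.asarray(pc_range[3:], dtype=np.float64)
    size = np.asarray(voxel_size, dtype=np.float64)

    # Discard points outside the detection range.
    inside = np.all((points >= lo) & (points < hi), axis=1)
    pts, sids = points[inside], sample_ids[inside]

    # Integer voxel indices, columns ordered (x, y, z).
    grid = np.round((hi - lo) / size).astype(np.int64)
    idx = np.floor((pts - lo) / size).astype(np.int64)
    idx = np.minimum(idx, grid - 1)  # guard against rounding at the upper edge

    nx, ny, nz = (int(g) for g in grid)

    # Assumption: each horizontal slice of each sample becomes its own 2D "batch".
    slice_batch_id = sids * nz + idx[:, 2]

    coords = np.stack([slice_batch_id, idx[:, 1], idx[:, 0]], axis=1)
    coords = np.unique(coords, axis=0)  # one row per occupied 2D cell
    return coords, (nz, ny, nx)


if __name__ == "__main__":
    pts = [[0.1, 0.1, 0.1], [0.1, 0.1, 0.9], [1.5, 0.5, 0.1], [5.0, 5.0, 5.0]]
    coords, shape = points_to_slices(
        pts, sample_ids=[0, 0, 1, 1],
        pc_range=(0, 0, 0, 2, 2, 1), voxel_size=(1.0, 1.0, 0.5),
    )
    print(shape)
    print(coords)
```

Under this assumed layout, a 2D sparse backbone would see `batch_size * nz` independent x-y maps. That is also why a cross-slice module such as SIN is needed to restore vertical context.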

Paper Structure

This paper contains 22 sections, 9 equations, 6 figures, 15 tables.

Figures (6)

  • Figure 1: Comparison of different point cloud processing methods: pillar-based, voxel-based, and batch slices (Ours).
  • Figure 2: Overall framework of PointSlice. The dashed boxes labeled "Pointcloud to Slice" and "Slice Interaction Network (SIN)" represent the main contributions of this paper. The raw point clouds, after voxelization and slicing, are input into a 2D backbone network for feature extraction. The 2D backbone network is composed of SIN-STEM and SIN-EDB. The SIN-STEM consists of 2D Sparse Residual Blocks (2DSRB) and SIN, which are responsible for efficient and effective feature generation. The SIN-EDB is constructed from 2DSRB, SIN, 2D Sparse encoder-decoder block (2DEDB), and AFD modules, designed to capture long-range dependencies among features.
  • Figure 3: Detailed structure of the 2D-SRB.
  • Figure 4: Composition of the SIN and 2D-EDB models, with the design of the 2D-EDB module adapted from SAFDNet.
  • Figure 5: Visual comparison of feature maps before and after processing at different stages. The left column shows the features before the SIN module, and the right column shows the enhanced features after interaction.
  • ...and 1 more figure