Accelerating Sparse Convolutions in Voxel-Based Point Cloud Networks

Dionysios Adamopoulos, Anastasia Poulopoulou, Georgios Goumas, Christina Giannoula

TL;DR

This work tackles the bottleneck of Sparse Convolution (SpC) on voxel-based point clouds by exploiting voxel-specific properties to accelerate GPU execution. It introduces Spira, a voxel-property-aware SpC engine with four innovations: one-shot z-delta mapping, packed-native voxel indexing, adaptive hybrid dataflow, and network-wide voxel indexing. Empirical results show substantial end-to-end and layer-wise speedups across diverse networks, datasets, and NVIDIA GPUs, with up to 2.31× end-to-end improvements and significant reductions in mapping and pre-/post-processing overhead. The approach enables practical, scalable acceleration of SpC for large-scale 3D perception tasks and will be open-sourced for broader adoption and further research.

Abstract

Sparse Convolution (SpC) powers 3D point cloud networks widely used in autonomous driving and AR/VR. SpC builds a kernel map that stores mappings between input voxel coordinates, output coordinates, and weight offsets, then uses this map to compute feature vectors for output coordinates. Our work identifies three key properties of voxel coordinates: they are integer-valued, bounded within a limited spatial range, and geometrically continuous, meaning that neighboring voxels on the same object surface are highly likely to exist at small spatial offsets from each other. Prior SpC engines do not fully exploit these properties and suffer from high pre-processing and post-processing overheads during kernel map construction. To address this, we design Spira, the first voxel-property-aware SpC engine for GPUs. Spira introduces: (i) a high-performance one-shot search algorithm that builds the kernel map with no preprocessing and high memory locality, (ii) an effective packed-native processing scheme that accesses packed voxel coordinates at low cost, (iii) a flexible dual-dataflow execution mechanism that efficiently computes output feature vectors by adapting to layer characteristics, and (iv) a network-wide parallelization strategy that builds kernel maps for all SpC layers concurrently at network start. Our evaluation shows that Spira outperforms prior SpC engines by 1.71× on average and up to 2.31× for end-to-end inference, and by 2.13× on average and up to 3.32× for layer-wise execution across diverse layer configurations.
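
To make the kernel map concrete, the following is a minimal CPU sketch of a submanifold SpC layer (output coordinates equal input coordinates). It is not Spira's one-shot z-delta search or its GPU implementation: neighbor lookup uses a plain Python dictionary as a stand-in. The names `pack`, `unpack`, `build_kernel_map`, and `sparse_conv`, the 10-bit-per-axis layout, and the single-channel features are all assumptions made for illustration. The sketch relies only on voxel coordinates being integer-valued and bounded, which is what lets each coordinate fit in one packed integer key.

```python
from itertools import product

BITS = 10                 # assumed bits per axis: coordinates lie in [0, 2**BITS)
MASK = (1 << BITS) - 1


def pack(x, y, z):
    """Pack a bounded integer voxel coordinate into one integer key."""
    return (x << (2 * BITS)) | (y << BITS) | z


def unpack(key):
    """Recover (x, y, z) from a packed key."""
    return (key >> (2 * BITS)) & MASK, (key >> BITS) & MASK, key & MASK


def build_kernel_map(coords, kernel_size=3):
    """Return {weight_offset: [(input_index, output_index), ...]}.

    Submanifold case: every input voxel is also an output voxel.
    """
    index_of = {pack(*c): i for i, c in enumerate(coords)}
    r = kernel_size // 2
    kmap = {}
    for offset in product(range(-r, r + 1), repeat=3):
        pairs = []
        for out_idx, (x, y, z) in enumerate(coords):
            nx, ny, nz = x + offset[0], y + offset[1], z + offset[2]
            # Skip neighbors outside the bounded range; they cannot be packed.
            if min(nx, ny, nz) < 0 or max(nx, ny, nz) > MASK:
                continue
            in_idx = index_of.get(pack(nx, ny, nz))
            if in_idx is not None:
                pairs.append((in_idx, out_idx))
        if pairs:
            kmap[offset] = pairs
    return kmap


def sparse_conv(features, weights, kmap, n_out):
    """Single-channel feature computation driven by the kernel map."""
    out = [0.0] * n_out
    for offset, pairs in kmap.items():
        w = weights[offset]
        for in_idx, out_idx in pairs:
            out[out_idx] += w * features[in_idx]
    return out
```

In this sketch, only weight offsets that connect at least one pair of existing voxels appear in the map, so an isolated voxel contributes only to the center offset `(0, 0, 0)`.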

Paper Structure

This paper contains 20 sections, 2 equations, 12 figures.

Figures (12)

  • Figure 1: The two dataflows of the feature computation step.
  • Figure 2: Layer time breakdown using various SpC engines. Numbers on bars are speedup over TorchSparse++ output-stationary.
  • Figure 3: a) A voxelized wall surface. b) In a submanifold layer with $K=5$, the average density of kernel map columns for weight offsets grouped by their L1-norm for three different datasets.
  • Figure 4: Spira's one-shot z-delta search algorithm.
  • Figure 5: Spira's adaptive hybrid-dataflow feature computation.
  • ...and 7 more figures