SPADE: Sparse Pillar-based 3D Object Detection Accelerator for Autonomous Driving

Minjae Lee, Seongmin Park, Hyungmin Kim, Minyong Yoon, Janghwan Lee, Jun Won Choi, Nam Sung Kim, Mingu Kang, Jungwook Choi

TL;DR

SPADE addresses the inefficiency of dense pillar processing in pillar-based 3D object detection by exploiting the vector sparsity of pillar encoding. It is a three-part co-design: a dynamic vector pruning algorithm, sparse coordinate management hardware that turns a 2D systolic array into a vector-sparse convolution accelerator, and sparsity-aware dataflow optimization. The key hardware contributions are the Rule Generation Unit for fast input-output mapping, the Gather-Scatter Unit for efficient data handling, and a configurable dataflow that adapts to different sparse convolution variants. The reported results reach up to 500 FPS with minimal accuracy loss, along with substantial speedups and energy savings over both edge and server platforms, enabling real-time 3D perception for autonomous driving on edge devices.

Abstract

3D object detection using point cloud (PC) data is essential for perception pipelines of autonomous driving, where efficient encoding is key to meeting stringent resource and latency requirements. PointPillars, a widely adopted bird's-eye view (BEV) encoding, aggregates 3D point cloud data into 2D pillars for fast and accurate 3D object detection. However, state-of-the-art methods employing PointPillars overlook the inherent sparsity of pillar encoding, in which only valid pillars are encoded with a vector of channel elements, and so miss opportunities for significant computational reduction. Meanwhile, current sparse convolution accelerators are designed to handle only element-wise activation sparsity and do not effectively address the vector sparsity imposed by pillar encoding. In this paper, we propose SPADE, an algorithm-hardware co-design strategy to maximize vector sparsity in pillar-based 3D object detection and accelerate vector-sparse convolution commensurate with the improved sparsity. SPADE consists of three components: (1) a dynamic vector pruning algorithm that balances accuracy against the computation savings from vector sparsity, (2) sparse coordinate management hardware that transforms a 2D systolic array into a vector-sparse convolution accelerator, and (3) sparsity-aware dataflow optimization that tailors sparse convolution schedules for hardware efficiency. Taped out with a commercial technology, SPADE reduces the amount of computation by 36.3--89.2\% for representative 3D object detection networks and benchmarks, leading to 1.3--10.9$\times$ speedup and 1.5--12.6$\times$ energy savings compared to the ideal dense accelerator design. These sparsity-proportional performance gains equate to 4.1--28.8$\times$ speedup and 90.2--372.3$\times$ energy savings compared to the counterpart server and edge platforms.
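To make the idea of vector sparsity concrete, the following is a minimal software sketch of a vector-sparse convolution. It is not SPADE's hardware design: the function names `build_rules` and `sparse_conv` are hypothetical, the dictionary lookup is only a software stand-in for coordinate management, and restricting outputs to the coordinates of valid input pillars is an assumption made here to keep the example small. The sketch first builds input-output mapping rules for each kernel offset, then gathers each valid pillar's channel vector and scatter-accumulates its contribution.

```python
from itertools import product


def build_rules(active, kernel=3):
    """For each kernel offset, list the (input_idx, output_idx) pairs to compute.

    active: list of (row, col) coordinates of valid (non-empty) pillars.
    Assumption: outputs are produced only at the active input coordinates,
    so the set of valid pillars does not grow.
    """
    # Hypothetical software stand-in for hardware coordinate lookup.
    index_of = {coord: i for i, coord in enumerate(active)}
    r = kernel // 2
    rules = {}
    for dy, dx in product(range(-r, r + 1), repeat=2):
        pairs = []
        for out_idx, (y, x) in enumerate(active):
            in_idx = index_of.get((y + dy, x + dx))
            if in_idx is not None:
                pairs.append((in_idx, out_idx))
        rules[(dy, dx)] = pairs
    return rules


def sparse_conv(features, rules, weights, c_out):
    """Apply the rules: gather an input vector, multiply, scatter-add to its output.

    features: one channel vector (list of floats) per active pillar.
    weights:  weights[(dy, dx)][c_in][c_out], one matrix per kernel offset.
    Returns the output vectors and the number of multiply-accumulates performed.
    """
    out = [[0.0] * c_out for _ in features]
    macs = 0
    for offset, pairs in rules.items():
        w = weights[offset]
        for in_idx, out_idx in pairs:
            vec = features[in_idx]  # gather the whole channel vector
            for o in range(c_out):
                # scatter-add this pillar's contribution to the output vector
                out[out_idx][o] += sum(v * w[c][o] for c, v in enumerate(vec))
            macs += len(vec) * c_out
    return out, macs


# Toy 4x4 BEV grid with 3 valid pillars, 2 input channels, 1 output channel.
active = [(0, 0), (0, 1), (2, 3)]
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
rules = build_rules(active)
weights = {offset: [[1.0], [1.0]] for offset in rules}  # all-ones kernel
out, macs = sparse_conv(features, rules, weights, c_out=1)
dense_macs = 4 * 4 * 3 * 3 * 2 * 1  # every grid position and kernel offset
print(out, macs, dense_macs)
```

In this toy case the rules contain only 5 input-output pairs, so the sparse path performs 10 multiply-accumulates where a zero-padded dense convolution over the same grid would perform 288. Because a pillar is either wholly present or wholly absent, work is skipped one channel vector at a time rather than one element at a time, which is the distinction the abstract draws against element-wise sparse accelerators.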

Paper Structure (27 sections, 15 figures, 1 table)

This paper contains 27 sections, 15 figures, 1 table.

Figures (15)

  • Figure 1: Overview of pillar-based 3D object detection: (a) model structure, (b) feature extraction steps (backbone and head). Comparison of receptive fields (Stage 1) for various sparse convolution operations: (c) SpConv, (d) SpConv-S, and (e) SpConv-P. (f) Dynamic vector pruning for SpConv-P: vector sparsity regularization and dynamic pruning-aware fine-tuning.
  • Figure 2: Conventional sparse Conv2D accelerator for sparse pillars: (a) dataflow, (b) inefficiency (utilization, bank conflict rate). (c) Latency breakdown of PointPillars. Sparsity characteristics of sparse convolution variants: (d) SPP1, (e) SPP2, and (f) SPP3.
  • Figure 3: Overall architecture of SPADE.
  • Figure 4: Rule generation algorithm of SPADE.
  • Figure 5: (a) Hardware architecture of the Rule Generation Unit (RGU). C$_{*}$, I$_{*}$, W$_{*}$, and O$_{*}$ denote the indices of the column, input, kernel, and output, respectively. $\times$ denotes an invalid signal. (b) Comparison of rule generation methods: hash table, sorting, RGU (ours).
  • ...and 10 more figures