Voxel-CIM: An Efficient Compute-in-Memory Accelerator for Voxel-based Point Cloud Neural Networks

Xipeng Lin, Shanshi Huang, Hongwu Jiang

TL;DR

Voxel-CIM is an efficient Compute-in-Memory based accelerator for voxel-based neural network processing. It reduces off-chip memory access for map search, and it uses the in-memory computing paradigm with new weight mapping strategies to efficiently process sparse 3D convolutions and 2D convolutions.

Abstract

3D point cloud perception has become fundamental to a wide range of applications. In particular, with the rapid development of neural networks, voxel-based networks have attracted great attention due to their excellent performance. Various accelerator designs have been proposed to improve the hardware performance of voxel-based networks, especially to speed up the map search process. However, several challenges remain: (1) massive off-chip data access volume caused by map search operations, notably for high-resolution and densely distributed inputs; (2) frequent data movement for data-intensive convolution operations; and (3) imbalanced workload caused by the irregular sparsity of point data. To address these challenges, we propose Voxel-CIM, an efficient Compute-in-Memory based accelerator for voxel-based neural network processing. To reduce off-chip memory access for map search, a depth-encoding-based output-major search approach is introduced to maximize data reuse, achieving a stable $O(N)$-level data access volume in various situations. Voxel-CIM also employs the in-memory computing paradigm and introduces weight mapping strategies to efficiently process sparse 3D convolutions and 2D convolutions. Implemented in 22 nm technology and evaluated on representative benchmarks, Voxel-CIM achieves on average 4.5 to 7.0$\times$ higher energy efficiency (10.8 TOPS/W), a 2.4 to 5.4$\times$ speedup on the detection task, and a 1.2 to 8.1$\times$ speedup on the segmentation task compared to state-of-the-art point cloud accelerators and powerful GPUs.
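To make the map search step concrete, the following is a minimal sketch of a generic hash-table map search for a sparse 3D convolution. It is not the depth-encoding-based output-major search proposed in the paper. The function name `map_search`, the coordinate format, and the assumption that output voxels sit at the same coordinates as input voxels (a submanifold-style convolution) are all illustrative choices, not details taken from the source.

```python
from itertools import product


def map_search(voxels, kernel_size=3):
    """Build (input index, output index) pairs for every kernel offset.

    Assumption for illustration: outputs live at the same coordinates as
    the inputs, so each occupied voxel is both an input and an output.
    """
    # Hash table from integer voxel coordinate to its row index.
    index_of = {coord: i for i, coord in enumerate(voxels)}
    r = kernel_size // 2
    maps = {}
    for offset in product(range(-r, r + 1), repeat=3):
        pairs = []
        for out_idx, (x, y, z) in enumerate(voxels):
            neighbor = (x + offset[0], y + offset[1], z + offset[2])
            in_idx = index_of.get(neighbor)
            if in_idx is not None:  # only occupied neighbors contribute
                pairs.append((in_idx, out_idx))
        maps[offset] = pairs
    return maps


# Hypothetical toy input: two adjacent voxels and one isolated voxel.
voxels = [(0, 0, 0), (1, 0, 0), (5, 5, 5)]
maps = map_search(voxels)
```

Under this assumption, the pairs for offset $-d$ are the pairs for offset $d$ with input and output swapped, so only half of the offsets need to be searched explicitly. Each weight offset then gathers its input rows, multiplies them by the weight slice for that offset, and scatters the results to the output rows.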

Paper Structure (8 sections, 2 equations, 11 figures, 2 tables, 1 algorithm)

This paper contains 8 sections, 2 equations, 11 figures, 2 tables, 1 algorithm.

Figures (11)

  • Figure 1: The generic network architecture of voxel-based algorithms for segmentation and detection tasks.
  • Figure 2: (a) The reverse mapping pairs can be inferred due to symmetry. (b) The visualization of input cloud data. (c) Voxelization with low resolution vs. voxelization with high resolution. (d) The comparison of normalized off-chip data access volume in various situations. (To simulate buffer limitations in extreme cases, we set the buffer size to match the length of the merger sorter, which is 64.)
  • Figure 3: The workflow of DOMS
  • Figure 4: Block-DOMS. Voxels of neighboring blocks in $y^{-}$ and $y^{+}$ direction (yellow regions) can be easily located by depth-encoding tables. Voxels of the neighboring block in the $x^{+}$ direction can be copied to $Block_{(i, j)}$.
  • Figure 5: (a) Traditional weight mapping method for convolution. (b) Sub-matrices mapping method for Spconv3D. (c) Sub-matrices mapping method for Conv2D (taking $K=3$ as an example).
  • ...and 6 more figures