HgPCN: A Heterogeneous Architecture for E2E Embedded Point Cloud Inference
Yiming Gao, Chao Jiang, Wesley Piard, Xiangru Chen, Bhavesh Patel, Herman Lam
TL;DR
The paper tackles the challenge of real-time end-to-end point-cloud processing on edge devices, where latency is dominated by memory-intensive down-sampling during pre-processing and by the data structuring step that prepares inputs for inference. It introduces HgPCN, a heterogeneous CPU-FPGA architecture that combines Octree-Indexed-Sampling (OIS) for memory-efficient pre-processing with a Data Structuring Unit (DSU) based on Voxel-Expanded Gathering (VEG) to accelerate input preparation for a DLA-based inference engine. Key contributions include a CPU-side Octree build and memory pre-configuration, an FPGA-side OIS down-sampling, and a VEG-enabled DSU that substantially reduces the data-structuring workload, enabling end-to-end processing at real-time KITTI-like rates with notable memory savings. The results show large speedups over CPU/GPU baselines and existing PCN accelerators, supporting the practicality of end-to-end PCN processing on the edge and suggesting directions for approximate variants and broader applicability to other accelerators.
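As a rough illustration of how an octree index can drive down-sampling, the following Python sketch assigns each point to an octree leaf through a Morton (Z-order) key and keeps one point per occupied leaf. It is not the OIS algorithm of HgPCN, whose details are not given in this summary. The function names (`morton_code`, `octree_leaf_keys`, `downsample_one_per_leaf`), the fixed `depth` parameter, and the "first point per leaf" selection rule are all assumptions made for the example.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def morton_code(ix: int, iy: int, iz: int, depth: int) -> int:
    """Interleave the low `depth` bits of three cell indices into one octree key."""
    code = 0
    for bit in range(depth):
        code |= ((ix >> bit) & 1) << (3 * bit)
        code |= ((iy >> bit) & 1) << (3 * bit + 1)
        code |= ((iz >> bit) & 1) << (3 * bit + 2)
    return code


def octree_leaf_keys(points: Sequence[Point], depth: int) -> List[int]:
    """Map every point to the key of the octree leaf (at `depth`) that contains it."""
    mins = [min(p[a] for p in points) for a in range(3)]
    maxs = [max(p[a] for p in points) for a in range(3)]
    # Use one cubic bounding box so leaves are cubes; guard against zero extent.
    extent = max(maxs[a] - mins[a] for a in range(3)) or 1.0
    cells = 1 << depth
    keys = []
    for p in points:
        idx = [min(cells - 1, int((p[a] - mins[a]) / extent * cells)) for a in range(3)]
        keys.append(morton_code(idx[0], idx[1], idx[2], depth))
    return keys


def downsample_one_per_leaf(points: Sequence[Point], depth: int) -> List[int]:
    """Hypothetical selection rule: keep the first point seen in each occupied leaf.

    Returns indices into `points`, ordered by leaf key.
    """
    first_in_leaf: Dict[int, int] = {}
    for i, key in enumerate(octree_leaf_keys(points, depth)):
        first_in_leaf.setdefault(key, i)
    return [first_in_leaf[k] for k in sorted(first_in_leaf)]


cloud = [(0.0, 0.0, 0.0), (0.1, 0.1, 0.1), (1.0, 1.0, 1.0), (0.9, 0.95, 1.0)]
kept = downsample_one_per_leaf(cloud, depth=1)  # one index per occupied octant
```

Because the selection only consults the leaf key of each point, a pass like this avoids the repeated all-pairs distance updates that make sampling memory-intensive; whether HgPCN selects points in this way is not stated here.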
Abstract
Point clouds are an important type of geometric data for many embedded applications such as autonomous driving and augmented reality. Current Point Cloud Networks (PCNs) have achieved great success in using inference to perform point cloud analysis, including object part segmentation and shape classification. However, point cloud applications on the computing edge require more than just the inference step. They require end-to-end (E2E) processing of point cloud workloads: pre-processing of raw data, input preparation, and inference to perform point cloud analysis. Current PCN approaches to end-to-end processing of point cloud workloads cannot meet the real-time latency requirement on the edge, i.e., the ability of the AI service to keep up with the speed of raw data generation by 3D sensors. Latency in end-to-end processing of point cloud workloads stems from two sources: memory-intensive down-sampling in the pre-processing phase and the data structuring step for input preparation in the inference phase. In this paper, we present HgPCN, an end-to-end heterogeneous architecture for real-time embedded point cloud applications. In HgPCN, we introduce two novel methodologies based on spatial indexing to address the two identified bottlenecks. In the Pre-processing Engine of HgPCN, an Octree-Indexed-Sampling method is used to optimize the memory-intensive down-sampling bottleneck of the pre-processing phase. In the Inference Engine, HgPCN extends a commercial DLA with a customized Data Structuring Unit, which is based on a Voxel-Expanded Gathering method, to fundamentally reduce the workload of the data structuring step in the inference phase.
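To make the data structuring step concrete, the following Python sketch gathers the neighbors of a query point by searching a voxel grid outward, shell by shell, instead of scanning the whole cloud. It illustrates the general idea of a voxel-based expanding search and is not the VEG method or the DSU design of HgPCN, which the abstract does not specify. The names `build_voxel_index`, `shell`, and `gather_neighbors`, and the parameters `voxel_size` and `max_radius`, are hypothetical. The result is approximate: a point in a shell that was not visited may be closer than one that was returned.

```python
from collections import defaultdict
from itertools import product
from math import dist, floor
from typing import Dict, Iterator, List, Sequence, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def build_voxel_index(points: Sequence[Point], voxel_size: float) -> Dict[Voxel, List[int]]:
    """Bucket point indices by the integer voxel coordinate that contains each point."""
    index: Dict[Voxel, List[int]] = defaultdict(list)
    for i, p in enumerate(points):
        index[tuple(floor(c / voxel_size) for c in p)].append(i)
    return index


def shell(center: Voxel, r: int) -> Iterator[Voxel]:
    """Yield voxel coordinates at Chebyshev distance exactly `r` from `center`."""
    cx, cy, cz = center
    for dx, dy, dz in product(range(-r, r + 1), repeat=3):
        if max(abs(dx), abs(dy), abs(dz)) == r:
            yield (cx + dx, cy + dy, cz + dz)


def gather_neighbors(points: Sequence[Point], index: Dict[Voxel, List[int]],
                     voxel_size: float, query: Point, k: int,
                     max_radius: int = 8) -> List[int]:
    """Expand the voxel search around `query` until at least `k` candidates are found.

    Returns up to `k` point indices, nearest first (approximate, see lead-in).
    """
    home = tuple(floor(c / voxel_size) for c in query)
    candidates: List[int] = []
    for r in range(max_radius + 1):
        for v in shell(home, r):
            candidates.extend(index.get(v, ()))  # .get does not create empty buckets
        if len(candidates) >= k:
            break
    candidates.sort(key=lambda i: dist(points[i], query))
    return candidates[:k]


pts = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (1.5, 0.1, 0.1), (5.0, 5.0, 5.0)]
grid = build_voxel_index(pts, voxel_size=1.0)
near = gather_neighbors(pts, grid, 1.0, query=(0.0, 0.0, 0.0), k=3)
```

The search touches only the voxels near the query, so its cost depends on local point density rather than on the size of the full cloud, which is the kind of workload reduction a spatial index is meant to provide.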
