HgPCN: A Heterogeneous Architecture for E2E Embedded Point Cloud Inference

Yiming Gao; Chao Jiang; Wesley Piard; Xiangru Chen; Bhavesh Patel; Herman Lam

TL;DR

The paper tackles real-time end-to-end point cloud processing on edge devices, where latency is dominated by memory-intensive down-sampling during pre-processing and by the data structuring step that prepares inputs for inference. It introduces HgPCN, a heterogeneous CPU-FPGA architecture that combines Octree-Indexed-Sampling (OIS) for memory-efficient pre-processing with a Data Structuring Unit (DSU) based on Voxel-Expanded Gathering (VEG) to accelerate input preparation for a DLA-based inference engine. Key contributions include a CPU-side Octree build with memory pre-configuration, FPGA-side OIS down-sampling, and a VEG-enabled DSU that substantially reduces the data structuring workload, enabling end-to-end processing at real-time KITTI-like rates with notable memory savings. The reported results show large speedups over CPU/GPU baselines and existing PCN accelerators, supporting the practicality of end-to-end PCN on the edge and suggesting directions for approximate variants and broader applicability to other accelerators.

Abstract

Point clouds are an important type of geometric data structure for many embedded applications such as autonomous driving and augmented reality. Current Point Cloud Networks (PCNs) have achieved great success in using inference to perform point cloud analysis, including object part segmentation, shape classification, and so on. However, point cloud applications on the computing edge require more than just the inference step. They require end-to-end (E2E) processing of the point cloud workloads: pre-processing of raw data, input preparation, and inference to perform point cloud analysis. Current PCN approaches to end-to-end processing of point cloud workloads cannot meet the real-time latency requirement on the edge, i.e., the ability of the AI service to keep up with the speed of raw data generation by 3D sensors. Latency in end-to-end processing of point cloud workloads stems from two sources: memory-intensive down-sampling in the pre-processing phase and the data structuring step for input preparation in the inference phase. In this paper, we present HgPCN, an end-to-end heterogeneous architecture for real-time embedded point cloud applications. In HgPCN, we introduce two novel methodologies based on spatial indexing to address the two identified bottlenecks. In the Pre-processing Engine of HgPCN, an Octree-Indexed-Sampling method is used to optimize the memory-intensive down-sampling bottleneck of the pre-processing phase. In the Inference Engine, HgPCN extends a commercial DLA with a customized Data Structuring Unit, which is based on a Voxel-Expanded Gathering method, to fundamentally reduce the workload of the data structuring step in the inference phase.
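To make the data structuring bottleneck more concrete, the sketch below shows a generic voxel-grid neighbor search in Python. It is not the paper's Voxel-Expanded Gathering method, whose details are not given in this abstract; it only illustrates the general idea that a spatial index lets a neighbor query inspect a few nearby cells instead of every point. The function names, `voxel_size`, and `radius` are all assumptions made for this illustration.

```python
from collections import defaultdict
from itertools import product
import math


def build_voxel_grid(points, voxel_size):
    """Map each integer voxel coordinate to the indices of the points inside it."""
    grid = defaultdict(list)
    for idx, (x, y, z) in enumerate(points):
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        grid[key].append(idx)
    return grid


def gather_neighbors(points, grid, voxel_size, center, radius):
    """Return indices of points within `radius` of `center`.

    Only voxels that can overlap the query sphere are visited
    (hypothetical helper, not the paper's DSU logic).
    """
    reach = math.ceil(radius / voxel_size)  # how many voxels to expand per axis
    cx, cy, cz = (math.floor(c / voxel_size) for c in center)
    found = []
    for dx, dy, dz in product(range(-reach, reach + 1), repeat=3):
        # .get() avoids creating empty entries in the defaultdict
        for idx in grid.get((cx + dx, cy + dy, cz + dz), ()):
            if math.dist(points[idx], center) <= radius:
                found.append(idx)
    return sorted(found)


def brute_force_neighbors(points, center, radius):
    """Reference implementation that scans every point."""
    return [i for i, p in enumerate(points) if math.dist(p, center) <= radius]
```

Both functions return the same neighbor set; the grid version simply touches fewer candidates when the points are spread over many voxels, which is the kind of workload reduction a spatial index is meant to provide.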

Paper Structure (20 sections, 16 figures, 1 table)

Introduction
Background and Related Work
Point Cloud Data and PCNs
Current PCN Accelerators
Motivation
Analysis of Frontend Pre-processing
Analysis of Backend PCN Inference
HgPCN Architecture
Pre-processing Engine
Octree-build Unit in the CPU
Down-sampling Unit in the FPGA
Inference Engine
Evaluation
Evaluation Setup
Analysis of the OIS method on CPU
...and 5 more sections

Figures (16)

Figure 1: (a) Two phases of an end-to-end point cloud AI service (classification task), (b) Overall architecture to process the two phases.
Figure 2: Illustration of the steps of an end-to-end PCN inference (toy-valued pedestrian classification task).
Figure 3: End-to-end execution time breakdown (actual time not shown).
Figure 4: Architecture overview of HgPCN.
Figure 5: Octree-Indexed-Sampling method overview: (a) A point cloud character “A” (black and color points). Note that, for simplicity, this is a 2D Quadtree illustration of our Octree-Indexed-Sampling (OIS) method. An Octree contains two horizontal levels of Quadtrees, having an extra Z dimension. (b) Corresponding Quadtree representation. For simplicity, only node “11” is fully expanded. As shown, the content of a Quadtree is stored in a Quadtree-Table in the Down-sampling Unit, and the raw points corresponding to the Quadtree are pre-configured in the Host Memory. (c) An example of OIS steps to create the Sampled-Point-Table, which contains the corresponding Host Memory addresses of the K picked points, where K is a pre-defined number, e.g., 4096.
...and 11 more figures
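The Figure 5 caption describes a Quadtree-Table that indexes points laid out in Host Memory and a Sampled-Point-Table holding the addresses of the K picked points. The Python sketch below is a minimal 2D illustration of that arrangement, under stated assumptions: points lie in the unit square, "host memory" is a plain list sorted by quadtree code, and points are picked round-robin across leaves. The round-robin policy and all function names are hypothetical; the actual OIS selection steps are not described in this section.

```python
from collections import defaultdict


def quadtree_code(x, y, depth, extent=1.0):
    """Return the quadrant path (one digit 0-3 per level) of a point in [0, extent)^2."""
    digits = []
    x0, y0, size = 0.0, 0.0, extent
    for _ in range(depth):
        size /= 2
        qx = 1 if x >= x0 + size else 0  # right half?
        qy = 1 if y >= y0 + size else 0  # upper half?
        digits.append(str(2 * qy + qx))
        x0 += qx * size
        y0 += qy * size
    return "".join(digits)


def build_index(points, depth):
    """Sort points by quadtree code and record the addresses held by each leaf."""
    order = sorted(range(len(points)), key=lambda i: quadtree_code(*points[i], depth))
    memory = [points[i] for i in order]  # stand-in for pre-configured host memory
    table = defaultdict(list)            # leaf code -> addresses into `memory`
    for addr, p in enumerate(memory):
        table[quadtree_code(*p, depth)].append(addr)
    return memory, dict(table)


def sample_addresses(table, k):
    """Pick up to k addresses by visiting leaves round-robin (an assumed policy)."""
    queues = [list(addrs) for _, addrs in sorted(table.items())]
    picked = []
    while len(picked) < k and any(queues):
        for q in queues:
            if q and len(picked) < k:
                picked.append(q.pop(0))
    return picked
```

Because points that share a leaf sit at consecutive addresses, the sampler only manipulates the small table of addresses and never has to scan the raw points themselves. That is the property the caption attributes to keeping the tree in a table and the raw points pre-configured in memory.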
