3D Point Cloud Object Detection on Edge Devices for Split Computing
Taisuke Noguchi, Takuya Azumi
TL;DR
The paper tackles the difficulty of running accurate 3D object detection on compute-limited edge devices by applying Split Computing, which partitions the DNN between an edge device and an edge server. Using OpenPCDet with Voxel R-CNN, it evaluates pre-defined split points (e.g., after the VFE or after early Backbone 3D layers) and transmits intermediate data instead of raw LiDAR point clouds, which reduces edge load and total latency while mitigating privacy risks. Experiments on a Jetson Orin Nano with KITTI show that splitting after voxelization reduces inference time by 70.8% and edge-device execution time by 90.0%, while splitting inside the network reduces them by up to 57.1% and 69.5%, respectively. The paper also examines the data-transfer-size trade-off, gives a privacy-based rationale for deeper splits, and offers practical guidelines for split-point selection aimed at infrastructure-assisted autonomous driving. It identifies compression of the intermediate data and multi-LiDAR integration as future work.
Abstract
Autonomous driving technology is advancing rapidly, with deep learning as a key component. In sensing in particular, 3D point cloud data collected by LiDAR is fed to deep neural network models for 3D object detection. However, these state-of-the-art models are complex, leading to longer processing times and increased power consumption on edge devices. The objective of this study is to address these issues by leveraging Split Computing, a distributed machine learning inference method. Split Computing lessens the computational burden on edge devices, thereby reducing processing time and power consumption. Furthermore, it reduces the risk of data breaches because only intermediate data from the deep neural network model is transmitted. Experimental results show that splitting after voxelization reduces the inference time by 70.8% and the edge device execution time by 90.0%. When splitting within the network, the inference time is reduced by up to 57.1%, and the edge device execution time is reduced by up to 69.5%.
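
As a rough illustration of the Split Computing idea described above, the following minimal Python sketch partitions a staged pipeline at a chosen split point, runs the head on the "edge device", serializes the intermediate data, and runs the tail on the "edge server". It is not the authors' implementation and does not use the OpenPCDet API: the stage functions `voxelize`, `encode`, and `detect` are hypothetical toy stand-ins, and `pickle` merely stands in for the network transfer.

```python
import pickle
from typing import Any, Callable, List, Sequence, Tuple

Stage = Callable[[Any], Any]


def run_stages(stages: Sequence[Stage], data: Any) -> Any:
    """Apply each stage in order to the data."""
    for stage in stages:
        data = stage(data)
    return data


def split_pipeline(stages: Sequence[Stage], split_index: int) -> Tuple[List[Stage], List[Stage]]:
    """Split the pipeline into a head (edge device) and a tail (edge server)."""
    if not 0 <= split_index <= len(stages):
        raise ValueError("split_index must be between 0 and len(stages)")
    return list(stages[:split_index]), list(stages[split_index:])


def run_split(stages: Sequence[Stage], split_index: int, raw_input: Any) -> Tuple[Any, int]:
    """Run the head, 'transmit' the intermediate data, then run the tail.

    Returns the final result and the size in bytes of the serialized payload.
    """
    head, tail = split_pipeline(stages, split_index)
    intermediate = run_stages(head, raw_input)   # executed on the edge device
    payload = pickle.dumps(intermediate)         # stand-in for the network transfer
    received = pickle.loads(payload)             # received on the edge server
    return run_stages(tail, received), len(payload)


# Hypothetical toy stages (assumptions for illustration only).
def voxelize(points):
    """Group points into unit-sized voxels keyed by integer grid coordinates."""
    voxels = {}
    for x, y, z in points:
        key = (int(x // 1), int(y // 1), int(z // 1))
        voxels.setdefault(key, []).append((x, y, z))
    return voxels


def encode(voxels):
    """Summarize each voxel by the mean of its points."""
    return {k: tuple(sum(c) / len(pts) for c in zip(*pts)) for k, pts in voxels.items()}


def detect(features):
    """Toy 'detector': report the occupied voxel coordinates in sorted order."""
    return sorted(features)


points = [(0.2, 0.3, 0.1), (0.8, 0.1, 0.4), (2.5, 1.5, 0.0)]
stages = [voxelize, encode, detect]
for split_index in range(len(stages) + 1):
    result, payload_bytes = run_split(stages, split_index, points)
    print(split_index, result, payload_bytes)
```

Every split point yields the same detection result. What changes is how much work runs on the edge device and how many bytes are transmitted, which is the trade-off the paper measures when selecting a split point.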
