PointSplit: Towards On-device 3D Object Detection with Heterogeneous Low-power Accelerators
Keondo Park, You Rim Choi, Inhoe Lee, Hyung-Sin Kim
TL;DR
PointSplit tackles on-device 3D object detection on edge devices with heterogeneous accelerators by jointly optimizing system and algorithm design. It introduces two parallel set abstraction pipelines guided by 2D semantic information, a semantics-aware biased sampling mechanism, and role-based group-wise quantization, which together enable efficient INT8 execution across a GPU and an NPU. Empirical results on SUN RGB-D and ScanNet V2 show dramatic latency reductions (up to 24.7× faster) with similar accuracy compared to a GPU-only baseline, validating the viability of multi-type accelerators for real-time 3D perception. The work also provides an open TensorFlow/TensorFlow Lite implementation and demonstrates the broader potential of heterogeneous edge platforms for complex vision tasks.
Abstract
Running deep learning models on resource-constrained edge devices has drawn significant attention due to its fast response, privacy preservation, and robust operation regardless of Internet connectivity. While these devices already cope with various intelligent tasks, the latest edge devices equipped with multiple types of low-power accelerators (i.e., both a mobile GPU and an NPU) can bring another opportunity: a task that used to be too heavy for an edge device in the single-accelerator world might become viable in the upcoming heterogeneous-accelerator world. To realize this potential in the context of 3D object detection, we identify several technical challenges and propose PointSplit, a novel 3D object detection framework for multi-accelerator edge devices that addresses them. Specifically, PointSplit includes (1) 2D semantics-aware biased point sampling, (2) parallelized 3D feature extraction, and (3) role-based group-wise quantization. We implement PointSplit on TensorFlow Lite and evaluate it on a custom hardware platform comprising both a mobile GPU and an EdgeTPU. Experimental results on representative RGB-D datasets, SUN RGB-D and ScanNet V2, demonstrate that PointSplit on a multi-accelerator device is 24.7 times faster, with similar accuracy, than the full-precision 2D-3D fusion-based 3D detector on a GPU-only device.
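The abstract lists role-based group-wise quantization but does not define the groups. The sketch below assumes, purely for illustration, that "role" separates tensors with very different value ranges (here hypothetical groups `"xyz"` for point coordinates and `"features"` for learned features), and that each group gets its own asymmetric INT8 scale and zero point following the TensorFlow Lite convention `real = (q - zero_point) * scale`. All function and group names are hypothetical; this is not the paper's scheme.

```python
from typing import Dict, List, Tuple

QMIN, QMAX = -128, 127  # signed INT8 range


def affine_params(values: List[float]) -> Tuple[float, int]:
    """Scale and zero point mapping [min, max] (widened to include 0) onto INT8."""
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    if hi == lo:  # all zeros: any positive scale represents them exactly
        return 1.0, 0
    scale = (hi - lo) / (QMAX - QMIN)
    zero_point = round(QMIN - lo / scale)
    return scale, max(QMIN, min(QMAX, zero_point))


def quantize(values: List[float], scale: float, zp: int) -> List[int]:
    # Round to the nearest step, shift by the zero point, clamp to INT8.
    return [max(QMIN, min(QMAX, round(v / scale) + zp)) for v in values]


def dequantize(q: List[int], scale: float, zp: int) -> List[float]:
    return [(x - zp) * scale for x in q]


def groupwise_quantize(
    groups: Dict[str, List[float]],
) -> Dict[str, Tuple[List[int], float, int]]:
    """Quantize each role group with its own (scale, zero_point) pair."""
    result = {}
    for role, values in groups.items():
        scale, zp = affine_params(values)
        result[role] = (quantize(values, scale, zp), scale, zp)
    return result
```

For instance, with coordinates spanning 0 to 2 and features spanning 0 to 250, a single shared scale of 250/255 would round every coordinate to within roughly 0.5 of its true value at best, while a per-group scale of 2/255 keeps coordinate error below about 0.004. Separating groups this way is what makes full-integer execution plausible on an INT8-only accelerator such as an NPU.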
