A dynamic memory assignment strategy for dilation-based ICP algorithm on embedded GPUs

Qiong Chang, Weimin Wang, Junpei Zhong, Jun Miyazaki

TL;DR

This work tackles the memory bottleneck of VANICP on resource-limited embedded GPUs by introducing a GPU-oriented dynamic memory assignment strategy for voxel-based dilation in ICP. The method builds a voxel occupancy histogram on the GPU, computes per-voxel address offsets on the CPU, and uses indirect addressing across three memory spaces to shrink the memory footprint while maintaining VANICP's performance. Experiments on large point clouds on embedded platforms show memory reductions of over 97% and favorable energy efficiency, and the source code is published for reproducibility. The approach enables real-time, memory-efficient point-cloud registration on edge devices, broadening the practical deployment of dilation-based ICP frameworks.

Abstract

This paper proposes a memory-efficient optimization strategy for the high-performance point cloud registration algorithm VANICP, enabling lightweight execution on embedded GPUs with constrained hardware resources. VANICP is a recently published acceleration framework that significantly improves the computational efficiency of point-cloud-based applications. By transforming the global nearest neighbor search (NNS) into a localized process through a dilation-based information propagation mechanism, VANICP greatly reduces the computational complexity of the NNS. However, its original implementation demands a considerable amount of memory, which restricts its deployment in resource-constrained environments such as embedded systems. To address this issue, we propose a GPU-oriented dynamic memory assignment strategy that optimizes the memory usage of the dilation operation. Based on this strategy, we construct an enhanced version of the VANICP framework that reduces memory consumption by over 97% while preserving the original performance. Source code is available at: https://github.com/changqiong/VANICP4Em.git.
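As a rough illustration of how dilation can turn a global nearest neighbor search into a local one, the sketch below spreads each target point's index into its own voxel and the 26 surrounding voxels, so a query only has to inspect the candidate list of the single voxel it falls into. This is one plausible reading of the abstract, not VANICP's actual implementation: the function names, the one-voxel dilation radius, and the dictionary-based grid are all assumptions made for the example. Note that the local result matches the global nearest neighbor only when that neighbor lies within the dilated neighborhood.

```python
from collections import defaultdict
from itertools import product
import math


def voxel_of(point, size):
    """Integer voxel coordinates of a 3D point for a given voxel edge length."""
    return tuple(math.floor(c / size) for c in point)


def build_dilated_grid(target, size):
    """Map each voxel to the indices of target points in it or in its 26 neighbors.

    Hypothetical helper: every point is "dilated" by one voxel in each direction.
    """
    grid = defaultdict(list)
    for idx, point in enumerate(target):
        vx, vy, vz = voxel_of(point, size)
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            grid[(vx + dx, vy + dy, vz + dz)].append(idx)
    return grid


def local_nearest(query, target, grid, size):
    """Nearest target index among the candidates stored in the query's voxel.

    Returns None when no target point was propagated into that voxel.
    """
    candidates = grid.get(voxel_of(query, size), [])
    if not candidates:
        return None
    return min(candidates, key=lambda i: math.dist(query, target[i]))


target = [(0.1, 0.1, 0.1), (1.5, 0.2, 0.1), (5.0, 5.0, 5.0)]
grid = build_dilated_grid(target, size=1.0)
nearest = local_nearest((0.9, 0.1, 0.1), target, grid, size=1.0)
```

Because every point is copied into up to 27 voxel lists, the candidate storage grows much faster than the point count, which gives some intuition for why the memory layout of the dilation step matters on an embedded GPU.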

Paper Structure

This paper contains 10 sections, 1 equation, 4 figures, 2 tables, 4 algorithms.

Figures (4)

  • Figure 1: Comparison of memory consumption in dilation. For the TUM model (226K points), the original VANICP adopts a monolithic contiguous assignment strategy for dilation, which supports direct addressing. In contrast, the proposed method employs a segmented pointer-based assignment strategy that relies on indirect addressing.
  • Figure 2: Processing flow of the proposed framework. VOH: voxelization occupancy histogram. MOC: memory offset computation. VDMA: voxelization with dynamic memory assignment. DDMA: dilation with dynamic memory assignment. NNS: nearest neighbor searching. This strategy first performs voxel-wise histogram construction on the GPU to parallelize point counting, and then computes the memory offset of each voxel on the CPU via unified memory. By integrating offset-based indexing into subsequent processing stages, it significantly reduces the overall memory utilization.
  • Figure 3: Memory assignment of the proposed strategy.
  • Figure 4: Registration results of the dilation-based ICP applied to the source point cloud (red) and the target point cloud (green).
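
The first three stages named in the Figure 2 caption (histogram, offset computation, offset-based voxelization) can be sketched on the CPU in plain Python. This is a minimal sketch under stated assumptions, not the authors' code: the function names are hypothetical, voxels are identified by a flat integer ID, and the offsets are computed with an exclusive prefix sum, which is a common way to pack variable-length bins into one contiguous buffer. On a GPU the counting loop would typically be parallelized, for example with atomic increments.

```python
from itertools import accumulate


def voxel_histogram(voxel_ids, num_voxels):
    """VOH stage (sketch): count how many points fall into each voxel."""
    counts = [0] * num_voxels
    for v in voxel_ids:
        counts[v] += 1
    return counts


def exclusive_offsets(counts):
    """MOC stage (sketch): start offset of each voxel's segment.

    Returns num_voxels + 1 entries; the last one is the total point count.
    """
    return [0] + list(accumulate(counts))


def scatter_points(voxel_ids, offsets):
    """VDMA stage (sketch): write point indices into one packed buffer."""
    cursor = list(offsets[:-1])          # next free slot per voxel
    packed = [None] * offsets[-1]        # exactly one slot per point
    for point_idx, v in enumerate(voxel_ids):
        packed[cursor[v]] = point_idx
        cursor[v] += 1
    return packed


def points_in_voxel(v, offsets, packed):
    """Indirect addressing: look up a voxel's points through its offsets."""
    return packed[offsets[v]:offsets[v + 1]]


voxel_ids = [2, 0, 2, 5, 2, 0]           # flat voxel ID of each input point
counts = voxel_histogram(voxel_ids, num_voxels=6)
offsets = exclusive_offsets(counts)
packed = scatter_points(voxel_ids, offsets)
```

In this layout the packed buffer holds exactly one slot per point plus one offset per voxel, so empty voxels cost only an offset entry. The price is an extra lookup through `offsets` on every access, which is the trade-off between direct and indirect addressing that the Figure 1 caption describes.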