FPGA-Accelerated Correspondence-free Point Cloud Registration with PointNet Features

Keisuke Sugiura, Hiroki Matsutani

TL;DR

This work targets fast, accurate 3D point cloud registration on resource-constrained edge devices using FPGA-accelerated, correspondence-free methods. It combines a parallel, pipelined PointNet feature extractor with two dedicated IP cores, PointLKCore for PointNetLK and ReAgentCore for ReAgent, and uses LLT-based quantization to keep all network parameters on-chip. On the Xilinx ZCU104 board, the cores run 44.08-45.75x faster than an ARM Cortex-A53 CPU and 1.98-11.13x faster than Intel Xeon CPU and Nvidia Jetson boards, consume less than 1W, and are 163.11-213.58x more energy-efficient than an Nvidia GeForce GPU, while maintaining competitive accuracy under noise and large initial misalignments. The approach enables real-time registration (under 15ms) on low-power hardware and generalizes to unseen categories and real-world scans; design-space exploration guides the choice of FPGA implementation.

Abstract

Point cloud registration serves as a basis for vision and robotic applications including 3D reconstruction and mapping. Despite significant improvements in the quality of results, recent deep learning approaches are computationally expensive and power-hungry, making them difficult to deploy on resource-constrained edge devices. To tackle this problem, we propose a fast, accurate, and robust registration method for low-cost embedded FPGAs. Based on a parallel and pipelined PointNet feature extractor, we develop custom accelerator cores, namely PointLKCore and ReAgentCore, for two different learning-based methods. Both are correspondence-free and computationally efficient, as they avoid the costly feature matching step involving nearest-neighbor search. The proposed cores are implemented on the Xilinx ZCU104 board and evaluated using both synthetic and real-world datasets, showing substantial improvements in the trade-off between runtime and registration quality. They run 44.08-45.75x faster than an ARM Cortex-A53 CPU and offer 1.98-11.13x speedups over Intel Xeon CPU and Nvidia Jetson boards, while consuming less than 1W and achieving 163.11-213.58x higher energy efficiency than an Nvidia GeForce GPU. The proposed cores are more robust to noise and large initial misalignments than classical methods and find reasonable solutions in less than 15ms, demonstrating real-time performance.

Paper Structure (44 sections, 30 equations, 22 figures, 5 tables, 2 algorithms)

Figures (22)

  • Figure 1: Registration results for ModelNet40 (Unseen) and ScanObjectNN (rightmost three columns) (gray: source, green: transformed source, orange: template).
  • Figure 2: Step-by-step visualization of the registration results (with the rotational ISO (isotropic) errors) (gray: source, green: transformed source, orange: template).
  • Figure 3: Overview of the PointNet feature extractor module. $N$ points are processed in tiles of $B$ points to reduce the on-chip memory cost (for intermediate point features) from $O(N)$ to $O(B)$.
  • Figure 4: Block diagram of the point cloud feature extractor.
  • Figure 5: Block diagram of PointLKCore.
  • ...and 17 more figures
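
The tiling idea in the Figure 3 caption can be illustrated in software. The NumPy sketch below is not the authors' FPGA design: the layer widths (3, 64, 128, 1024), the random weights, and the helper names `pointwise_mlp`, `global_feature_full`, and `global_feature_tiled` are all assumptions made for illustration. It shows only why processing $N$ points in tiles of $B$ points still yields the same max-pooled global feature while buffering intermediate features for just $B$ points at a time.

```python
import numpy as np


def pointwise_mlp(points, weights, biases):
    """Shared per-point MLP with ReLU (hypothetical stand-in for PointNet layers)."""
    x = points
    for w, b in zip(weights, biases):
        x = np.maximum(x @ w + b, 0.0)
    return x


def global_feature_full(points, weights, biases):
    """Baseline: compute features for all N points, then max-pool (O(N) buffer)."""
    return pointwise_mlp(points, weights, biases).max(axis=0)


def global_feature_tiled(points, weights, biases, tile_size):
    """Tiled variant: only tile_size point features are alive at once (O(B) buffer)."""
    out_dim = weights[-1].shape[1]
    running_max = np.full(out_dim, -np.inf)
    for start in range(0, points.shape[0], tile_size):
        tile = points[start:start + tile_size]          # at most B points
        feats = pointwise_mlp(tile, weights, biases)    # shape (<=B, out_dim)
        # Max-pooling is associative, so a running maximum over tiles
        # equals the maximum over all N points.
        running_max = np.maximum(running_max, feats.max(axis=0))
    return running_max


# Illustrative setup: layer widths and random weights are assumptions.
rng = np.random.default_rng(0)
dims = [3, 64, 128, 1024]
weights = [rng.standard_normal((i, o)) * 0.1 for i, o in zip(dims[:-1], dims[1:])]
biases = [np.zeros(o) for o in dims[1:]]
points = rng.standard_normal((1000, 3))  # N = 1000 points

full = global_feature_full(points, weights, biases)
tiled = global_feature_tiled(points, weights, biases, tile_size=64)  # B = 64
print(np.allclose(full, tiled))
```

The two functions agree because the per-point MLP treats each point independently and the pooling step is a maximum, so the order and grouping of points do not affect the result. The last tile may hold fewer than $B$ points when $N$ is not a multiple of $B$, which the slice handles without padding.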