FuseFPS: Accelerating Farthest Point Sampling with Fusing KD-tree Construction for Point Clouds

Meng Han, Liang Wang, Limin Xiao, Hao Zhang, Chenhao Zhang, Xilong Xie, Shuai Zheng, Jin Dong

TL;DR

FuseFPS is an architecture and algorithm co-design for bucket-based farthest point sampling. It fuses the KD-tree construction stage into the point sampling stage, further reducing memory accesses, and pairs this algorithm with an efficient accelerator for bucket-based point sampling.

Abstract

Point cloud analytics has become a critical workload for embedded and mobile platforms across various applications. Farthest point sampling (FPS) is a fundamental and widely used kernel in point cloud processing. However, the heavy external memory access makes FPS a performance bottleneck for real-time point cloud processing. Although bucket-based farthest point sampling can significantly reduce unnecessary memory accesses during the point sampling stage, the KD-tree construction stage becomes the predominant contributor to execution time. In this paper, we present FuseFPS, an architecture and algorithm co-design for bucket-based farthest point sampling. We first propose a hardware-friendly sampling-driven KD-tree construction algorithm. The algorithm fuses the KD-tree construction stage into the point sampling stage, further reducing memory accesses. Then, we design an efficient accelerator for bucket-based point sampling. The accelerator can offload the entire bucket-based FPS kernel at a low hardware cost. Finally, we evaluate our approach on various point cloud datasets. The detailed experiments show that compared to the state-of-the-art accelerator QuickFPS, FuseFPS achieves about 4.3$\times$ and about 6.1$\times$ improvements on speed and power efficiency, respectively.
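
For orientation, the baseline FPS kernel that the abstract refers to can be sketched as follows. This is a minimal illustration of plain (non-bucketed) farthest point sampling, not the FuseFPS algorithm or its accelerator; the function names, the tuple-based point representation, and the `start_index` parameter are assumptions made for this example. It shows why memory traffic is heavy: every sampling iteration rereads all points to update their distance to the sampled set.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance; the square root is not needed for ranking."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_sampling(
    points: Sequence[Point], num_samples: int, start_index: int = 0
) -> List[int]:
    """Return indices of sampled points (hypothetical helper, baseline FPS)."""
    if not 0 < num_samples <= len(points):
        raise ValueError("num_samples must be in 1..len(points)")

    # min_dist[i] holds the squared distance from point i to its nearest
    # already-sampled point.
    min_dist = [float("inf")] * len(points)
    sampled = [start_index]

    for _ in range(num_samples - 1):
        last = points[sampled[-1]]
        # Full pass over the point cloud on every iteration: this is the
        # repeated memory access that bucket-based FPS tries to avoid.
        for i, p in enumerate(points):
            d = squared_distance(p, last)
            if d < min_dist[i]:
                min_dist[i] = d
        # The next sample is the point farthest from the sampled set.
        sampled.append(max(range(len(points)), key=min_dist.__getitem__))

    return sampled


if __name__ == "__main__":
    cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
    print(farthest_point_sampling(cloud, 3))
```

Bucket-based variants partition the point cloud with a KD-tree so that most buckets can be skipped in each iteration; the abstract's point is that building that tree then becomes the dominant cost, which motivates fusing construction into sampling.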

Paper Structure (17 sections, 11 figures, 2 tables, 1 algorithm)

Figures (11)

  • Figure 1: (a) Illustration of FPS on a point cloud. (b) Performance characterization of FPS in point cloud neural networks. (c) Latency breakdown of BFPS for various point cloud sizes. The red line shows the share of total execution time spent on KD-tree construction. (d) Comparison between QuickFPS and FuseFPS when executing bucket-based farthest point sampling.
  • Figure 2: Illustration of the bucket-based farthest point sampling algorithm. The point cloud consists of 12 points. At the point sampling stage, we assume two points (P4, P10) have been sampled.
  • Figure 3: Illustration of the bucket data structure.
  • Figure 4: Comparison of separated KD-tree construction with point sampling and fused KD-tree construction with point sampling. The workload is splitting a point cloud (A) into four buckets (D, E, F, G) and sampling four points.
  • Figure 5: Overview of FuseFPS architecture.
  • ...and 6 more figures