FractalCloud: A Fractal-Inspired Architecture for Efficient Large-Scale Point Cloud Processing

Yuzhe Fu, Changchun Zhou, Hancheng Ye, Bowen Duan, Qiyu Huang, Chiyue Wei, Cong Guo, Hai "Helen" Li, Yiran Chen

TL;DR

FractalCloud addresses the inefficiency of processing large-scale point clouds with point-based neural networks by introducing a fractal-inspired partitioning (Fractal) and block-parallel point operations (BPPO) that localize computations and enable fully on-chip parallelism. The architecture comprises a Fractal Engine, RSPUs for sampling and neighbor searching, and a memory- and data-reuse oriented dataflow, all orchestrated by a RISC-V controller in a 28 nm implementation. Empirical results show substantial gains, averaging 21.7× speedup and 27× energy reduction over state-of-the-art accelerators, while preserving accuracy within 0.7% across multiple networks and tasks. The work demonstrates scalable, hardware-friendly acceleration for large-scale PNNs, with strong implications for autonomous driving, robotics, and AR/VR where 3D sensing yields large point clouds.

Abstract

Three-dimensional (3D) point clouds are increasingly used in applications such as autonomous driving, robotics, and virtual reality (VR). Point-based neural networks (PNNs) have demonstrated strong performance in point cloud analysis, originally targeting small-scale inputs. However, as PNNs evolve to process large-scale point clouds with hundreds of thousands of points, all-to-all computation and global memory access in point cloud processing introduce substantial overhead, causing $O(n^2)$ computational complexity and memory traffic, where $n$ is the number of points. Existing accelerators, primarily optimized for small-scale workloads, overlook this challenge and scale poorly due to inefficient partitioning and non-parallel architectures. To address these issues, we propose FractalCloud, a fractal-inspired hardware architecture for efficient large-scale 3D point cloud processing. FractalCloud introduces two key optimizations: (1) a co-designed Fractal method for shape-aware and hardware-friendly partitioning, and (2) block-parallel point operations that decompose and parallelize all point operations. A dedicated hardware design with on-chip fractal and flexible parallelism further enables fully parallel processing within limited memory resources. Implemented in 28 nm technology as a chip layout with a core area of 1.5 mm$^2$, FractalCloud achieves 21.7× speedup and 27× energy reduction over state-of-the-art accelerators while maintaining network accuracy, demonstrating its scalability and efficiency for PNN inference.
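To make the $O(n^2)$ scaling argument concrete, the sketch below counts pairwise distance evaluations for an all-to-all point operation and for the same operation restricted to blocks. It is an illustration only: the function names, the even split into blocks, and the block size of 256 are assumptions made for this example and are not taken from the paper. It also assumes that points interact only within their own block.

```python
def all_to_all_pairs(n):
    """Distance evaluations when every point is compared with every point."""
    return n * n


def even_blocks(n, block_size):
    """Split n points into blocks of at most block_size points (hypothetical even split)."""
    full, rem = divmod(n, block_size)
    return [block_size] * full + ([rem] if rem else [])


def block_local_pairs(block_sizes):
    """Distance evaluations when points are compared only within their own block."""
    return sum(b * b for b in block_sizes)


n = 200_000                      # "hundreds of thousands of points"
blocks = even_blocks(n, 256)     # 256 is an assumed block size
global_cost = all_to_all_pairs(n)
local_cost = block_local_pairs(blocks)
print(f"all-to-all: {global_cost}, block-local: {local_cost}, "
      f"ratio: {global_cost / local_cost:.1f}")
```

With equal-sized blocks the block-local count grows roughly as $n \cdot BS$ rather than $n^2$, which is why localizing computation matters more as the input grows.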

Paper Structure

This paper contains 36 sections, 18 figures, 2 tables, 2 algorithms.

Figures (18)

  • Figure 1: Memory access (MB) and inference latency (ms) of point cloud neural networks in (a) the original baseline structure and (b) the proposed FractalCloud.
  • Figure 2: The workflows for (a) farthest point sampling, (b) ball query, and (c) interpolation. (d) The backbone of PNNs.
  • Figure 3: Comparison of different partitioning strategies on latency and network accuracy for PointNeXt segmentation on the S3DIS dataset across (a) the original point cloud (PointAcc [lin2021pointacc]), (b) uniform partitioning (PNNPU [kim2021pnnpu]), (c) KD-tree-based partitioning (Crescent [feng2022crescent]), and (d) the proposed Fractal method.
  • Figure 4: Latency of PN++, PNXt, and PVr [qi2017pointnet++, qian2022pointnext, deng2023pointvector] inference on GPU for classification and segmentation workloads under varying numbers of input points, with notations defined in Table \ref{tab:evaluatedModels}.
  • Figure 5: Workflows for KD-tree and proposed Fractal. BS is the block size, indicating the max points per block.
  • ...and 13 more figures
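
For readers unfamiliar with the point operations named in the Figure 2 caption, the following is a minimal sketch of textbook farthest point sampling and ball query in plain Python. It is not the paper's hardware dataflow or its block-parallel variant. The function names, parameters, and sample points are hypothetical, and the sketch assumes k is at most the number of points.

```python
import math


def farthest_point_sampling(points, k, start=0):
    """Pick k point indices, each time taking the point farthest from those already picked."""
    n = len(points)
    min_dist = [math.inf] * n          # distance from each point to its nearest selected point
    selected = [start]
    for _ in range(k - 1):
        last = points[selected[-1]]
        for i, p in enumerate(points):
            d = math.dist(p, last)
            if d < min_dist[i]:
                min_dist[i] = d
        # The next sample is the point with the largest distance to the selected set.
        selected.append(max(range(n), key=min_dist.__getitem__))
    return selected


def ball_query(points, center_indices, radius, max_neighbors):
    """For each center, collect up to max_neighbors point indices within the given radius."""
    groups = []
    for c in center_indices:
        neighbors = [i for i, p in enumerate(points)
                     if math.dist(p, points[c]) <= radius]
        groups.append(neighbors[:max_neighbors])
    return groups


pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
centers = farthest_point_sampling(pts, k=3)
groups = ball_query(pts, centers, radius=1.5, max_neighbors=8)
print(centers, groups)
```

Both routines scan every point for every sample or center, which is the all-to-all access pattern the abstract identifies as the bottleneck for large inputs.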