
PointCNN++: Performant Convolution on Native Points

Lihan Li, Haofeng Zhong, Rui Bu, Mingchao Sun, Wenzheng Chen, Baoquan Chen, Yangyan Li

TL;DR

PointCNN++ introduces a native-point convolution that centers receptive fields on original high-precision points and uses a local voxelization only for kernel mapping, thereby preserving geometric fidelity while achieving high efficiency. The method reframes convolution as a Matrix-Vector Multiplication and Reduction (MVMR) problem and implements highly optimized GPU kernels (MVMR and VVOR) with zero intermediate memory, enabling substantial memory savings and faster training than voxel-based approaches. Empirical results show sub-voxel registration gains and strong memory/latency advantages across multiple GPUs, with KITTI and 3DMatch benchmarks demonstrating state-of-the-art or competitive performance when used as a backbone. The work also provides extensive supplementary material, including Triton implementations and cross-GPU scalability analyses, and promises open-source release to accelerate adoption.
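The phrase "local voxelization only for kernel mapping" can be read as follows. The receptive field stays centered on the exact input coordinate, and only each neighbor's *offset* from that center is quantized to choose a kernel cell. The sketch below is one plausible reading of that idea under stated assumptions, not the authors' implementation. The names `build_triplets`, `voxel_size` and `kernel_size`, the cubic receptive field, the brute-force neighbor search, and the `(out_idx, in_idx, k)` triplet layout are all hypothetical choices made for illustration.

```python
import numpy as np

def build_triplets(centers, points, voxel_size, kernel_size=3):
    """Hypothetical kernel mapping: for every (center, neighbor) pair inside a
    kernel_size^3 cube centered on the ORIGINAL center coordinate, voxelize the
    neighbor's relative offset to pick a kernel index k.

    Returns an int64 array of shape (T, 3) with rows (out_idx, in_idx, k).
    Uses an O(N*M) brute-force neighbor search, for illustration only.
    """
    half = kernel_size / 2.0
    triplets = []
    for i, c in enumerate(centers):
        # Offsets in voxel units, measured from the exact center (no snapping).
        offsets = (points - c) / voxel_size          # shape (M, 3)
        # Keep neighbors strictly inside the cube of side kernel_size voxels.
        inside = np.all(np.abs(offsets) < half, axis=1)
        for j in np.nonzero(inside)[0]:
            # Local voxelization: shift each axis to [0, kernel_size), then floor.
            cell = np.floor(offsets[j] + half).astype(int)
            # Flatten the 3D cell (x, y, z) into a single kernel index.
            k = (cell[0] * kernel_size + cell[1]) * kernel_size + cell[2]
            triplets.append((i, j, k))
    return np.array(triplets, dtype=np.int64).reshape(-1, 3)
```

With `voxel_size=1.0` and `kernel_size=3`, a neighbor offset by (0.6, 0, 0) from its center falls in cell (2, 1, 1), i.e. k = 22. Every point paired with itself maps to the middle cell, k = 13. A point 1.6 voxels away along one axis lies outside the 3×3×3 receptive field and produces no triplet.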

Abstract

Existing convolutional learning methods for 3D point cloud data are divided into two paradigms: point-based methods that preserve geometric precision but often face performance challenges, and voxel-based methods that achieve high efficiency through quantization at the cost of geometric fidelity. This loss of precision is a critical bottleneck for tasks such as point cloud registration. We propose PointCNN++, a novel architectural design that fundamentally mitigates this precision-performance trade-off. It generalizes sparse convolution from voxels to points, treating voxel-based convolution as a specialized, degraded case of our more general point-based convolution. First, we introduce a point-centric convolution in which the receptive field is centered on the original, high-precision point coordinates. Second, to make this high-fidelity operation performant, we design a computational strategy that operates natively on points. We formulate the convolution on native points as a Matrix-Vector Multiplication and Reduction (MVMR) problem, for which we develop a dedicated, highly optimized GPU kernel. Experiments demonstrate that PointCNN++ uses an order of magnitude less memory and is several times faster than representative point-based methods. Furthermore, when used as a simple replacement for the voxel-based backbones it generalizes, it significantly improves point cloud registration accuracy while being both more memory-efficient and faster. PointCNN++ shows that preserving geometric detail and achieving high performance are not mutually exclusive, paving the way for a new class of 3D learning with high fidelity and efficiency. Our code will be open sourced.
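A minimal reference for what an MVMR formulation computes is sketched below. It assumes the convolution is expressed over a set of triplets $\mathcal{T}$ with rows $(i, j, k)$, meaning "output point $i$ receives input point $j$ through kernel weight $\mathbf{W}_k$". Under that assumption the forward pass is $\mathbf{y}_i = \sum_{(i,j,k)\in\mathcal{T}} \mathbf{W}_k \mathbf{x}_j$: one matrix-vector product per triplet, followed by a reduction into $\mathbf{y}_i$. The TL;DR also mentions a VVOR kernel. Here it is assumed to be the matching weight-gradient step, a vector-vector outer product and reduction, $\partial L/\partial \mathbf{W}_k = \sum_{(i,j,k)\in\mathcal{T}} \mathbf{g}_i \mathbf{x}_j^{\top}$ with $\mathbf{g}_i = \partial L/\partial \mathbf{y}_i$. That expansion of the acronym is an assumption. The function names `mvmr` and `vvor` are hypothetical, and these loops say nothing about how the paper's GPU kernels are organized.

```python
import numpy as np

def mvmr(W, x, triplets, n_out):
    """Reference MVMR (assumed semantics): y[i] = sum over (i, j, k) of W[k] @ x[j].

    W: (K, C_out, C_in) kernel weights; x: (N_in, C_in) input features;
    triplets: (T, 3) int rows (out_idx, in_idx, k).
    """
    y = np.zeros((n_out, W.shape[1]), dtype=np.result_type(W, x))
    for i, j, k in triplets:
        y[i] += W[k] @ x[j]            # matrix-vector product, then reduce into y[i]
    return y

def vvor(g, x, triplets, K):
    """Assumed VVOR: dW[k] = sum over (i, j, k) of outer(g[i], x[j]),
    i.e. the gradient of sum(y * g) with respect to W."""
    dW = np.zeros((K, g.shape[1], x.shape[1]), dtype=np.result_type(g, x))
    for i, j, k in triplets:
        dW[k] += np.outer(g[i], x[j])  # vector-vector outer product, then reduce
    return dW
```

Because the forward pass is linear in `W`, `np.sum(mvmr(W, x, T, n) * g)` equals `np.sum(W * vvor(g, x, T, K))`, which gives a quick numerical check that the reduction is the correct weight gradient. The input gradient is again an MVMR, using $\mathbf{W}_k^{\top}$ with the roles of $i$ and $j$ swapped.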


Paper Structure

This paper contains 32 sections, 5 equations, 11 figures, 3 tables, 2 algorithms.

Figures (11)

  • Figure 1: A 2D illustration of convolutional learning for a point cloud (I) with voxel-based methods (II), transform-then-convolve methods (III), and convolution on native points (IV). When a voxel center happens to coincide with an original point (the rare case, depicted by $q$), the difference between (II) and (IV) is minimal. In the general case, however, forcibly restricting computation to voxel grids in (II) causes several problems: 1. misalignment between original points (e.g., point $p$) and convolution centers; 2. inaccurate neighborhood inclusion (e.g., $x$, rather than $z$, should be in the neighborhood of $p$); and 3. inaccurate convolution kernel usage (e.g., the feature associated with point $x$ is more appropriately convolved with the upper-left kernel, as shown in (IV), than with the upper-middle kernel, as shown in (II)). (IV) preserves geometric precision as (III) does, while avoiding the cumbersome irregular-to-regular "transformation".
  • Figure 2: A brute-force MVM computation inefficiently reads $\mathbf{W}_k \in \mathbb{R}^{C_{\text{out}} \times C_{\text{in}}}$ from global memory $\left| \mathcal{T} \right|$ times, once for every triplet (left). Sorting the triplets by $k$ removes this redundancy: ideally, each unique $\mathbf{W}_k$ is loaded into on-chip memory just once and reused for all of its associated computations (right).
  • Figure 3: Memory usage comparison of one convolution layer.
  • Figure 4: Performance comparison of one convolution layer.
  • Figure 5: Point-wise registration error visualization on KITTI dataset comparing our method with state-of-the-art baselines.
  • ...and 6 more figures
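The access pattern contrasted in Figure 2 can be mimicked on the CPU. It is assumed here that each triplet is a row `(out_idx, in_idx, k)`. The brute-force version touches `W[k]` once per triplet. The sorted version groups triplets by `k`, so each `W[k]` is fetched once and reused for the whole group. This is only an illustration of the reuse idea under those assumptions; the function names are hypothetical and nothing here models GPU global or on-chip memory.

```python
import numpy as np

def mvmr_bruteforce(W, x, triplets, n_out):
    """Figure 2 (left), assumed semantics: W[k] is read once per triplet."""
    y = np.zeros((n_out, W.shape[1]), dtype=np.result_type(W, x))
    for i, j, k in triplets:
        y[i] += W[k] @ x[j]
    return y

def mvmr_sorted(W, x, triplets, n_out):
    """Figure 2 (right): sort triplets by k so each unique W[k] is read once
    and reused for every triplet that shares it."""
    y = np.zeros((n_out, W.shape[1]), dtype=np.result_type(W, x))
    t = triplets[np.argsort(triplets[:, 2], kind="stable")]
    # Start of each run of equal k in the sorted triplets, and its end.
    ks, starts = np.unique(t[:, 2], return_index=True)
    ends = np.append(starts[1:], len(t))
    for k, s, e in zip(ks, starts, ends):
        Wk = W[k]                                  # fetched once per unique k
        out_idx, in_idx = t[s:e, 0], t[s:e, 1]
        # Batched products for this k, then an unbuffered scatter-add so that
        # repeated output indices accumulate correctly.
        np.add.at(y, out_idx, x[in_idx] @ Wk.T)
    return y
```

Both functions return the same result. Note that the NumPy gather `x[in_idx]` materializes a temporary buffer for each group. A fused GPU kernel, as the paper describes for its MVMR implementation, would avoid such intermediate storage.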