3D Gaussian Point Encoders
Jim James, Ben Wilson, Simon Lucey, James Hays
TL;DR
This work addresses the problem of learning efficient, explicit per-point embeddings for 3D point clouds by replacing implicit PointNet-style encodings with a Gaussian-based representation. It introduces the 3D Gaussian Point Encoder (3DGPE), which consists of a Gaussian Basis Encoder and a Gaussian Basis Mixer that encode per-point features as mixtures of learnable Gaussians; these per-point features are then aggregated by max-pooling. To make training and deployment practical, the authors develop natural-gradient optimization and implicit-to-explicit knowledge distillation from PointNet (and Mamba3D), along with geometry-based filtering to speed up inference. On ModelNet40 and ScanObjectNN, 3DGPE matches the accuracy of PointNet while delivering substantially higher throughput and lower memory use (2.7 times faster with 46% less memory), and it scales effectively when embedded in Mamba3D, enabling high framerates on CPU-only devices and efficient edge deployment.
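To make the encoder structure concrete, here is a minimal NumPy sketch under our own assumptions, not the authors' implementation: each Gaussian Basis Encoder output is modeled as an unnormalized 3D Gaussian response exp(-½ (x − μ_k)ᵀ Σ_k⁻¹ (x − μ_k)), the Gaussian Basis Mixer is modeled as a single linear map, and the helper names `gaussian_basis` and `encode_cloud` are hypothetical. It only illustrates the explicit, max-pooled structure described above.

```python
import numpy as np

def gaussian_basis(points, means, inv_covs):
    """Responses of K unnormalized 3D Gaussians at N points -> (N, K)."""
    diff = points[:, None, :] - means[None, :, :]            # (N, K, 3)
    # Squared Mahalanobis distance (x - mu_k)^T Sigma_k^{-1} (x - mu_k)
    m2 = np.einsum("nki,kij,nkj->nk", diff, inv_covs, diff)
    return np.exp(-0.5 * m2)

def encode_cloud(points, means, inv_covs, mix_weights):
    """Explicit per-point embeddings followed by max-pooling."""
    basis = gaussian_basis(points, means, inv_covs)          # (N, K) encoder output (assumed form)
    per_point = basis @ mix_weights                          # (N, D) assumed linear mixer
    return per_point.max(axis=0)                             # (D,) global descriptor

rng = np.random.default_rng(0)
K, D, N = 16, 32, 1024
means = rng.uniform(-1.0, 1.0, size=(K, 3))
inv_covs = np.tile(np.eye(3)[None] / 0.1**2, (K, 1, 1))     # isotropic, sigma = 0.1
mix_weights = rng.normal(size=(K, D))
points = rng.uniform(-1.0, 1.0, size=(N, 3))
descriptor = encode_cloud(points, means, inv_covs, mix_weights)
print(descriptor.shape)  # (32,)
```

Because max-pooling is a symmetric function, the global descriptor does not depend on the order of the input points, which is the same property PointNet relies on. Unlike a PointNet MLP, every basis function here has an explicit location and extent in 3D, which is what makes geometric shortcuts possible.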
Abstract
In this work, we introduce the 3D Gaussian Point Encoder, an explicit per-point embedding built on mixtures of learned 3D Gaussians. This explicit geometric representation for 3D recognition tasks is a departure from widely used implicit representations such as PointNet. However, it is difficult to learn 3D Gaussian encoders in an end-to-end fashion with standard optimizers. We develop optimization techniques based on natural gradients and distillation from PointNets to find a Gaussian Basis that can reconstruct PointNet activations. The resulting 3D Gaussian Point Encoders are faster and more parameter efficient than traditional PointNets. As in the 3D reconstruction literature, where there has been considerable interest in moving from implicit (e.g., NeRF) to explicit (e.g., Gaussian Splatting) representations, we can exploit computational geometry heuristics to accelerate 3D Gaussian Point Encoders further. We extend filtering techniques from 3D Gaussian Splatting to construct encoders that run 2.7 times faster than a PointNet of comparable accuracy while using 46% less memory and 88% fewer FLOPs. Furthermore, we demonstrate the effectiveness of 3D Gaussian Point Encoders as a component of Mamba3D, where the resulting model runs 1.27 times faster and reduces memory and FLOPs by 42% and 54%, respectively. 3D Gaussian Point Encoders are lightweight enough to achieve high framerates on CPU-only devices.
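The kind of computational geometry heuristic mentioned above can be illustrated with a simple culling rule, assumed here for illustration only: each Gaussian is evaluated only for points inside the axis-aligned box enclosing its n-sigma ellipsoid. Any point outside that box has squared Mahalanobis distance greater than n², so the response that is skipped is below exp(−n²/2), about 0.011 for n = 3. The function names and the `n_sigma` parameter are hypothetical, and the paper's actual filtering scheme may differ.

```python
import numpy as np

def gaussian_basis_dense(points, means, covs):
    """Reference: evaluate all K Gaussians at all N points -> (N, K)."""
    inv = np.linalg.inv(covs)                               # (K, 3, 3)
    diff = points[:, None, :] - means[None, :, :]           # (N, K, 3)
    m2 = np.einsum("nki,kij,nkj->nk", diff, inv, diff)      # squared Mahalanobis
    return np.exp(-0.5 * m2)

def gaussian_basis_filtered(points, means, covs, n_sigma=3.0):
    """Evaluate each Gaussian only on points inside its n-sigma bounding box.

    Returns the (N, K) response matrix (zeros for culled pairs) and the
    number of point/Gaussian pairs actually evaluated.
    """
    n_pts, n_gauss = points.shape[0], means.shape[0]
    out = np.zeros((n_pts, n_gauss))
    inv = np.linalg.inv(covs)
    # The n-sigma ellipsoid of Gaussian k extends +/- n_sigma * sqrt(Sigma_k[i, i])
    # along axis i, so this is the tightest axis-aligned box that contains it.
    half = n_sigma * np.sqrt(np.diagonal(covs, axis1=1, axis2=2))  # (K, 3)
    evaluated = 0
    for k in range(n_gauss):
        inside = np.all(np.abs(points - means[k]) <= half[k], axis=1)
        d = points[inside] - means[k]
        m2 = np.einsum("ni,ij,nj->n", d, inv[k], d)
        out[inside, k] = np.exp(-0.5 * m2)
        evaluated += int(inside.sum())
    return out, evaluated

rng = np.random.default_rng(0)
pts = rng.uniform(-1.0, 1.0, size=(2048, 3))
means = rng.uniform(-1.0, 1.0, size=(32, 3))
A = 0.05 * rng.normal(size=(32, 3, 3))
covs = A @ A.transpose(0, 2, 1) + 1e-3 * np.eye(3)        # random SPD covariances
dense = gaussian_basis_dense(pts, means, covs)
filtered, evaluated = gaussian_basis_filtered(pts, means, covs)
print("max error:", np.abs(dense - filtered).max())
print("pairs evaluated:", evaluated, "of", pts.shape[0] * means.shape[0])
```

The trade-off is a bounded approximation error in exchange for skipping most point/Gaussian pairs when the Gaussians are compact relative to the scene; this kind of saving has no direct analogue in an implicit MLP encoder. A real implementation would replace the Python loop with a spatial data structure or tiling, as Gaussian Splatting does.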
