3D Gaussian Point Encoders

Jim James, Ben Wilson, Simon Lucey, James Hays

TL;DR

This work tackles the problem of efficient, explicit per-point embeddings for 3D point clouds by replacing implicit PointNet-style encodings with a Gaussian-based representation. It introduces the 3D Gaussian Point Encoder (3DGPE), which consists of a Gaussian Basis Encoder and a Gaussian Basis Mixer that together encode per-point features as mixtures of learnable Gaussians; the per-point features are then aggregated by max-pooling. To enable practical training and deployment, the authors develop natural-gradient optimization and implicit-to-explicit knowledge distillation from PointNet (and Mamba3D), along with geometry-based filtering to speed up inference. On ModelNet40 and ScanObjectNN, 3DGPE achieves accuracy comparable to PointNet with substantially higher throughput and lower memory use, and it remains effective when embedded in Mamba3D, enabling high framerates on CPU-only devices and efficient edge deployment.

Abstract

In this work, we introduce the 3D Gaussian Point Encoder, an explicit per-point embedding built on mixtures of learned 3D Gaussians. This explicit geometric representation for 3D recognition tasks is a departure from widely used implicit representations such as PointNet. However, it is difficult to learn 3D Gaussian encoders in an end-to-end fashion with standard optimizers. We develop optimization techniques based on natural gradients and distillation from PointNets to find a Gaussian Basis that can reconstruct PointNet activations. The resulting 3D Gaussian Point Encoders are faster and more parameter-efficient than traditional PointNets. As in the 3D reconstruction literature, where there has been considerable interest in the move from implicit (e.g., NeRF) to explicit (e.g., Gaussian Splatting) representations, we can take advantage of computational geometry heuristics to accelerate 3D Gaussian Point Encoders further. We extend filtering techniques from 3D Gaussian Splatting to construct encoders that run 2.7 times faster than a PointNet of comparable accuracy while using 46% less memory and 88% fewer FLOPs. Furthermore, we demonstrate the effectiveness of 3D Gaussian Point Encoders as a component in Mamba3D, running 1.27 times faster and reducing memory and FLOPs by 42% and 54%, respectively. 3D Gaussian Point Encoders are lightweight enough to achieve high framerates on CPU-only devices.

Paper Structure

This paper contains 32 sections, 6 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Base architecture of 3DGPE. An input point cloud is first pre-processed, for example by a T-Net or through Farthest Point Sampling and KNN. Each input point is then processed independently: the Gaussian Basis Encoder computes a set of Gaussian likelihoods, and the Gaussian Basis Mixer mixes these likelihoods to form a set of embeddings for each activation volume. We max-pool across points to derive a global feature, which is then passed to a downstream classifier, such as an MLP.
  • Figure 2: Implicit to Explicit 3D Knowledge Distillation. Points are sampled and pre-processed (T-Net or FPS + KNN) before being passed through each encoder. We then measure the $L_1$ loss between the per-point embeddings of the 3D Gaussian Point Encoder and those of PointNet. Maroon outlines indicate trainable components, while blue outlines indicate frozen components.
  • Figure 3: Pairwise Gaussian-Point Filtering. (a) Distance Filtering only evaluates Gaussian-Point pairs within a radius of a Gaussian's mean. (b) Bounding-box Filtering evaluates Gaussian-Point pairs when a point falls within the axis-aligned bounding box centered on a Gaussian. (c) Voxel Filtering evaluates Gaussian-Point pairs when a point lies in a voxel occupied with sufficiently high likelihood by a given Gaussian.
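
The pipeline described in the captions above (per-point Gaussian likelihoods, mixing, max-pooling, an $L_1$ distillation loss, and distance filtering) can be sketched roughly as follows in NumPy. This is a minimal illustration and not the authors' implementation. Everything specific in it is an assumption made for the example: the function names, the parameterization of each Gaussian by a mean and a precision matrix, the use of unnormalized responses, a purely linear mixer, and a single shared filtering radius. Pre-processing (T-Net or FPS + KNN) and training are omitted.

```python
import numpy as np

def gaussian_basis_encoder(points, means, precisions):
    """Unnormalized Gaussian responses exp(-0.5 * d^T P d), shape (N, G).

    points: (N, 3), means: (G, 3), precisions: (G, 3, 3) inverse covariances.
    (Assumed parameterization; the paper may use a different one.)
    """
    diff = points[:, None, :] - means[None, :, :]              # (N, G, 3)
    maha = np.einsum("ngi,gij,ngj->ng", diff, precisions, diff)
    return np.exp(-0.5 * maha)

def gaussian_basis_mixer(responses, mix_weights):
    """Linearly mix G responses into D channels per point, shape (N, D).

    A plain matrix product is an assumption made for illustration.
    """
    return responses @ mix_weights

def encode(points, means, precisions, mix_weights, radius=None):
    """Return (per-point embeddings (N, D), max-pooled global feature (D,))."""
    responses = gaussian_basis_encoder(points, means, precisions)
    if radius is not None:
        # Distance filtering, as in Figure 3(a): keep only Gaussian-point
        # pairs within `radius` of the Gaussian's mean. Here the pairs are
        # masked after evaluation for clarity; a fast implementation would
        # skip evaluating them in the first place.
        dist = np.linalg.norm(points[:, None, :] - means[None, :, :], axis=-1)
        responses = np.where(dist <= radius, responses, 0.0)
    per_point = gaussian_basis_mixer(responses, mix_weights)
    return per_point, per_point.max(axis=0)

def l1_distillation_loss(student, teacher):
    """Mean absolute difference between student and teacher embeddings."""
    return np.abs(student - teacher).mean()

# Hypothetical usage with random parameters and a random stand-in teacher.
rng = np.random.default_rng(0)
points = rng.normal(size=(1024, 3))
means = rng.normal(size=(32, 3))
precisions = np.stack([np.eye(3)] * 32)
mix_weights = rng.normal(size=(32, 64))
teacher_embeddings = rng.normal(size=(1024, 64))   # stands in for PointNet

per_point, global_feature = encode(points, means, precisions, mix_weights,
                                   radius=2.0)
loss = l1_distillation_loss(per_point, teacher_embeddings)
```

In this sketch the filter only zeroes out distant pairs, so it shows which pairs would be dropped but does not itself save any computation. The bounding-box and voxel variants in Figure 3(b) and 3(c) would replace the distance test with a different membership test.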