Efficient Continuous Group Convolutions for Local SE(3) Equivariance in 3D Point Clouds

Lisa Weijler, Pedro Hermosilla

TL;DR

This work tackles the challenge of achieving $SE(3)$ equivariance in 3D point clouds without prohibitive computation by introducing a continuous, frame-based group convolution operating over a local receptive field. Key idea: lift point features to the SE(3) group and compute convolutions using a carefully constructed per-point frame $\mathcal{F}(x)$ derived from PCA, enabling exact equivariance with a small, finite grid and allowing stochastic sampling of 1–4 frame elements to control cost. Empirical results on ModelNet40, DFAUST, PosePrior, and ScanNet show competitive or superior performance compared with both discrete and non-equivariant baselines, with particularly strong robustness to local rotations and unseen poses, and negligible overhead when using minimal frame samples. Overall, the approach provides an efficient route to local SE(3) equivariance in point clouds, enabling reliable 3D understanding in multi-object scenes with scalable computation.

Abstract

Extending the translation equivariance property of convolutional neural networks to larger symmetry groups has been shown to reduce sample complexity and enable more discriminative feature learning. Further, exploiting additional symmetries allows greater weight sharing than standard convolutions, enhancing network expressivity without increasing the parameter count. However, extending the equivariance of a convolution layer comes at a computational cost. In particular, for 3D data, expanding equivariance to the SE(3) group (rotation and translation) results in a 6D convolution operation, which is intractable for large inputs such as 3D scene scans. While efforts have been made to develop efficient SE(3)-equivariant networks, existing approaches either rely on discretization or provide only global rotation equivariance. This limits their applicability to point clouds representing scenes composed of multiple objects. This work presents an efficient, continuous, and local SE(3)-equivariant convolution layer for point cloud processing based on general group convolution and local reference frames. Our experiments show that our approach achieves competitive or superior performance across a range of datasets and tasks, including object classification and semantic segmentation, with negligible computational overhead.
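To make the "6D convolution" concrete, the first line below is the textbook group convolution of a feature map $f$ lifted to SE(3): it integrates over neighbor positions $y$ (3 dimensions) and orientations $R'$ (3 dimensions). The second line is an illustrative frame-based approximation consistent with the Figure 2 description, in which the rotation integral is replaced by an average over a sampled subset $S(y)$ of the PCA frame $\mathcal{F}(y)$ and the position integral by a sum over a local neighborhood $N(x)$. The symbols $\psi$, $N(x)$, and $S(y)$ and the normalization are assumptions made here for illustration, not necessarily the paper's exact equation.

```latex
\begin{aligned}
(f \star \psi)(x, R) &= \int_{\mathbb{R}^3} \int_{SO(3)} \psi\big(R^\top (y - x),\, R^\top R'\big)\, f(y, R')\, \mathrm{d}R'\, \mathrm{d}y \\
&\approx \sum_{y \in N(x)} \frac{1}{|S(y)|} \sum_{R' \in S(y)} \psi\big(R^\top (y - x),\, R^\top R'\big)\, f(y, R'),
\qquad S(y) \subseteq \mathcal{F}(y),\ 1 \le |S(y)| \le 4
\end{aligned}
```

The kernel arguments $R^\top (y - x)$ and $R^\top R'$ are the group element $g^{-1}h$ for $g = (x, R)$ and $h = (y, R')$, that is, the relative position and relative orientation between the two points.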

Paper Structure

This paper contains 38 sections, 1 theorem, 6 equations, 5 figures, 13 tables.

Key Result

Theorem 1

Let $\mathcal{F}$ be an SE(3)-equivariant frame. Then, $\Phi_{\mathcal{F}}$ is SE(3)-equivariant.
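Theorem 1 hinges on the frame itself being SE(3)-equivariant. The sketch below is not the authors' code; it assumes a PCA frame in the style of frame averaging, built from the eigenvectors of the neighborhood covariance with the four sign choices that keep a right-handed basis (matching the "1–4 frame elements" in the TL;DR). The helpers `pca_frame`, `sample_frames`, `random_rotation`, and `same_frame_set` are hypothetical. It numerically checks that rotating and translating a neighborhood maps its frame set to $R$ times the original set, assuming the covariance eigenvalues are distinct.

```python
import numpy as np


def pca_frame(points: np.ndarray) -> list[np.ndarray]:
    """Return four right-handed PCA frames (3x3 rotation matrices) for a neighborhood.

    Assumption: `points` is an (N, 3) array holding one point's local receptive
    field. Covariance eigenvectors are unique only up to sign; the four sign
    choices that keep det = +1 form the frame.
    """
    centered = points - points.mean(axis=0)          # removes the translation
    cov = centered.T @ centered / len(points)
    _, vecs = np.linalg.eigh(cov)                    # columns: ascending eigenvalues
    frames = []
    for s1 in (1.0, -1.0):
        for s2 in (1.0, -1.0):
            e1 = s1 * vecs[:, 2]                     # largest-variance axis
            e2 = s2 * vecs[:, 1]                     # middle axis
            e3 = np.cross(e1, e2)                    # completes a right-handed basis
            frames.append(np.stack([e1, e2, e3], axis=1))
    return frames


def sample_frames(frames, k, rng):
    """Randomly keep k of the frame elements (k in 1..4) to trade accuracy for cost."""
    idx = rng.choice(len(frames), size=k, replace=False)
    return [frames[i] for i in idx]


def random_rotation(rng: np.random.Generator) -> np.ndarray:
    """Random rotation matrix in SO(3) via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))                      # make the factorization unique
    if np.linalg.det(q) < 0:                         # reflect one axis to reach SO(3)
        q[:, 0] = -q[:, 0]
    return q


def same_frame_set(a, b, atol=1e-8) -> bool:
    """True if every matrix in `a` matches some matrix in `b`."""
    return all(any(np.allclose(fa, fb, atol=atol) for fb in b) for fa in a)


rng = np.random.default_rng(0)
pts = rng.normal(size=(200, 3)) * np.array([3.0, 2.0, 1.0])  # distinct variances
R, t = random_rotation(rng), rng.normal(size=3)
moved = pts @ R.T + t                                        # x -> R x + t
print(same_frame_set([R @ F for F in pca_frame(pts)], pca_frame(moved)))  # True
```

Keeping all four sign choices is what makes the frame a well-defined, equivariant set; sampling fewer elements with `sample_frames` is one plausible way to realize the cost control the TL;DR describes.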

Figures (5)

  • Figure 1: While global equivariant designs ensure robustness to whole-scene rotations, they fail with randomly rotated scene parts or elements. In contrast, local equivariant operations maintain robustness by handling local geometry rotations around each point.
  • Figure 2: Overview of our convolution operation. Given a central point with an orientation, we first sample neighboring points. For each point, we build a frame using PCA and sample an orientation from that frame. The input to the group convolution kernel is then the relative position and the relative orientation between points.
  • Figure 3: Qualitative results. Global equivariant methods such as VN or FA struggle with out-of-distribution models. Our method, on the other hand, achieves almost perfect predictions. Lastly, MC also achieves good performance but falls behind our method, which better approximates the group convolution integral.
  • Figure 4: Additional qualitative results. Global equivariant methods such as VN or FA struggle with out-of-distribution models, especially upside-down models. Our method, on the other hand, achieves almost perfect predictions. Lastly, MC also achieves good performance but falls behind our method, as seen in the left upper arm prediction in the leftmost columns.
  • Figure 5: Additional qualitative results. Global equivariant methods such as VN or FA struggle with out-of-distribution models. Our method, on the other hand, achieves almost perfect predictions. Lastly, as all three examples show, MC also achieves good performance but falls behind our method.
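The Figure 2 pipeline feeds the kernel only relative quantities, which is why the operation is equivariant. The sketch below assumes those quantities are the neighbor's position and orientation expressed in the center's frame, $R_i^\top (x_j - x_i)$ and $R_i^\top R_j$. The helpers `kernel_inputs`, `toy_conv`, and `random_rotation` are hypothetical: the neighbor orientations are random rotations standing in for sampled PCA frame elements, and the kernel is a toy linear function rather than the paper's learned kernel. It checks that applying one global rotation and translation to every point and orientation leaves the response at the transformed center unchanged, which is what equivariance of a lifted group convolution means.

```python
import numpy as np


def kernel_inputs(x_i, R_i, x_j, R_j):
    """Relative position and orientation of neighbor j seen from center i.

    Both are expressed in the center's frame R_i, so a global rotation R and
    translation t (x -> R x + t, R_i -> R R_i) leave them unchanged.
    """
    rel_pos = R_i.T @ (x_j - x_i)
    rel_rot = R_i.T @ R_j
    return rel_pos, rel_rot


def toy_conv(center, neighbors, feats, weights):
    """Hypothetical linear kernel on (rel_pos, rel_rot), summed over neighbors.

    center: (x_i, R_i); neighbors: list of (x_j, R_j) with R_j a sampled frame
    element; feats: (M,) scalar features; weights: (12,) kernel parameters.
    """
    x_i, R_i = center
    out = 0.0
    for (x_j, R_j), f_j in zip(neighbors, feats):
        rel_pos, rel_rot = kernel_inputs(x_i, R_i, x_j, R_j)
        pose = np.concatenate([rel_pos, rel_rot.ravel()])   # 3 + 9 = 12 values
        out += f_j * float(weights @ pose)
    return out


def random_rotation(rng):
    """Random rotation matrix in SO(3) via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q


rng = np.random.default_rng(0)
center = (rng.normal(size=3), random_rotation(rng))
neighbors = [(rng.normal(size=3), random_rotation(rng)) for _ in range(5)]
feats, weights = rng.normal(size=5), rng.normal(size=12)
R, t = random_rotation(rng), rng.normal(size=3)


def move(x, Q):
    """Apply the global transform (R, t) to a position and an orientation."""
    return R @ x + t, R @ Q


before = toy_conv(center, neighbors, feats, weights)
after = toy_conv(move(*center), [move(*n) for n in neighbors], feats, weights)
print(np.isclose(before, after))  # True
```

Because the rotation cancels in $(R R_i)^\top R (x_j - x_i)$ and $(R R_i)^\top R R_j$, any kernel of these two arguments, linear or not, inherits the same property.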

Theorems & Definitions (3)

  • Theorem 1
  • Proof
  • Proof