Efficient Continuous Group Convolutions for Local SE(3) Equivariance in 3D Point Clouds
Lisa Weijler, Pedro Hermosilla
TL;DR
This work tackles the challenge of achieving SE(3) equivariance in 3D point clouds without prohibitive computation by introducing a continuous, frame-based group convolution operating over a local receptive field. Key idea: lift point features to the SE(3) group and compute convolutions using a carefully constructed per-point frame $\mathcal{F}(x)$ derived from PCA, enabling exact equivariance with a small, finite grid and allowing stochastic sampling of 1–4 frame elements to control cost. Empirical results on ModelNet40, DFAUST, PosePrior, and ScanNet show competitive or superior performance to both discrete and non-equivariant baselines, with particularly strong robustness to local rotations and unseen poses and negligible overhead when using minimal frame samples. Overall, the approach provides an efficient route to local SE(3) equivariance in point clouds, enabling reliable 3D understanding in multi-object scenes with scalable computation.
Abstract
Extending the translation equivariance property of convolutional neural networks to larger symmetry groups has been shown to reduce sample complexity and enable more discriminative feature learning. Further, exploiting additional symmetries facilitates greater weight sharing than standard convolutions, leading to an enhanced network expressivity without an increase in parameter count. However, extending the equivariant properties of a convolution layer comes at a computational cost. In particular, for 3D data, expanding equivariance to the SE(3) group (rotation and translation) results in a 6D convolution operation, which is not tractable for larger data samples such as 3D scene scans. While efforts have been made to develop efficient SE(3) equivariant networks, existing approaches rely on discretization or only introduce global rotation equivariance. This limits their applicability to point clouds representing a scene composed of multiple objects. This work presents an efficient, continuous, and local SE(3) equivariant convolution layer for point cloud processing based on general group convolution and local reference frames. Our experiments show that our approach achieves competitive or superior performance across a range of datasets and tasks, including object classification and semantic segmentation, with negligible computational overhead.
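To make the idea of a PCA-derived local reference frame concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that the per-point frame $\mathcal{F}(x)$ consists of the four right-handed sign variants of the PCA axes of a local neighborhood, which is one common way to resolve the sign ambiguity of eigenvectors and would be consistent with sampling 1–4 frame elements. The function names `local_pca_frames` and `sample_frames` are hypothetical.

```python
import numpy as np


def local_pca_frames(neighbors):
    """Return the 4 right-handed PCA frames of a local patch, shape (4, 3, 3).

    Assumption: the frame set is built from the sign variants of the two
    leading principal axes; the third axis is fixed by the cross product so
    that every frame is a proper rotation (determinant +1).
    """
    centered = neighbors - neighbors.mean(axis=0)
    cov = centered.T @ centered / len(neighbors)
    # eigh returns eigenvalues in ascending order; columns are eigenvectors.
    _, vecs = np.linalg.eigh(cov)
    v1, v2 = vecs[:, 2], vecs[:, 1]  # two largest principal directions
    frames = []
    for s1 in (1.0, -1.0):
        for s2 in (1.0, -1.0):
            a, b = s1 * v1, s2 * v2
            frames.append(np.stack([a, b, np.cross(a, b)], axis=1))
    return np.stack(frames)


def sample_frames(frames, k, rng):
    """Draw k of the frame elements without replacement (stochastic sampling)."""
    idx = rng.choice(len(frames), size=k, replace=False)
    return frames[idx]


# Usage: rotating the patch rotates the whole frame set along with it.
rng = np.random.default_rng(0)
patch = rng.normal(size=(64, 3)) * np.array([3.0, 2.0, 1.0])
frames = local_pca_frames(patch)
subset = sample_frames(frames, 2, rng)
```

Because the covariance of a rotated patch is the rotated covariance, the frame set transforms with the patch as long as the eigenvalues are distinct. Coordinates expressed in any frame of the set are therefore unchanged by a local rotation, and this is the property a frame-based convolution can exploit. Sampling fewer frame elements reduces cost at the price of a stochastic estimate over the set.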
