Correspondence-Free SE(3) Point Cloud Registration in RKHS via Unsupervised Equivariant Learning

Ray Zhang, Zheming Zhou, Min Sun, Omid Ghasemalizadeh, Cheng-Hao Kuo, Ryan Eustice, Maani Ghaffari, Arnie Sen

TL;DR

This work addresses robust 3D point cloud registration under SE(3) without explicit point correspondences by formulating the problem in a reproducing kernel Hilbert space (RKHS) and learning SE(3)-equivariant feature representations. The core method, EquivAlign, performs differentiable, correspondence-free pose regression in the RKHS using a novel kernel that couples coordinate and steerable-vector information, trained with an unsupervised bi-level regime and curriculum learning. Key contributions include (i) a lightweight SE(3)-equivariant point representation built from steerable vectors, (ii) a differentiable inner-outer loop framework that optimizes the pose and the kernel parameters in feature space, and (iii) strong empirical results on ModelNet40 and ETH3D showing robustness to noise, outliers, and partial overlap, obtained without ground-truth pose labels during training. The approach advances unsupervised equivariant learning for 3D registration and enables accurate registration of real RGB-D odometry data, with potential applications in robotics and computer vision.
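To make the RKHS formulation concrete, here is a minimal NumPy sketch of a squared RKHS distance between two point clouds, assuming each cloud is represented as a sum of kernel functions centered at its points, where every point pairs a 3D coordinate with C channels of 3D steerable vectors. The product form of `coupled_kernel` (a Gaussian on coordinates times a Gaussian on flattened features) and the length scales `ell_x` and `ell_f` are illustrative assumptions, not necessarily the kernel used by EquivAlign; the features in the demo are random stand-ins for encoder output.

```python
import numpy as np


def coupled_kernel(xa, fa, xb, fb, ell_x=0.5, ell_f=1.0):
    """Hypothetical coupled kernel: a Gaussian on 3D coordinates multiplied by
    a Gaussian on flattened steerable features.

    xa: (Na, 3) coordinates, fa: (Na, C, 3) steerable vectors (C channels);
    xb, fb likewise. Returns the (Na, Nb) Gram matrix.
    """
    # Pairwise squared distances between coordinates, shape (Na, Nb).
    dx = np.sum((xa[:, None, :] - xb[None, :, :]) ** 2, axis=-1)
    # Pairwise squared distances between features flattened to (N, 3C).
    fa_flat = fa.reshape(len(fa), -1)
    fb_flat = fb.reshape(len(fb), -1)
    df = np.sum((fa_flat[:, None, :] - fb_flat[None, :, :]) ** 2, axis=-1)
    return np.exp(-dx / (2 * ell_x**2)) * np.exp(-df / (2 * ell_f**2))


def rkhs_sq_distance(x, fx, z, fz, **kernel_kw):
    """Squared RKHS distance ||f_X - f_Z||_H^2 between f_X = sum_i k(x_i (+) f_i, .)
    and f_Z = sum_j k(z_j (+) g_j, .), expanded with the kernel trick:
    <f_X, f_X> + <f_Z, f_Z> - 2 <f_X, f_Z>."""
    kxx = coupled_kernel(x, fx, x, fx, **kernel_kw).sum()
    kzz = coupled_kernel(z, fz, z, fz, **kernel_kw).sum()
    kxz = coupled_kernel(x, fx, z, fz, **kernel_kw).sum()
    return kxx + kzz - 2.0 * kxz


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(50, 3))
    fx = rng.normal(size=(50, 3, 3))  # synthetic stand-in for 3 feature channels
    print(rkhs_sq_distance(x, fx, x, fx))        # a cloud vs itself: zero
    shifted = x + np.array([1.0, 0.0, 0.0])
    print(rkhs_sq_distance(x, fx, shifted, fx))  # positive after a shift
```

Because the distance only involves kernel evaluations between points, no correspondences are needed: every point of one cloud interacts with every point of the other through the Gram matrices.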

Abstract

This paper introduces a robust unsupervised SE(3) point cloud registration method that does not require point correspondences. The method represents point clouds as functions in a reproducing kernel Hilbert space (RKHS) and leverages SE(3)-equivariant features to register them directly in feature space. A novel RKHS distance metric is proposed that remains reliable in the presence of noise, outliers, and asymmetrical data. An unsupervised training approach is introduced to cope with limited ground-truth data, facilitating adaptation to real datasets. The proposed method outperforms classical and supervised methods in registration accuracy on both synthetic (ModelNet40) and real-world (ETH3D) noisy, outlier-rich datasets. To the best of our knowledge, this is the first successful registration of real RGB-D odometry data using an equivariant method. The code is available at https://sites.google.com/view/eccv24-equivalign

Paper Structure (21 sections, 14 equations, 4 figures, 6 tables)

Figures (4)

  • Figure 1: Registration in RKHS with Unsupervised Learning of Equivariant Features: The registration process takes equivariant feature embeddings $\phi(X)$ and $\phi(Z)$ of point clouds $X=\{x_i\}\subset \mathbb{R}^3$ and $Z=\{z_j\}\subset \mathbb{R}^3$. The embeddings are represented as continuous functions $f_{\phi(X)}$ and $f_{\phi(Z)}$ in the RKHS, which allows the distance $\| f_{\phi(X)} - h f_{\phi(Z)} \|^2_{\mathcal{H}}$ to be used to estimate the pose $h\in \mathrm{SE}(3)$ directly in feature space. In the feature space, each point is represented as $x_i \oplus \mathbf{\tilde{f}}_i$, where $x_i$ is its 3D coordinate, which is naturally translation-equivariant. Combining it with $\mathbf{\tilde{f}}_i$, the $\mathrm{SO}(3)$-equivariant steerable vectors, yields $\mathrm{SE}(3)$ equivariance.
  • Figure 2: EquivAlign Architecture: An iterative, fully differentiable, unsupervised $\mathrm{SE}(3)$ registration framework with an inner-outer loop structure enables correspondence-free pose regression in feature space. During training, the inner loop iteratively adjusts the pose, and the outer loop accumulates the loss from the inner-loop iterations to refine the encoder. At inference, the encoder processes the raw point clouds in a single pass, and the inner loop then iteratively optimizes the pose acting on the feature space until convergence.
  • Figure 3: SE(3)-Equivariant Representation of Point Feature: (Left) The two raw input point clouds, shown in blue and red, where each point is simply its 3D coordinate. (Middle) The direct-sum representation of the equivariant point features of both clouds at the initial relative pose, with each point appending its steerable vectors (for simplicity, the illustration uses three arrows per point, representing three channels of each point's steerable features). (Right) Applying the ground-truth $\mathrm{SE}(3)$ transformation to the feature space makes the two point-set representations overlap exactly, confirming the precision of the equivariant representation.
  • Figure 4: An airplane example of point cloud registration at a $90^{\circ}$ initial angle, with Gaussian noise $\mathcal{N}(0,0.01)$ along the surface normal direction and $20\%$ uniformly distributed outliers. The equivariant registration methods outperform the invariant and ICP-based methods, and EquivAlign recovers the yaw angle more accurately than E2PN.
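The pose action described in Figures 1 and 3 and the inner loop of Figure 2 can be illustrated with a toy NumPy sketch. A pose $h=(R,t)$ rotates and translates each coordinate $x_i$ but only rotates the steerable vectors $\mathbf{\tilde{f}}_i$; the sketch checks that applying the ground-truth pose gives zero distance, then minimizes $\| f_{\phi(X)} - h f_{\phi(Z)} \|^2_{\mathcal{H}}$ over $h$. Everything here is an assumption for illustration: the Gaussian kernel on the concatenated point vector, the axis-angle plus translation parametrization, the finite-difference gradients with backtracking (`inner_loop`, `objective`, and the other helpers are hypothetical names), and the random features standing in for a learned encoder. It is not the EquivAlign implementation, and it omits the outer loop that trains the encoder.

```python
import numpy as np


def so3_exp(w):
    """Rodrigues' formula: axis-angle vector w (3,) -> rotation matrix (3, 3)."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    K = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def act(R, t, x, f):
    """SE(3) action on direct-sum points x_i (+) f_i: coordinates (N, 3) are
    rotated and translated; steerable vectors (N, C, 3) are only rotated."""
    return x @ R.T + t, f @ R.T


def sq_dist(x, fx, z, fz, ell=1.0):
    """Squared RKHS distance using a hypothetical Gaussian kernel on the
    concatenated [coordinate, flattened steerable feature] vector."""
    a = np.hstack([x, fx.reshape(len(fx), -1)])
    b = np.hstack([z, fz.reshape(len(fz), -1)])

    def gram_sum(p, q):
        d2 = np.sum((p[:, None, :] - q[None, :, :]) ** 2, axis=-1)
        return np.exp(-d2 / (2.0 * ell**2)).sum()

    return gram_sum(a, a) + gram_sum(b, b) - 2.0 * gram_sum(a, b)


def objective(params, x, fx, z, fz):
    """||f_X - h f_Z||^2 with h = (so3_exp(params[:3]), params[3:])."""
    zt, fzt = act(so3_exp(params[:3]), params[3:], z, fz)
    return sq_dist(x, fx, zt, fzt)


def inner_loop(x, fx, z, fz, iters=50, step=0.1, eps=1e-5):
    """Toy inner loop: central finite-difference gradients on a 6-vector
    (axis-angle, translation) with backtracking, so the loss never increases."""
    params = np.zeros(6)  # start from the identity pose
    loss = objective(params, x, fx, z, fz)
    history = [loss]
    for _ in range(iters):
        grad = np.zeros(6)
        for k in range(6):
            e = np.zeros(6)
            e[k] = eps
            grad[k] = (objective(params + e, x, fx, z, fz)
                       - objective(params - e, x, fx, z, fz)) / (2.0 * eps)
        lr = step
        while lr > 1e-8:  # halve the step until the loss decreases
            cand = params - lr * grad
            cand_loss = objective(cand, x, fx, z, fz)
            if cand_loss < loss:
                params, loss = cand, cand_loss
                break
            lr *= 0.5
        history.append(loss)
    return params, history


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(40, 3))
    fx = 0.5 * rng.normal(size=(40, 3, 3))  # synthetic stand-in for encoder output
    w_gt, t_gt = np.array([0.0, 0.0, 0.3]), np.array([0.1, -0.2, 0.05])
    R_gt = so3_exp(w_gt)
    # Build Z so that applying (R_gt, t_gt) to Z reproduces X exactly.
    z, fz = (x - t_gt) @ R_gt, fx @ R_gt
    print("loss at ground truth:", objective(np.r_[w_gt, t_gt], x, fx, z, fz))
    params, history = inner_loop(x, fx, z, fz)
    print("loss: initial", history[0], "final", history[-1])
    print("estimated params:", params, "ground truth:", np.r_[w_gt, t_gt])
```

The backtracking step keeps the loss monotonically non-increasing, which makes the toy loop easy to reason about; a differentiable implementation would instead use analytic gradients so that the outer loop can backpropagate through the pose iterations.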