RKHS-BA: A Robust Correspondence-Free Multi-View Registration Framework with Semantic Point Clouds

Ray Zhang; Jingwei Song; Xiang Gao; Junzhe Wu; Tianyi Liu; Jinyuan Zhang; Ryan Eustice; Maani Ghaffari

TL;DR

RKHS-BA performs robust, correspondence-free multi-view registration by representing each frame as an RKHS function that jointly encodes geometry, color, and semantics. The method constructs a global objective $F(\ell)=\sum_{(m,n)\in\mathcal{C}} \langle f_{T_m X_m}, f_{T_n X_n}\rangle_{\mathcal{H}}$ and optimizes it with an IRLS backend that uses a squared exponential kernel with a lengthscale that decays across inner-outer loops. A key innovation is global rotation initialization over the icosahedral subgroup of $\mathrm{SO}(3)$, which enables registration under large misalignment; it is followed by sliding-window and batch BA to achieve both local and global consistency. Experiments on synthetic Bunny data, TartanAir sequences, SemanticKITTI, and a self-collected Cassie dataset demonstrate improved robustness to outliers and semantic noise, with the semantic-enabled variants often outperforming intensity-only and traditional baselines. The approach is open-sourced, enabling adoption in RGB-D and LiDAR SLAM/SfM pipelines for robust map and trajectory estimation.
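To make the objective concrete, the following minimal Python sketch evaluates a sum of pairwise RKHS inner products over a toy pose graph. It assumes, in the spirit of the SemanticCVO formulation that the paper reviews, that the inner product of two frames is a double sum of a squared exponential kernel over point pairs, weighted by the inner product of their feature (color or semantic) vectors. That exact form is not spelled out in this summary, and every name here (`se_kernel`, `transform`, `inner_product`, `objective`) as well as the toy data is hypothetical. Only the standard library is used.

```python
import math

def se_kernel(x, z, lengthscale, sigma=1.0):
    """Squared exponential kernel between two 3D points."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return sigma ** 2 * math.exp(-d2 / (2.0 * lengthscale ** 2))

def transform(pose, point):
    """Apply a rigid pose (R, t) to a point; R is a 3x3 nested list."""
    R, t = pose
    return tuple(sum(R[r][c] * point[c] for c in range(3)) + t[r]
                 for r in range(3))

def inner_product(cloud_m, cloud_n, lengthscale):
    """Assumed form: sum_ij <feat_i, feat_j> * k(x_i, z_j)."""
    total = 0.0
    for x, feat_x in cloud_m:
        for z, feat_z in cloud_n:
            feature_weight = sum(a * b for a, b in zip(feat_x, feat_z))
            total += feature_weight * se_kernel(x, z, lengthscale)
    return total

def objective(poses, clouds, pairs, lengthscale):
    """Sum of inner products over the connected frame pairs (m, n)."""
    total = 0.0
    for m, n in pairs:
        cloud_m = [(transform(poses[m], x), f) for x, f in clouds[m]]
        cloud_n = [(transform(poses[n], x), f) for x, f in clouds[n]]
        total += inner_product(cloud_m, cloud_n, lengthscale)
    return total

# Toy data (hypothetical): two points with one-hot "semantic" features.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cloud = [((0.0, 0.0, 0.0), (1.0, 0.0)), ((1.0, 0.0, 0.0), (0.0, 1.0))]
clouds = [cloud, cloud]
aligned = [(IDENTITY, (0.0, 0.0, 0.0)), (IDENTITY, (0.0, 0.0, 0.0))]
shifted = [(IDENTITY, (0.0, 0.0, 0.0)), (IDENTITY, (0.0, 3.0, 0.0))]

score_aligned = objective(aligned, clouds, [(0, 1)], lengthscale=0.5)
score_shifted = objective(shifted, clouds, [(0, 1)], lengthscale=0.5)
```

With one-hot features, point pairs from different classes contribute nothing, so the aligned score counts only the matching-class pairs, and the score drops toward zero once the second frame is translated away. No point correspondences are chosen at any step.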

Abstract

This work reports a novel multi-frame Bundle Adjustment (BA) framework called RKHS-BA. It uses continuous landmark representations that encode RGB-D/LiDAR and semantic observations in a Reproducing Kernel Hilbert Space (RKHS). With a correspondence-free pose graph formulation, the proposed system constructs a loss function that achieves more generalized convergence than classical point-wise convergence. We demonstrate its applications in multi-view point cloud registration, sliding-window odometry, and global LiDAR mapping on simulated and real data. It yields highly robust pose estimates in extremely noisy scenes and generalizes well across various types of semantic inputs. The open-source implementation is released at https://github.com/UMich-CURLY/RKHS_BA.


Paper Structure (36 sections, 17 equations, 18 figures, 2 tables)

This paper contains 36 sections, 17 equations, 18 figures, 2 tables.

Introduction
Related Works
Registration of Multiple Point Sets
Direct BA
Feature-based BA
Learning-based BA
Problem Setup and Notations
Review of SemanticCVO [clarkmaani20, Zhang2020semanticcvo, MGhaffari-RSS-19]
Generalized Multi-view Registration in RKHS
Full Correspondence-Free BA Pipeline
Rotation Initialization Strategy and Pose Graph Construction
Initialization of two frames
Initialization of the pose graph
Semantically Informed Iteratively Reweighted Least Squares Backend
From RKHS to IRLS
...and 21 more sections
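The outline above lists a step from the RKHS objective to IRLS, and the TL;DR mentions a lengthscale that decays across inner-outer loops. The sketch below illustrates that general idea under strong simplifying assumptions: it estimates a translation only (no rotation, no color or semantic features), uses kernel values as the IRLS weights, and shrinks the lengthscale geometrically in an outer loop. It is not the paper's backend; the function name `irls_translation`, its parameters, and the decay schedule are all hypothetical.

```python
import math

def irls_translation(source, target, lengthscale=2.0, decay=0.7,
                     min_lengthscale=0.05, inner_iters=10):
    """Maximize sum_ij k(x_i + t, z_j) over t with kernel-weighted IRLS."""
    t = [0.0, 0.0, 0.0]
    while lengthscale >= min_lengthscale:        # outer loop: coarse to fine
        for _ in range(inner_iters):             # inner loop: IRLS updates
            numerator = [0.0, 0.0, 0.0]
            weight_sum = 0.0
            for x in source:
                moved = [x[k] + t[k] for k in range(3)]
                for z in target:
                    d2 = sum((moved[k] - z[k]) ** 2 for k in range(3))
                    # Weight of the pair (x, z) at the current estimate.
                    w = math.exp(-d2 / (2.0 * lengthscale ** 2))
                    weight_sum += w
                    for k in range(3):
                        numerator[k] += w * (z[k] - x[k])
            if weight_sum < 1e-300:
                break  # no pair overlaps at this lengthscale
            # Closed-form minimizer of sum_ij w_ij * ||x_i + t - z_j||^2.
            t = [numerator[k] / weight_sum for k in range(3)]
        lengthscale *= decay
    return t

# Toy data (hypothetical): the target is the source shifted by a known offset.
source = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
shift = (0.3, -0.2, 0.1)
target = [tuple(p[k] + shift[k] for k in range(3)) for p in source]
estimate = irls_translation(source, target)
```

Every source point is softly associated with every target point through its weight, so no hard correspondences are needed. A large initial lengthscale gives a wide basin of attraction, and the smaller later values sharpen the estimate.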

Figures (18)

Figure 1: We represent a point cloud observation as a function in the Reproducing Kernel Hilbert Space (RKHS), denoted as $f_{X_{m}}$, where $X_m$ denotes the raw sensor measurements, containing both geometric information such as 3D points and non-geometric information such as color, intensity, and semantics. An inner product $\langle f_{\mathbf{T}_m X_m}, f_{\mathbf{T}_n X_n}\rangle_{\mathcal{H}}$ measures the alignment of the two functions at timestamps $m$ and $n$. The full objective function over multiple frames is formulated as the sum of the inner products between all pairs of relevant frames.
Figure 2: Full pipeline. To construct a globally consistent world model, we propose a five-step process in which each step's optimization is initialized from the previous step's poses. a) In the initialization stage, we register the first two frames with the global rotation initialization scheme for large unknown motions. b) With constant-velocity initialization, we run frame-to-frame visual odometry [Zhang2020semanticcvo, clarkmaani20] to calculate the pose of each new frame. c) With local RKHS-BA, we run sliding-window optimization to further refine the poses from odometry. d) When a loop closure occurs, we perform pose graph optimization (PGO) [grisetti2011g2o, kaess2012isam2], with the loop-closing poses computed as in step (a). e) Finally, we run a batch RKHS-BA to obtain a globally consistent world model.
Figure 3: We sample candidate initial rotations from the icosahedral group, the finest cover of $\mathrm{SO}(3)$. The 60 configurations are ranked by the $\cos$ alignment ratio, and the highest-ranked one is chosen as the initial value of the frame-to-frame registration.
Figure 4: An example of a two-view point cloud registration test with FPFH [rusu2009fpfh] invariant features on the Bunny dataset [bunny]. (a) The two partially overlapping point clouds of the Bunny dataset, each perturbed by 50% random outliers and 50% cropping. (b) The two Bunny point clouds after we apply an initial rotation of $180$ degrees around a random axis and a random translation of 0.5 m. (c) FGR's registration result. (d) RANSAC's registration result. (e) The proposed method's registration result using global rotation initialization.
Figure 5: Benchmark results of the two-frame registrations on the Bunny dataset [bunny]. Each box plot shows the resulting pose errors, measured as the norm of the matrix logarithm, under different outlier ratios and cropping ratios at the same $90^{\circ}$ initial rotation angle. (a) 0% cropping. (b) 12.5% cropping. (c) 25% cropping. (d) 37.5% cropping. (e) 50% cropping.
...and 13 more figures
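The rotation sampling described in the Figure 3 caption can be sketched as follows. The 60 rotations of the icosahedral group are enumerated by closing two generators under multiplication: a five-fold rotation about an icosahedron vertex axis and a two-fold rotation about the z axis. The caption does not define the $\cos$ alignment ratio, so the sketch substitutes a plain kernel-overlap score as a stand-in; that score, the lengthscale, and all function names are assumptions rather than the paper's implementation.

```python
import math

def rotation_about(axis, angle):
    """Rodrigues' formula for a rotation about the given (unnormalized) axis."""
    n = math.sqrt(sum(a * a for a in axis))
    x, y, z = (a / n for a in axis)
    c, s = math.cos(angle), math.sin(angle)
    C = 1.0 - c
    return ((c + x * x * C, x * y * C - z * s, x * z * C + y * s),
            (y * x * C + z * s, c + y * y * C, y * z * C - x * s),
            (z * x * C - y * s, z * y * C + x * s, c + z * z * C))

def matmul(A, B):
    return tuple(tuple(sum(A[r][k] * B[k][c] for k in range(3))
                       for c in range(3)) for r in range(3))

def matrix_key(R, digits=6):
    """Rounded entries, used to detect matrices that were already generated."""
    return tuple(round(v, digits) for row in R for v in row)

def icosahedral_rotations():
    """Enumerate the group by repeatedly multiplying with two generators."""
    phi = (1.0 + math.sqrt(5.0)) / 2.0
    generators = [rotation_about((0.0, 1.0, phi), 2.0 * math.pi / 5.0),
                  rotation_about((0.0, 0.0, 1.0), math.pi)]
    identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    group = {matrix_key(identity): identity}
    frontier = [identity]
    while frontier:
        R = frontier.pop()
        for G in generators:
            P = matmul(G, R)
            k = matrix_key(P)
            if k not in group:
                group[k] = P
                frontier.append(P)
    return list(group.values())

def rotate(R, p):
    return tuple(sum(R[r][c] * p[c] for c in range(3)) for r in range(3))

def alignment_score(R, source, target, lengthscale=0.3):
    """Stand-in score (assumption): kernel overlap of rotated source and target."""
    total = 0.0
    for x in source:
        rx = rotate(R, x)
        for z in target:
            d2 = sum((a - b) ** 2 for a, b in zip(rx, z))
            total += math.exp(-d2 / (2.0 * lengthscale ** 2))
    return total

def best_initial_rotation(source, target):
    """Rank all candidates and return the highest-scoring rotation."""
    return max(icosahedral_rotations(),
               key=lambda R: alignment_score(R, source, target))
```

In a full pipeline the winning candidate would only seed the subsequent frame-to-frame registration, which then refines both rotation and translation.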

Theorems & Definitions (1)

Remark 1
