RGBD-Glue: General Feature Combination for Robust RGB-D Point Cloud Registration
Congjia Chen, Xiaoyu Jia, Yanhong Zheng, Yufu Qu
TL;DR
RGBD-Glue addresses RGB-D point-cloud registration by decoupling visual and geometric features, then combining them through an explicit transformation-consistency filter and an adaptive threshold. Visual correspondences provide a rough prior to estimate a transformation and its error distribution, from which a distribution-informed threshold $\epsilon$ selects credible geometric matches; the final registration is obtained by a weighted Procrustes fit over the fused set. The approach is flexible, working with hand-crafted or learning-based descriptors, and demonstrates state-of-the-art performance on ScanNet and 3DMatch, while maintaining robustness under large frame spacing and across multiple visual features. This yields a more robust and practical RGB-D registration pipeline that leverages complementary cues without brittle, tightly fused representations.
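The final step named above, a weighted Procrustes fit, can be illustrated with a minimal NumPy sketch. It is the standard SVD-based (Kabsch-style) closed-form solution, not the authors' implementation. The function name `weighted_procrustes` and the idea of passing per-correspondence weights as a plain array are assumptions made for illustration.

```python
import numpy as np

def weighted_procrustes(src, dst, w):
    """Rigid fit minimizing sum_i w_i * ||R @ src_i + t - dst_i||^2.

    src, dst: (N, 3) arrays of corresponding points.
    w:        (N,) non-negative weights (hypothetical confidence scores).
    Returns a 3x3 rotation R and a length-3 translation t.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                      # normalize weights to sum to 1

    # Weighted centroids of both point sets.
    mu_s = (w[:, None] * src).sum(axis=0)
    mu_d = (w[:, None] * dst).sum(axis=0)

    # Weighted 3x3 cross-covariance between the centered sets.
    H = (src - mu_s).T @ (w[:, None] * (dst - mu_d))

    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_d - R @ mu_s
    return R, t
```

In a pipeline like the one summarized here, `src` and `dst` would hold the fused visual and geometric correspondences, and `w` would encode how much each match is trusted; how those weights are actually computed is not specified in this summary.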
Abstract
Point cloud registration is the fundamental task of estimating the rigid transformation between point clouds. Previous studies have used geometric information for feature extraction, matching, and transformation estimation. Recently, owing to the advancement of RGB-D sensors, researchers have attempted to combine visual and geometric information to improve registration performance. However, these studies have focused on extracting distinctive features through deep feature fusion, which neither mitigates the negative effects of each feature's weaknesses nor fully exploits the valid information available. In this paper, we propose a new feature combination framework that applies a looser but more effective combination. An explicit filter based on transformation consistency is designed for the framework to overcome each feature's weakness, and an adaptive threshold determined by the error distribution is proposed to extract more valid information from the two types of features. Owing to this design, the proposed framework estimates more accurate correspondences and is applicable to both hand-crafted and learning-based feature descriptors. Experiments on ScanNet and 3DMatch show that our method achieves state-of-the-art performance.
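The idea of a transformation-consistency filter with a distribution-derived threshold can be sketched as follows. This is an illustrative reading of the abstract, not the paper's formulation: the rule "mean plus `k` standard deviations of the visual residuals" is an assumed stand-in for the adaptive threshold, and the function names and the parameter `k` are hypothetical.

```python
import numpy as np

def residuals(R, t, src, dst):
    """Per-match alignment error ||R @ src_i + t - dst_i|| under a rigid transform."""
    return np.linalg.norm(src @ R.T + t - dst, axis=1)

def adaptive_threshold(visual_residuals, k=2.0):
    """Assumed rule: threshold = mean + k * std of the visual-match residuals.

    The paper derives its threshold from the error distribution; the exact
    form used there is not given in the abstract.
    """
    return float(np.mean(visual_residuals) + k * np.std(visual_residuals))

def filter_geometric_matches(R, t, vis_src, vis_dst, geo_src, geo_dst, k=2.0):
    """Keep geometric matches that agree with a prior transform (R, t).

    (R, t) is assumed to have been estimated beforehand from the visual
    matches (vis_src, vis_dst). Returns a boolean mask over the geometric
    matches and the threshold that was used.
    """
    eps = adaptive_threshold(residuals(R, t, vis_src, vis_dst), k)
    keep = residuals(R, t, geo_src, geo_dst) <= eps
    return keep, eps
```

A looser prior (noisier visual matches) widens the threshold and admits more geometric matches, while a tight prior rejects geometric matches more aggressively; this is the qualitative behavior an error-distribution-based threshold is meant to provide.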
