ViGG: Robust RGB-D Point Cloud Registration using Visual-Geometric Mutual Guidance

Congjia Chen, Shen Yan, Yufu Qu

TL;DR

ViGG tackles RGB-D point cloud registration by pairing visual (image) matches with geometric features through mutual reinforcement. It introduces GVCA to filter ambiguous visual cliques using geometric hints, and VGM to guide geometric matching with priors derived from visual results, enabling noise-robust correspondences. Across 3DMatch, ScanNet, and KITTI, ViGG achieves state-of-the-art performance in both learning-free and learning-based settings and shows robustness to visual noise and low-overlap conditions. The approach offers practical effectiveness for RGB-D and cross-sensor registration scenarios with favorable efficiency.

Abstract

Point cloud registration is a fundamental task in 3D vision. Most existing methods only use geometric information for registration. Recently proposed RGB-D registration methods primarily focus on feature fusion or improving feature learning, which limits their ability to exploit image information and hinders their practical applicability. In this paper, we propose ViGG, a robust RGB-D registration method using mutual guidance. First, we solve clique alignment in a visual-geometric combination form, employing a geometric guidance design to suppress ambiguous cliques. Second, to mitigate accuracy degradation caused by noise in visual matches, we propose a visual-guided geometric matching method that utilizes visual priors to determine the search space, enabling the extraction of high-quality, noise-insensitive correspondences. This mutual guidance strategy brings our method superior robustness, making it applicable for various RGB-D registration tasks. The experiments on 3DMatch, ScanNet and KITTI datasets show that our method outperforms recent state-of-the-art methods in both learning-free and learning-based settings. Code is available at https://github.com/ccjccjccj/ViGG.

Paper Structure

This paper contains 21 sections, 14 equations, 8 figures, 16 tables.

Figures (8)

  • Figure 1: ViGG takes point clouds and their corresponding images as input and estimates the transformation between the two point clouds.
  • Figure 2: Pipeline of our method. Using the extracted visual matches and geometric features, the geometric-guided visual clique alignment module first estimates a prior transformation $\mathbf{T}^{pri}$. Then, the visual-guided geometric matching module iteratively determines the search zone for each point in $\mathbf{P}$ and extracts high-quality correspondences. The correspondences are used to estimate the updated transformation $\mathbf{T}$.
  • Figure 3: Iterative strategy uses the updated transformation to redetermine clearer local search zones for $\mathbf{x}^{p}_{i}$, which helps reassign the incorrect or ambiguous matches when $\mathbf{T}^{pri}$ is inaccurate.
  • Figure 4: In the KITTI dataset, the camera's field of view is much smaller than that of the LiDAR. The portion of the point cloud within the camera's field of view (colored in green) is less than a quarter of the entire point cloud.
  • Figure S1: 3D points and image pixels are captured by sensors with different ranges, so not all of the image is available for registration. We mark the pixels with a valid 3D point mapping in red.
  • ...and 3 more figures
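
The captions of Figures 2 and 3 describe a loop: start from a prior transformation $\mathbf{T}^{pri}$, restrict each point of $\mathbf{P}$ to a local search zone in the other cloud, extract correspondences inside that zone, and re-estimate the transformation. The sketch below illustrates that idea only and is not the authors' implementation. Everything in it is an assumption made for illustration: the function names (`kabsch`, `guided_matches`, `refine_with_visual_prior`), the fixed-radius search zone, the dot-product feature similarity, the brute-force distance matrix, and the use of a plain SVD (Kabsch) solver for the transformation update.

```python
import numpy as np


def kabsch(src, dst):
    """Least-squares rigid transform (R, t) mapping src onto dst, both (N, 3).

    Assumption: a plain SVD (Kabsch) solver stands in for whatever
    estimator the paper actually uses.
    """
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)           # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def guided_matches(P, Q, feat_P, feat_Q, R, t, radius):
    """Match each point of P only against points of Q inside its search zone.

    The search zone (hypothetical definition) is the ball of `radius`
    around the position of the point after applying the current (R, t).
    Inside the zone, the candidate with the highest feature dot product wins.
    """
    P_warp = P @ R.T + t
    dist = np.linalg.norm(P_warp[:, None, :] - Q[None, :, :], axis=2)  # (N, M)
    sim = feat_P @ feat_Q.T                                           # (N, M)
    sim = np.where(dist <= radius, sim, -np.inf)  # mask candidates outside zone
    best = sim.argmax(axis=1)
    has_candidate = np.isfinite(sim[np.arange(len(P)), best])
    return np.flatnonzero(has_candidate), best[has_candidate]


def refine_with_visual_prior(P, Q, feat_P, feat_Q, R_pri, t_pri,
                             radius=0.3, iters=3):
    """Alternate between zone-restricted matching and transform updates."""
    R, t = R_pri, t_pri
    for _ in range(iters):
        idx_p, idx_q = guided_matches(P, Q, feat_P, feat_Q, R, t, radius)
        if len(idx_p) < 3:                        # too few matches to update
            break
        R, t = kabsch(P[idx_p], Q[idx_q])
    return R, t
```

Here `R_pri` and `t_pri` play the role of $\mathbf{T}^{pri}$, and each pass recomputes the zones from the latest estimate, which is the behavior the Figure 3 caption attributes to the iterative strategy. The dense `(N, M)` distance matrix is only practical for small clouds; a spatial index such as a k-d tree would be the usual replacement for larger inputs.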