F2M-Reg: Unsupervised RGB-D Point Cloud Registration with Frame-to-Model Optimization

Zhinan Yu, Zheng Qin, Yijie Tang, Yongjun Wang, Renjiao Yi, Chenyang Zhu, Kai Xu

TL;DR

F2M-Reg tackles unsupervised RGB-D point cloud registration by shifting supervision from frame-to-frame to frame-to-model. It leverages a neural implicit field as a global scene model to refine per-frame poses, providing robust supervision under lighting changes, occlusions, and low overlap. A two-stage pipeline combines synthetic warming-up with real-world frame-to-model optimization, using a two-branch registration network and a suite of rendering and geometric losses to train registration without pose annotations. The method achieves state-of-the-art performance across benchmarks (ScanNet, 3DMatch, ScanNet++, 7-Scenes), with notable gains in challenging settings, and demonstrates strong data scalability and potential for lifelong learning in 3D registration.

Abstract

This work studies the problem of unsupervised RGB-D point cloud registration, which aims at training a robust registration model without ground-truth pose supervision. Existing methods usually leverage unposed RGB-D sequences and adopt a frame-to-frame framework based on differentiable rendering to train the registration model, enforcing photometric and geometric consistency between the two frames as supervision. However, this frame-to-frame framework is vulnerable to factors that are inconsistent between frames, e.g., lighting changes, geometry occlusion, and reflective materials, which leads to suboptimal convergence of the registration model. In this paper, we propose a novel frame-to-model optimization framework named F2M-Reg for unsupervised RGB-D point cloud registration. We leverage a neural implicit field as a global model of the scene and optimize the estimated poses of the frames by registering them to the global model; the registration model is subsequently trained with the optimized poses. Thanks to the global encoding capability of the neural implicit field, our frame-to-model framework is significantly more robust to inconsistent factors between different frames and thus provides better supervision for the registration model. In addition, we demonstrate that F2M-Reg can be further enhanced by a simple synthetic warming-up strategy. To this end, we construct a photorealistic synthetic dataset named Sim-RGBD to initialize the registration model for the frame-to-model optimization on real-world RGB-D sequences. Extensive experiments on four challenging benchmarks show that our method surpasses the previous state-of-the-art counterparts by a large margin, especially in scenarios with severe lighting changes and low overlap. Our code and models are available at https://github.com/MrIsland/F2M_Reg.
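
The loop described in the abstract (estimate poses, refine them against a global model, then train on the refined poses) needs a way to move between frame-to-frame poses and per-frame absolute poses. The following is a minimal sketch of that bookkeeping only, not the authors' implementation. It assumes poses are 4x4 rigid transforms, that an absolute pose maps camera coordinates to world coordinates, and that a relative pose maps frame k+1 into frame k. The helper `yaw_pose` and all function names are hypothetical, and the neural implicit field optimization itself is not shown.

```python
import math

def matmul(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def invert_rigid(t):
    """Invert a rigid transform [R | p] as [R^T | -R^T p]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [t[i][3] for i in range(3)]
    new_p = [-sum(r_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [new_p[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def yaw_pose(angle_deg, tx):
    """Hypothetical test pose: rotation about z plus a translation along x."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def chain_relative_poses(relative_poses):
    """Accumulate frame-to-frame poses into absolute poses (frame 0 = identity)."""
    identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    absolute = [identity]
    for rel in relative_poses:
        absolute.append(matmul(absolute[-1], rel))
    return absolute

def relative_from_absolute(t_i, t_j):
    """Relative pose mapping frame j into frame i: T_i^{-1} T_j."""
    return matmul(invert_rigid(t_i), t_j)

# Estimated relative poses initialize the absolute poses; after those are
# refined against the scene model (not shown), relative poses derived from
# them could serve as training targets for the registration model.
estimated = [yaw_pose(10.0, 0.1), yaw_pose(-5.0, 0.2)]
absolute = chain_relative_poses(estimated)
target_1_2 = relative_from_absolute(absolute[1], absolute[2])
```

Without any refinement step, `target_1_2` simply reproduces the second estimated relative pose up to floating-point error, which is a convenient sanity check for the chosen conventions.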

Paper Structure

This paper contains 16 sections, 9 equations, 7 figures, 10 tables.

Figures (7)

  • Figure 1: We propose F2M-Reg, a frame-to-model optimization framework for unsupervised RGB-D registration. The registration model is first warmed up with synthetic data, and then fine-tuned on real-world data in a frame-to-model manner (top). Although the frame-to-frame method can successfully register the easy case (bottom-left), it cannot register the case with lighting changes and low overlap (bottom-right). In contrast, our method successfully registers the hard case.
  • Figure 2: Overall pipeline of F2M-Reg. Our framework is divided into two stages. The first stage, synthetic warming-up, leverages synthetic RGB-D pairs and their ground-truth poses to train the registration model in a supervised manner. In the second stage, frame-to-model optimization, we take an RGB-D sequence as input and use the registration model to estimate the relative pose between every pair of consecutive frames. Starting from the estimated poses, we jointly optimize a neural implicit field of the whole scene and the poses themselves. Finally, the optimized poses are used to fine-tune the registration model on real-world data.
  • Figure 3: Mapping stage. The first frame in the current batch, Frame $i$, can either be the first of a new subsequence or the last from the previous batch, with its pose known. Once a new frame is tracked, it is added to the batch with its tracked pose $\mathbf{\tilde{T}}_{i}$. In Step 1, when the $(i+1)^\text{th}$ frame is added, its tracked pose $\mathbf{\tilde{T}}_{i+1}$ is optimized along with the mapped pose $\mathbf{\hat{T}}_{i}$ and the implicit scene representation $\theta$. In Step 2, after adding the $(i+2)^\text{th}$ frame, the optimization parameters become $\mathbf{\Psi} = \{\theta, \mathbf{\hat{T}}_{i}, \mathbf{\hat{T}}_{i+1}, \mathbf{\tilde{T}}_{i+2}\}$, with further frames tracked and optimized similarly.
  • Figure 4: Demonstration of the Sim-RGBD dataset. The entire scene is depicted in the left figure. Camera sampling is illustrated in the middle figure. Initially, we sample the position of the first camera based on a specified pitch angle $\theta$ and yaw angle $\phi$, with (0, 0, 0) as the viewpoint, forming the camera's view direction. The position of the second camera is derived by applying to the first camera position a transformation sampled from a Gaussian distribution. The right figure showcases the colored point cloud extracted from the scene.
  • Figure 5: Correspondences of PointMBF and F2M-Reg on ScanNet and 3DMatch. The first row shows that F2M-Reg outperforms PointMBF when the input point clouds have a low overlap ratio. The subsequent rows illustrate that our approach continues to perform effectively even under significant lighting changes, which adversely affect other methods.
  • ...and 2 more figures
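
The camera sampling described in the Figure 4 caption can be sketched as follows. This is an illustrative reading under stated assumptions, not the Sim-RGBD generation code: it assumes the first camera sits on a sphere around the origin, that (0, 0, 0) is the point the camera looks at, and that the second camera is obtained by adding zero-mean Gaussian offsets to the first position only (the caption's "transformation" may also include a rotation). The `radius` and `sigma` parameters and all function names are hypothetical.

```python
import math
import random

def camera_position(pitch, yaw, radius=1.0):
    """Place a camera on a sphere around the origin; angles are in radians."""
    x = radius * math.cos(pitch) * math.cos(yaw)
    y = radius * math.cos(pitch) * math.sin(yaw)
    z = radius * math.sin(pitch)
    return (x, y, z)

def view_direction(position, target=(0.0, 0.0, 0.0)):
    """Unit vector pointing from the camera position toward the target."""
    d = [t - p for p, t in zip(position, target)]
    norm = math.sqrt(sum(c * c for c in d))
    return tuple(c / norm for c in d)

def sample_second_camera(first_position, sigma=0.1, rng=random):
    """Perturb the first camera position with zero-mean Gaussian offsets."""
    return tuple(p + rng.gauss(0.0, sigma) for p in first_position)

# Example: a camera raised by 30 degrees, rotated by 45 degrees, 2 units away.
rng = random.Random(0)  # fixed seed so the sampled pair is reproducible
first = camera_position(math.radians(30.0), math.radians(45.0), radius=2.0)
second = sample_second_camera(first, sigma=0.1, rng=rng)
pair = (first, view_direction(first), second, view_direction(second))
```

A smaller `sigma` keeps the two views close together and therefore highly overlapping; a larger value would produce lower-overlap pairs.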