Diff-Reg v1: Diffusion Matching Model for Registration Problem

Qianliang Wu, Haobo Jiang, Lei Luo, Jun Li, Yaqing Ding, Jin Xie, Jian Yang

TL;DR

Diff-Reg introduces a diffusion-based framework for constructing correspondences in registration by operating in the doubly stochastic matrix space $\mathcal{M}$. A forward diffusion $q(\mathbf{E}^t|\mathbf{E}^0)$ corrupts the ground-truth matching $\mathbf{E}^0$, while a lightweight denoising module $g_\theta$ performs reverse sampling to recover $\mathbf{E}^0$ along a posterior-guided path (via DDIM). Training relies on a variational lower bound with a simplified objective $L_{simple}$. At inference, the backbone features are computed only once and reused at every sampling step, which keeps sampling fast. Experiments on 3D non-rigid and 2D-3D registration (e.g., 4DMatch/4DLoMatch and RGB-D Scenes V2) show substantial gains over single-pass baselines and demonstrate robustness to large deformations and scale ambiguities. The method offers data-efficient diffusion-based augmentation in the matching-matrix space and yields accurate deformable correspondences with improved registration recall and inlier metrics, while remaining computationally efficient due to the lightweight denoiser.
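The forward corruption and the training objective summarized above can be illustrated with a minimal NumPy sketch. It assumes a DDPM-style linear $\beta$ schedule (1000 steps, $\beta$ from $10^{-4}$ to $0.02$) and an $\mathbf{E}^0$-prediction mean-squared-error form of $L_{simple}$; the paper's actual schedule and loss may differ, and the helper names `make_alpha_bar`, `q_sample`, and `l_simple` are hypothetical.

```python
import numpy as np

def make_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def q_sample(E0, t, alpha_bar, rng):
    """Sample E^t ~ q(E^t | E^0) = N(sqrt(alpha_bar_t) E^0, (1 - alpha_bar_t) I) in one shot."""
    noise = rng.standard_normal(E0.shape)
    E_t = np.sqrt(alpha_bar[t]) * E0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return E_t, noise

def l_simple(E0_pred, E0):
    """Assumed E^0-prediction form of L_simple: mean squared error to the ground truth."""
    return float(np.mean((E0_pred - E0) ** 2))

# Ground-truth matching between 4 source and 4 target points: a permutation matrix,
# which lies in the doubly stochastic set (every row and column sums to 1).
E0 = np.eye(4)[[2, 0, 3, 1]]
alpha_bar = make_alpha_bar(1000)
rng = np.random.default_rng(0)
E_early, _ = q_sample(E0, t=10, alpha_bar=alpha_bar, rng=rng)
E_late, _ = q_sample(E0, t=999, alpha_bar=alpha_bar, rng=rng)
# Scoring the corrupted matrix itself as a "prediction" shows how much signal each step destroys.
err_early, err_late = l_simple(E_early, E0), l_simple(E_late, E0)
```

Because $q(\mathbf{E}^t|\mathbf{E}^0)$ is Gaussian in closed form, training can draw any timestep directly instead of simulating the chain step by step.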

Abstract

Establishing reliable correspondences is essential for registration tasks such as 3D and 2D-3D registration. Existing methods commonly leverage geometric or semantic point features to generate potential correspondences. However, these features may face challenges such as large deformation, scale inconsistency, and ambiguous matching (e.g., symmetry). Additionally, many previous methods rely on single-pass prediction and may get stuck in local minima in complex scenarios. To mitigate these challenges, we introduce a diffusion matching model for robust correspondence construction. Our approach treats correspondence estimation as a denoising diffusion process within the doubly stochastic matrix space, gradually denoising (refining) a doubly stochastic matching matrix toward the ground-truth one for high-quality correspondence estimation. It involves a forward diffusion process that gradually introduces Gaussian noise into the ground-truth matching matrix and a reverse denoising process that iteratively refines the noisy matching matrix. In particular, during inference the backbone extracts features only once, and our lightweight denoising module reuses these same features at every reverse sampling step. Evaluation of our method on both 3D and 2D-3D registration tasks confirms its effectiveness. The code is available at https://github.com/wuqianliang/Diff-Reg.
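To make "doubly stochastic matrix space" concrete: a real-valued score matrix can be mapped onto (approximately) doubly stochastic matrices by Sinkhorn normalization, i.e., alternately rescaling rows and columns until each sums to 1. Whether and how Diff-Reg applies this projection is not stated here, so treat the sketch below as an assumption; `sinkhorn`, the temperature `tau`, and the iteration count are illustrative choices.

```python
import numpy as np

def sinkhorn(scores, tau=0.1, num_iters=100):
    """Project a square score matrix onto (approximately) doubly stochastic matrices.

    Operates on exp(scores / tau), alternately normalizing rows and columns,
    in log space for numerical stability.
    """
    log_p = scores / tau
    for _ in range(num_iters):
        log_p = log_p - np.logaddexp.reduce(log_p, axis=1, keepdims=True)  # row sums -> 1
        log_p = log_p - np.logaddexp.reduce(log_p, axis=0, keepdims=True)  # column sums -> 1
    return np.exp(log_p)

rng = np.random.default_rng(0)
E0 = np.eye(4)[[2, 0, 3, 1]]                      # ground-truth permutation matching
noisy = E0 + 0.1 * rng.standard_normal(E0.shape)  # corrupted, no longer doubly stochastic
P = sinkhorn(noisy)
row_sums, col_sums = P.sum(axis=1), P.sum(axis=0)
matches = P.argmax(axis=1)                        # hard correspondences read off the soft matrix
```

Ending each iteration with the column step makes column sums exact up to floating point, while row sums converge geometrically; a small `tau` pushes the result toward a hard permutation.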

Paper Structure (15 sections, 8 equations, 3 figures, 3 tables, 1 algorithm)

Figures (3)

  • Figure 1: Overview of our diffusion matching model. The forward diffusion process is driven by the Gaussian transition kernel $q(\mathbf{E}^t|\mathbf{E}^{t-1})$, which has a closed form $q(\mathbf{E}^t|\mathbf{E}^0)$. The denoising model $g_\theta(\mathbf{E}^t)$ learns a reverse denoising gradient that points to the target solution $\mathbf{E}^0$. During inference, in the reverse sampling process, we use the predicted $\hat{\mathbf{E}}^0$ and DDIM song2020denoising to sample $\mathbf{E}^{t-1}$.
  • Figure 2: Qualitative results of non-rigid registration on the 4DMatch/4DLoMatch benchmark. The top two rows are from 4DMatch, while the bottom three are from 4DLoMatch. Blue and yellow denote the source and target point clouds, respectively. Green and red lines indicate whether the predicted deformation flow of each source point is accepted under the threshold. The deformable registration is computed by GraphSCNet li20232d3d. Zoom in for details.
  • Figure 3: The qualitative results of top-200 predicted correspondences on the RGB-D Scenes V2 benchmark lai2014unsupervised. The green/red color indicates whether the matching score is accepted based on a threshold value. Zoom in for details.
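The reverse sampling described in Figure 1, combined with the single backbone pass noted in the abstract, can be sketched as a deterministic DDIM loop ($\eta = 0$): at each step the denoiser predicts $\hat{\mathbf{E}}^0$, the implied noise is recovered, and the sample jumps to the earlier timestep. Everything model-specific here is hypothetical: `toy_backbone` simply returns coordinates as descriptors, `toy_denoiser` stands in for $g_\theta$ by Sinkhorn-normalizing a distance-based score plus the current $\mathbf{E}^t$, and the schedule and timesteps are assumed. The `CALLS` counters only make visible that the backbone runs once while the denoiser runs at every step.

```python
import numpy as np

CALLS = {"backbone": 0, "denoiser": 0}  # bookkeeping: how often each module runs

def sinkhorn(scores, tau=0.1, num_iters=50):
    """Log-space Sinkhorn: map scores to an (approximately) doubly stochastic matrix."""
    log_p = scores / tau
    for _ in range(num_iters):
        log_p = log_p - np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        log_p = log_p - np.logaddexp.reduce(log_p, axis=0, keepdims=True)
    return np.exp(log_p)

def toy_backbone(src, tgt):
    """Hypothetical feature extractor: the descriptors are just the coordinates."""
    CALLS["backbone"] += 1
    return src, tgt

def toy_denoiser(E_t, feats, t):
    """Hypothetical lightweight g_theta: predicts E^0 from E^t and cached features.

    A real denoiser would condition on t; this toy ignores it.
    """
    CALLS["denoiser"] += 1
    f_src, f_tgt = feats
    neg_sq_dist = -((f_src[:, None, :] - f_tgt[None, :, :]) ** 2).sum(-1)
    return sinkhorn(neg_sq_dist + 0.1 * E_t)

def ddim_sample(src, tgt, alpha_bar, timesteps, rng):
    """Deterministic DDIM (eta = 0) reverse sampling over matching matrices."""
    feats = toy_backbone(src, tgt)  # extracted once, reused at every step below
    E_t = rng.standard_normal((len(src), len(tgt)))  # E^T ~ N(0, I)
    for t, t_prev in zip(timesteps[:-1], timesteps[1:]):
        E0_hat = toy_denoiser(E_t, feats, t)
        # Noise implied by the current sample and the predicted clean matching.
        eps_hat = (E_t - np.sqrt(alpha_bar[t]) * E0_hat) / np.sqrt(1.0 - alpha_bar[t])
        # Move to the earlier timestep along the DDIM path.
        E_t = np.sqrt(alpha_bar[t_prev]) * E0_hat + np.sqrt(1.0 - alpha_bar[t_prev]) * eps_hat
    return toy_denoiser(E_t, feats, timesteps[-1])

alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))  # assumed linear schedule
timesteps = [999, 749, 499, 249, 0]                           # assumed DDIM sub-sequence
rng = np.random.default_rng(0)
src = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 3.0]])
perm = np.array([2, 0, 3, 1])
tgt = src[perm] + 0.05 * rng.standard_normal(src.shape)  # shuffled, slightly perturbed copy
E = ddim_sample(src, tgt, alpha_bar, timesteps, rng)
matches = E.argmax(axis=1)  # target index chosen for each source point
```

Keeping the expensive feature extraction outside the loop is what lets the number of sampling steps grow without repeating the backbone cost; only the lightweight denoiser is re-evaluated.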