Diff-Reg v1: Diffusion Matching Model for Registration Problem
Qianliang Wu, Haobo Jiang, Lei Luo, Jun Li, Yaqing Ding, Jin Xie, Jian Yang
TL;DR
Diff-Reg introduces a diffusion-based framework for constructing correspondences in registration by operating in the doubly stochastic matrix space $\mathcal{M}$. A forward diffusion $q(\mathbf{E}^t|\mathbf{E}^0)$ corrupts the ground-truth matching matrix $\mathbf{E}^0$ with Gaussian noise, while a lightweight denoising module $g_\theta$ performs reverse sampling (via DDIM) along a posterior-guided path to recover $\mathbf{E}^0$. Training is derived from a variational lower bound and uses the simplified objective $L_{simple}$. At inference, backbone features are extracted only once and reused at every sampling step, which keeps sampling fast. Experiments on 3D non-rigid and 2D-3D registration (e.g., 4DMatch/4DLoMatch and RGB-D Scenes V2) show substantial gains over single-pass baselines and demonstrate robustness to large deformations and scale ambiguities. Running the diffusion in the matching matrix space acts as a data-efficient form of augmentation, and the method yields accurate correspondences under deformation, with improved registration recall and inlier metrics, while the lightweight denoiser keeps it computationally efficient.
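As a minimal sketch of the training-side quantities, the NumPy snippet below corrupts a ground-truth permutation matrix with a Gaussian forward process and defines an MSE-style $L_{simple}$. The cosine noise schedule, the assumption that $g_\theta$ predicts $\mathbf{E}^0$ directly, and the helper names `cosine_alpha_bar`, `q_sample` and `l_simple` are illustrative assumptions, not taken from the Diff-Reg code.

```python
import numpy as np

def cosine_alpha_bar(T, s=0.008):
    """Cumulative signal level alpha_bar_t for t = 0..T (cosine schedule; an assumption)."""
    t = np.arange(T + 1) / T
    f = np.cos((t + s) / (1 + s) * np.pi / 2) ** 2
    return f / f[0]  # normalised so that alpha_bar_0 = 1 (no noise at t = 0)

def q_sample(E0, t, alpha_bar, rng):
    """Forward diffusion q(E^t | E^0) = N(sqrt(ab_t) E^0, (1 - ab_t) I)."""
    eps = rng.standard_normal(E0.shape)
    Et = np.sqrt(alpha_bar[t]) * E0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return Et, eps

def l_simple(E0, E0_pred):
    """Simple objective under an assumed E^0-prediction parameterisation: MSE to ground truth."""
    return float(np.mean((E0 - E0_pred) ** 2))

# Ground-truth matching: source point i corresponds to target point perm[i].
# A permutation matrix is doubly stochastic (every row and column sums to 1).
perm = np.array([2, 0, 3, 1])
E0 = np.eye(4)[perm]
alpha_bar = cosine_alpha_bar(T=100)
rng = np.random.default_rng(0)
Et, eps = q_sample(E0, t=50, alpha_bar=alpha_bar, rng=rng)
# A trained g_theta(E^t, features, t) would then be scored with l_simple(E0, prediction).
```

Predicting $\mathbf{E}^0$ rather than the noise is one natural choice here because the prediction can be normalized into $\mathcal{M}$ directly; an ε-prediction variant would instead compare `eps` with the predicted noise.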
Abstract
Establishing reliable correspondences is essential for registration tasks such as 3D and 2D-3D registration. Existing methods commonly leverage geometric or semantic point features to generate potential correspondences. However, these features can be unreliable under large deformation, scale inconsistency, and ambiguous matching (e.g., symmetry). Additionally, many previous methods rely on single-pass prediction and may get stuck in local minima in complex scenarios. To mitigate these challenges, we introduce a diffusion matching model for robust correspondence construction. Our approach treats correspondence estimation as a denoising diffusion process within the doubly stochastic matrix space, gradually denoising (refining) a doubly stochastic matching matrix toward the ground-truth one for high-quality correspondence estimation. It involves a forward diffusion process that gradually introduces Gaussian noise into the ground-truth matching matrix and a reverse denoising process that iteratively refines the noisy matching matrix. Notably, backbone feature extraction runs only once during inference; our lightweight denoising module reuses the same features at every reverse sampling step. Evaluation on both 3D and 2D-3D registration tasks confirms the effectiveness of our method. The code is available at https://github.com/wuqianliang/Diff-Reg.
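The single-backbone-pass inference described in the abstract can be sketched as a deterministic DDIM loop in which features are computed once and only the denoiser is called at each step. This is a toy illustration under stated assumptions: `toy_backbone` and `toy_denoiser` are hypothetical placeholders for the real backbone and $g_\theta$, Sinkhorn normalization is assumed as the way scores are mapped into the doubly stochastic space, and the schedule and step count are arbitrary.

```python
import numpy as np

def sinkhorn(logits, n_iters=100):
    """Alternate row/column normalisation of exp(logits) -> approx. doubly stochastic."""
    P = np.exp(logits - logits.max())
    for _ in range(n_iters):
        P /= P.sum(axis=1, keepdims=True)  # rows sum to 1
        P /= P.sum(axis=0, keepdims=True)  # columns sum to 1
    return P

calls = {"backbone": 0, "denoiser": 0}  # demo bookkeeping only

def l2_normalise(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def toy_backbone(src, tgt):
    """Hypothetical stand-in for the feature backbone (L2-normalised point features)."""
    calls["backbone"] += 1
    return l2_normalise(src), l2_normalise(tgt)

def toy_denoiser(Et, feats, t, tau=0.05):
    """Hypothetical stand-in for g_theta: predicts E^0 from E^t and the cached features.
    This toy version ignores t; a learned module would condition on it."""
    calls["denoiser"] += 1
    f_src, f_tgt = feats
    return sinkhorn(f_src @ f_tgt.T / tau + Et)

def ddim_step(Et, E0_hat, ab_t, ab_prev):
    """Deterministic DDIM update (eta = 0) from step t to t_prev given an E^0 prediction."""
    eps_hat = (Et - np.sqrt(ab_t) * E0_hat) / np.sqrt(1.0 - ab_t)
    return np.sqrt(ab_prev) * E0_hat + np.sqrt(1.0 - ab_prev) * eps_hat

def sample(src, tgt, backbone, denoiser, T=100, n_steps=10, seed=0):
    rng = np.random.default_rng(seed)
    alpha_bar = np.cos(np.linspace(0.0, 1.0, T + 1) * np.pi / 2) ** 2  # assumed schedule
    steps = np.linspace(T, 0, n_steps + 1).round().astype(int)          # T, ..., 0
    feats = backbone(src, tgt)                              # backbone runs once
    Et = rng.standard_normal((src.shape[0], tgt.shape[0]))  # E^T ~ N(0, I)
    for t, t_prev in zip(steps[:-1], steps[1:]):
        E0_hat = denoiser(Et, feats, t)                     # cached features reused
        Et = ddim_step(Et, E0_hat, alpha_bar[t], alpha_bar[t_prev])
    return Et  # alpha_bar[0] = 1, so the final step returns the last E0_hat

# Toy problem: source point i should match target point perm[i].
perm = np.array([2, 0, 3, 1])
E = sample(np.eye(4)[perm], np.eye(4), toy_backbone, toy_denoiser)
print(E.argmax(axis=1), calls)  # expect perm back, one backbone call, ten denoiser calls
```

The point of the structure is cost: the expensive feature extraction is outside the loop, so adding sampling steps only adds calls to the lightweight denoiser.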
