Diff-PCR: Diffusion-Based Correspondence Searching in Doubly Stochastic Matrix Space for Point Cloud Registration

Haihua Shi, Qianliang Wu

TL;DR

This work addresses the bottleneck of explicit, robust correspondence learning for point cloud registration. It proposes a diffusion-based framework that operates directly in the space of non-square doubly stochastic matrices to iteratively refine matching correspondences. By integrating a lightweight denoising module (Sinkhorn projection, weighted Procrustes, a transformer, and a matching head) with a KPConv backbone, the method learns a reverse diffusion gradient that guides the search toward the target matching matrix, improving robustness to partial overlap and symmetry ambiguities. Evaluations on rigid (3DMatch/3DLoMatch) and non-rigid (4DMatch/4DLoMatch) datasets show competitive or superior performance in both correspondence quality and registration accuracy. Ablations highlight the benefits of iterative reverse sampling, deterministic DDIM-like steps, and the ability to start from various initializations. The approach demonstrates that diffusion-informed optimization of correspondences can yield faster convergence and improved performance, while remaining lightweight enough for practical use and adaptable to 2D-2D, 2D-3D, and 3D-3D registration tasks.
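
The weighted Procrustes step named in the denoising module has a standard closed-form solution. As a minimal sketch, and not the authors' implementation, the NumPy function below estimates a rigid transform from weighted point pairs using the weighted Kabsch (SVD) solution. The function name `weighted_procrustes` and its arguments are hypothetical and chosen only for illustration; in a pipeline like the one summarized here, the weights would plausibly come from the entries of the matching matrix.

```python
import numpy as np

def weighted_procrustes(src, dst, weights):
    """Return R (3x3) and t (3,) minimising sum_i w_i * ||R @ src[i] + t - dst[i]||^2.

    src, dst: (N, 3) arrays of paired points; weights: (N,) non-negative weights.
    All names are illustrative assumptions, not taken from the paper.
    """
    w = weights / weights.sum()              # normalise weights to sum to 1
    src_centroid = w @ src                   # weighted centroids
    dst_centroid = w @ dst
    src_c = src - src_centroid               # centred point sets
    dst_c = dst - dst_centroid
    H = (src_c * w[:, None]).T @ dst_c       # 3x3 weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_centroid - R @ src_centroid
    return R, t
```

With noise-free pairs and strictly positive weights, this recovers the generating rotation and translation up to floating-point error; with soft correspondences, low-weight pairs simply contribute less to the estimate.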

Abstract

Efficiently finding optimal correspondences between point clouds is crucial for solving both rigid and non-rigid point cloud registration problems. Existing methods often rely on geometric or semantic feature embedding to establish correspondences and estimate transformations or flow fields. Recently, state-of-the-art methods have employed RAFT-like iterative updates to refine the solution. However, these methods have certain limitations. Firstly, their iterative refinement design lacks transparency, and their iterative updates follow a fixed path during the refinement process, which can lead to suboptimal results. Secondly, these methods overlook the importance of refining or optimizing correspondences (or matching matrices) as a precursor to solving transformations or flow fields. They typically compute candidate correspondences based on distances in the point feature space. However, they only project the candidate matching matrix into some matrix space once with Sinkhorn or dual softmax operations to obtain final correspondences. This one-shot projected matching matrix may be far from the globally optimal one, and these approaches do not consider the distribution of the target matching matrix. In this paper, we propose a novel approach that exploits the Denoising Diffusion Model to predict a searching gradient for the optimal matching matrix within the Doubly Stochastic Matrix Space. During the reverse denoising process, our method iteratively searches for better solutions along this denoising gradient, which points towards the maximum likelihood direction of the target matching matrix. Our method offers flexibility by allowing the search to start from any initial matching matrix provided by the online backbone or white noise. Experimental evaluations on the 3DMatch/3DLoMatch and 4DMatch/4DLoMatch datasets demonstrate the effectiveness of our newly designed framework.
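
To make the "project once with Sinkhorn" step concrete, the following is a minimal NumPy sketch of a Sinkhorn projection for a non-square score matrix. It assumes the common slack-row/slack-column construction, in which unmatched points send their mass to an extra row or column; the abstract does not specify this detail, so the construction, the function name `sinkhorn_with_slack`, and the parameters `n_iters` and `slack` are all assumptions for illustration.

```python
import numpy as np

def _logsumexp(x, axis):
    """Numerically stable log-sum-exp that keeps the reduced axis."""
    m = x.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))

def sinkhorn_with_slack(scores, n_iters=50, slack=0.0):
    """Project an (N, M) score matrix towards a doubly stochastic matrix.

    A slack row and a slack column (assumed construction) absorb unmatched
    mass, so N and M may differ. Returns the inner (N, M) block.
    """
    n, m = scores.shape
    log_a = np.full((n + 1, m + 1), slack, dtype=float)
    log_a[:n, :m] = scores
    for _ in range(n_iters):
        # Normalise each real row (including its slack-column entry) to sum to 1.
        log_a[:n, :] -= _logsumexp(log_a[:n, :], axis=1)
        # Normalise each real column (including its slack-row entry) to sum to 1.
        log_a[:, :m] -= _logsumexp(log_a[:, :m], axis=0)
    return np.exp(log_a[:n, :m])
```

In the one-shot setting criticized in the abstract, a function like this would be called once on the feature-similarity scores. The proposed method instead revisits the matching matrix at every reverse diffusion step.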

Paper Structure

This paper contains 34 sections, 25 equations, 7 figures, 7 tables, and 3 algorithms.

Figures (7)

  • Figure 1: The reverse sampling process for the matching matrix on the doubly stochastic matrix manifolds. Zoom in for details.
  • Figure 2: Overview of our matching matrix diffusion model. $\bigoplus$ means that both the 3D point coordinates and the position encoding are used as input; $\bigotimes$ means that only the 3D point coordinates are used. The input points $\hat{P},\hat{Q}$ and their point features $F^{\hat{P}}, F^{\hat{Q}}$ are output once by the KPConv backbone and then remain fixed throughout the entire reverse sampling process. At each denoising step, these inputs pass through the warping operation and the denoising transformer. The forward diffusion process is modeled by the Gaussian transition kernel $q(E^t|E^{t-1})$, which has a closed form $q(E^t|E^0)$. The denoising model $g_\theta(E^t)$ learns a reverse denoising gradient that points to the target solution $E^0$. During inference, the reverse sampling process uses the predicted $\hat{E}^0$ together with Eqns. (\ref{close_diff}) and (\ref{ddim_sampling}) to sample $E^{t-1}$.
  • Figure 3: Overview of our framework training. Training optimizes both a KPConv (Thomas et al., 2019) feature backbone and a denoising module. The training details of the denoising module are listed in Algorithm \ref{training-diff-pcr}. We implement the denoising matching loss $L_{simple}$ with a focal loss. After training, only the KPConv backbone and the denoising module are kept for the reverse sampling process; the other components are discarded.
  • Figure 4: The qualitative results of rigid registration in the 3DMatch/3DLoMatch benchmark. Zoom in for details.
  • Figure 5: The qualitative results of deformable matching in the 4DMatch/4DLoMatch benchmark. The top results are generated by Lepard (Li and Harada, 2022); the bottom results are from our method. Red and green denote matching errors in the two matching directions. Zoom in for details.
  • ...and 2 more figures
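
The Figure 2 caption refers to a closed-form forward kernel $q(E^t|E^0)$ and a DDIM-style reverse step. The sketch below illustrates both in NumPy under the standard Gaussian diffusion formulas. It is not the paper's Eqns. `close_diff` and `ddim_sampling` themselves, which are not reproduced in this summary. The linear beta schedule, the 0-indexed time steps, and the names `make_alpha_bar`, `q_sample`, and `ddim_step` are assumptions for illustration, and any projection back onto the doubly stochastic matrix space is omitted.

```python
import numpy as np

def make_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def q_sample(e0, t, alpha_bar, rng):
    """Closed-form forward sample: E^t = sqrt(a_t) * E^0 + sqrt(1 - a_t) * noise."""
    noise = rng.standard_normal(e0.shape)
    return np.sqrt(alpha_bar[t]) * e0 + np.sqrt(1.0 - alpha_bar[t]) * noise

def ddim_step(e_t, e0_pred, t, t_prev, alpha_bar):
    """Deterministic DDIM update (eta = 0) from step t to step t_prev.

    e0_pred stands in for the denoiser output (the predicted clean matrix).
    t_prev = -1 denotes the final, noise-free state.
    """
    a_t = alpha_bar[t]
    a_prev = alpha_bar[t_prev] if t_prev >= 0 else 1.0
    # Noise implied by the current sample and the predicted clean matrix.
    eps = (e_t - np.sqrt(a_t) * e0_pred) / np.sqrt(1.0 - a_t)
    return np.sqrt(a_prev) * e0_pred + np.sqrt(1.0 - a_prev) * eps
```

In this sketch, a reverse sampling loop would call a denoiser to obtain `e0_pred` at each step and then apply `ddim_step`. If the prediction equals the true $E^0$, a single step with `t_prev = -1` returns $E^0$ exactly.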