PosDiffNet: Positional Neural Diffusion for Point Cloud Registration in a Large Field of View with Perturbations

Rui She, Sijie Wang, Qiyu Kang, Kai Zhao, Yang Song, Wee Peng Tay, Tianyu Geng, Xingchao Jian

TL;DR

PosDiffNet tackles robust point cloud registration in large fields of view with perturbations by combining Beltrami-flow-based neural diffusion, which jointly embeds features and positions, with a neural ODE–driven Transformer. The method uses a hierarchical window–patch–point matching pipeline and estimates the transformation from the resulting correspondences with local-to-global registration (LGR), achieving state-of-the-art results on Boreas and KITTI under challenging weather and noise. Key contributions include a Beltrami diffusion-based 3D representation module, a diffusion-based Transformer for feature-position fusion, and a hierarchical matching strategy that collectively improve robustness and efficiency. The work has practical implications for outdoor 3D perception in autonomous systems and geospatial applications, though it notes that the cost of diffusion and attention is a limitation for resource-constrained devices.

Abstract

Point cloud registration is a crucial technique in 3D computer vision with a wide range of applications. However, this task can be challenging, particularly in large fields of view with dynamic objects, environmental noise, or other perturbations. To address this challenge, we propose a model called PosDiffNet. Our approach performs hierarchical registration based on window-level, patch-level, and point-level correspondence. We leverage a graph neural partial differential equation (PDE) based on Beltrami flow to obtain high-dimensional features and position embeddings for point clouds. We incorporate position embeddings into a Transformer module based on a neural ordinary differential equation (ODE) to efficiently represent patches within points. We employ the multi-level correspondence derived from the high feature similarity scores to facilitate alignment between point clouds. Subsequently, we use registration methods such as SVD-based algorithms to predict the transformation using corresponding point pairs. We evaluate PosDiffNet on several 3D point cloud datasets, verifying that it achieves state-of-the-art (SOTA) performance for point cloud registration in large fields of view with perturbations. The implementation code of experiments is available at https://github.com/AI-IT-AVs/PosDiffNet.
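
The abstract mentions SVD-based algorithms for predicting the transformation from corresponding point pairs. As a rough illustration of that general idea, the sketch below implements the classic weighted Kabsch solution in NumPy. It is not taken from the PosDiffNet repository: the function name `estimate_rigid_transform`, the optional `weights` argument, and the test data are assumptions made for this example.

```python
import numpy as np


def estimate_rigid_transform(src, dst, weights=None):
    """Least-squares rigid fit: find R, t with R @ src[i] + t ~= dst[i].

    Hypothetical helper for illustration (weighted Kabsch algorithm);
    src and dst are (N, 3) arrays of corresponding points.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    if weights is None:
        weights = np.ones(len(src))
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the weights sum to 1

    # Weighted centroids of both point sets.
    src_centroid = w @ src
    dst_centroid = w @ dst

    # 3x3 weighted cross-covariance of the centered points.
    src_c = src - src_centroid
    dst_c = dst - dst_centroid
    H = (src_c * w[:, None]).T @ dst_c

    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so that R is a proper rotation (det = +1)
    # rather than a reflection.
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_centroid - R @ src_centroid
    return R, t


# Usage: recover a known rotation about the z-axis and a translation.
rng = np.random.default_rng(0)
src = rng.normal(size=(50, 3))
angle = 0.3
R_true = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle),  np.cos(angle), 0.0],
    [0.0,            0.0,           1.0],
])
t_true = np.array([1.0, -2.0, 0.5])
dst = src @ R_true.T + t_true

R, t = estimate_rigid_transform(src, dst)
print(np.allclose(R, R_true), np.allclose(t, t_true))
```

In a matching pipeline, the `weights` argument could carry per-correspondence confidence such as feature similarity scores, so that low-confidence pairs contribute less to the fit. Whether and how PosDiffNet weights its correspondences is not stated in the abstract.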

Paper Structure (27 sections, 18 equations, 10 figures, 15 tables)

Figures (10)

  • Figure 1: The architecture of PosDiffNet for registering point cloud pairs. Details of each module are given in the subsequent subsections of Methodology.
  • Figure 2: Architecture of the Beltrami neural diffusion module for feature and position embeddings.
  • Figure 3: Architecture of the feature-position Transformer based on neural ODE.
  • Figure 4: Kernel density estimate plots and box plots for the normalized feature distance between noisy and clean conditions for the modules with or without Beltrami diffusion. The additive noise is Gaussian at two levels, $\mathcal{N}(0, \sigma = 0.25)$ (low) and $\mathcal{N}(0, \sigma = 1.5)$ (high).
  • Figure 5: Kernel density estimate plots and box plots for the normalized feature distance between noisy and clean conditions for the Transformer with or without neural ODE.
  • ...and 5 more figures

Theorems & Definitions (2)

  • proof
  • proof