LAHNet: Local Attentive Hashing Network for Point Cloud Registration

Wentao Qu, Xiaoshui Huang, Liang Xiao

TL;DR

LAHNet addresses the need for broader receptive fields in point cloud registration descriptors by introducing local attention with a locality bias. A Group Transformer uses locality-sensitive hashing to partition points into non-overlapping windows and adds a cross-window interaction mechanism. An additional Interaction Transformer uses an overlap matrix to improve matching of overlap regions between paired clouds. The approach achieves state-of-the-art results on 3DMatch, 3DLoMatch, and KITTI, covering high-overlap indoor, low-overlap indoor, and outdoor scenarios. Overall, the paper presents a scalable, efficient framework that models long-range dependencies in 3D data for improved registration accuracy.

Abstract

Most existing learning-based point cloud descriptors for point cloud registration focus on perceiving local information of point clouds to generate distinctive features. However, a reasonable and broader receptive field is essential for enhancing feature distinctiveness. In this paper, we propose a Local Attentive Hashing Network for point cloud registration, called LAHNet, which introduces into point cloud descriptors a local attention mechanism with the locality inductive bias of convolution-like operators. Specifically, a Group Transformer is designed to capture reasonable long-range context between points. It employs a linear-time neighborhood search strategy, Locality-Sensitive Hashing, which uniformly partitions point clouds into non-overlapping windows. Meanwhile, an efficient cross-window strategy is adopted to further expand the feature receptive field. Furthermore, building on this windowing strategy, we propose an Interaction Transformer to enhance the feature interactions of the overlap regions within point cloud pairs. It represents each window as a global signal and computes an overlap matrix to match overlap regions between point cloud pairs. Extensive results demonstrate that LAHNet can learn robust and distinctive features, achieving significant registration results on real-world indoor and outdoor benchmarks.
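
The overlap-matrix idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: mean pooling as the per-window "global signal", cosine similarity as the score, and a row-wise softmax are all assumptions made here for illustration, and the function name `overlap_matrix` is hypothetical.

```python
import numpy as np


def overlap_matrix(win_src, win_tgt):
    """Soft window-to-window matching between two windowed point clouds.

    win_src: (Ws, K, C) source features, Ws windows of K points each.
    win_tgt: (Wt, K, C) target features.
    Returns a (Ws, Wt) matrix whose rows sum to 1.
    """
    # Assumption: mean pooling summarizes each window as one global signal.
    g_src = win_src.mean(axis=1)  # (Ws, C)
    g_tgt = win_tgt.mean(axis=1)  # (Wt, C)

    # Assumption: L2-normalize so the score is a cosine similarity.
    g_src = g_src / (np.linalg.norm(g_src, axis=1, keepdims=True) + 1e-8)
    g_tgt = g_tgt / (np.linalg.norm(g_tgt, axis=1, keepdims=True) + 1e-8)

    scores = g_src @ g_tgt.T  # (Ws, Wt)

    # Numerically stable row-wise softmax.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
src = rng.standard_normal((4, 8, 16))  # 4 windows, 8 points, 16 channels
tgt = rng.standard_normal((6, 8, 16))  # 6 windows
M = overlap_matrix(src, tgt)           # (4, 6) soft overlap scores
M_self = overlap_matrix(src, src)      # a cloud matched against itself
```

Each row of `M` can be read as a soft estimate of which target windows a source window overlaps with. Matching a cloud against itself puts the largest weight on the diagonal, which is a quick sanity check for the sketch.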

Paper Structure

This paper contains 20 sections, 8 equations, 8 figures, 8 tables.

Figures (8)

  • Figure 1: Based on corresponding positions in the point cloud pair, $\bm{p}$ correctly corresponds to $\bm{q^1}$ (an inlier), whereas a match between $\bm{p}$ and $\bm{q^2}$ (an outlier) is incorrect. (a) A limited receptive field yields very similar geometric representations for $\bm{q^1}$ and $\bm{q^2}$, which may cause $\bm{p}$ to be incorrectly matched to $\bm{q^2}$ as an inlier. (b) A reasonable and expansive (non-global) receptive field lets features perceive geometric information more comprehensively, so that $\bm{q^1}$ and $\bm{q^2}$ express different geometric information. This feature distinctiveness enables $\bm{p}$ to correctly identify $\bm{q^2}$ as an outlier.
  • Figure 2: The window partitioning process using LSH. $S$ is a random rotation matrix drawn from a standard Gaussian distribution. $\bm{F}$ is uniformly partitioned into non-overlapping windows according to the hash values of $\bm{P}$ over $m$ bins.
  • Figure 3: Voxelization vs. KNN vs. Octree vs. LSH. (a) Voxelization generates a large number of empty voxels, leading to additional computational cost. (b) KNN introduces quadratic complexity and makes the last window difficult to handle. (c) Octree constructs a z-order curve to uniformly partition the point cloud [wang2023octformer]. (d) LSH can easily partition windows based on hash values.
  • Figure 4: The overall architecture of LAHNet. The source and target point clouds are the inputs, and the outputs are the corresponding features, $\bm{F_p}$ and $\bm{F_q}$. First, two consecutive Group Transformers in the U-Net encoder model reasonable long-range dependencies between points. Next, an Interaction Transformer is placed at the U-Net bottleneck stage for feature interaction between overlap regions. Finally, the U-Net decoder restores the feature scale and aggregates multi-scale information.
  • Figure 5: The process of enhancing feature receptive field through cross-window interaction. (a), (b), and (c) all represent the same point cloud partitioned into $4\times4$ windows. The points (yellow) in (a) can indirectly perceive the information of points (purple and blue) in (c) through points (green) in (b).
  • ...and 3 more figures
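
The window partitioning described for Figure 2 and the cross-window idea described for Figure 5 can be sketched as follows. This is a rough illustration under stated assumptions, not the paper's code. It assumes an angular LSH scheme (random Gaussian projection followed by an arg-max over the projections and their negations), that the number of points is divisible by the window size, and that cross-window interaction can be imitated by cyclically shifting the hash-sorted order, in the spirit of shifted-window attention. The names `lsh_hash`, `partition_windows`, and `shift` are hypothetical.

```python
import numpy as np


def lsh_hash(points, n_bins=8, seed=0):
    """Assign each point to one of n_bins hash bins via random projections."""
    if n_bins % 2 != 0:
        raise ValueError("n_bins must be even")
    rng = np.random.default_rng(seed)
    # S: (3, n_bins / 2) matrix with i.i.d. standard Gaussian entries.
    S = rng.standard_normal((points.shape[1], n_bins // 2))
    proj = (points - points.mean(axis=0)) @ S
    # Concatenating proj and -proj gives n_bins candidate bins per point.
    return np.argmax(np.concatenate([proj, -proj], axis=1), axis=1)


def partition_windows(feats, hashes, window_size, shift=0):
    """Sort features by hash value and cut them into equal-size windows.

    A non-zero `shift` cyclically rotates the sorted order first, so the
    resulting windows straddle the previous window boundaries
    (an assumed stand-in for the cross-window step).
    """
    n = feats.shape[0]
    if n % window_size != 0:
        raise ValueError("this sketch assumes N is divisible by window_size")
    order = np.argsort(hashes, kind="stable")  # points with equal hash stay adjacent
    order = np.roll(order, -shift)
    windows = feats[order].reshape(n // window_size, window_size, -1)
    return windows, order


rng = np.random.default_rng(42)
P = rng.standard_normal((16, 3))  # point coordinates
F = rng.standard_normal((16, 5))  # per-point features
h = lsh_hash(P, n_bins=8)
win, order = partition_windows(F, h, window_size=4)
win_shifted, order_shifted = partition_windows(F, h, window_size=4, shift=2)
```

Sorting by hash value and then cutting fixed-size chunks yields windows of equal size with no empty cells, at the cost of windows that may span two adjacent hash bins. With `shift=2`, the first shifted window contains the last two points of the first original window plus the first two of the second, so attention applied in alternating layers can pass information across window boundaries.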