MT-PCR: Hybrid Mamba-Transformer Network with Spatial Serialization for Point Cloud Registration

Bingxi Liu, An Liu, Hao Chen, Huaqi Tao, Jinqiang Cui, Yiqun Wang, Hong Zhang

Abstract

Point cloud registration (PCR) is a fundamental task in 3D computer vision and robotics. Most learning-based PCR methods rely on Transformer architectures, which suffer from quadratic computational complexity. This limitation restricts the resolution of point clouds that can be processed, inevitably leading to information loss. In contrast, Mamba, a recently proposed model based on state-space models, achieves linear computational complexity while maintaining strong long-range contextual modeling capabilities. However, directly applying Mamba to PCR tasks yields suboptimal performance due to the unordered and irregular nature of point cloud data. To address these challenges, we propose MT-PCR, the first point cloud registration framework that integrates Mamba and Transformer modules. Specifically, we serialize point cloud features using Z-order space-filling curves to enforce spatial locality, enabling Mamba to better model the geometric structure of the inputs. Additionally, we remove the order-indicator module commonly used in Mamba-based sequence modeling, leading to improved performance in our setting. The serialized features are then processed by an optimized Mamba encoder, followed by a Transformer-based feature refinement stage. Extensive experiments on multiple benchmarks demonstrate that MT-PCR outperforms Transformer-based and other state-of-the-art methods in both accuracy and efficiency, significantly reducing GPU memory usage and FLOPs.
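
The Z-order serialization mentioned in the abstract can be illustrated with a minimal sketch. It quantizes each point to an integer grid, interleaves the coordinate bits into a Morton code, and sorts points (with their features) by that code, so that neighbors in the resulting sequence tend to be neighbors in space. This is not the authors' implementation: the function names `morton_codes` and `z_order_serialize`, the 10-bit quantization, the x-y-z bit order, and the use of NumPy are all assumptions made for illustration.

```python
import numpy as np


def morton_codes(points: np.ndarray, bits: int = 10) -> np.ndarray:
    """Return one Z-order (Morton) code per 3D point.

    Assumption: points are quantized to a cubic grid with 2**bits cells per
    axis, spanning the bounding box of the input.
    """
    mins = points.min(axis=0)
    extent = float((points.max(axis=0) - mins).max())
    if extent == 0.0:
        extent = 1.0  # all points coincide; avoid division by zero

    max_cell = 2**bits - 1
    grid = np.floor((points - mins) / extent * max_cell)
    grid = np.clip(grid, 0, max_cell).astype(np.uint64)

    codes = np.zeros(len(points), dtype=np.uint64)
    one = np.uint64(1)
    for b in range(bits):
        for axis in range(3):
            # Take bit b of this axis and place it at position 3*b + axis.
            bit = (grid[:, axis] >> np.uint64(b)) & one
            codes |= bit << np.uint64(3 * b + axis)
    return codes


def z_order_serialize(points: np.ndarray, feats: np.ndarray, bits: int = 10):
    """Sort points and their features along the Z-order curve.

    Returns the permutation together with the reordered points and features,
    so the original order can be restored after sequence modeling.
    """
    order = np.argsort(morton_codes(points, bits), kind="stable")
    return order, points[order], feats[order]
```

A call such as `order, pts_sorted, feats_sorted = z_order_serialize(pts, feats)` yields a sequence that could be fed to a sequence model; `np.argsort(order)` gives the inverse permutation for mapping outputs back to the original point order.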

Paper Structure

This paper contains 20 sections, 22 equations, 5 figures, 10 tables, 1 algorithm.

Figures (5)

  • Figure 1: Registration recall and inference time comparison on 3DMatch. Our method, MT-PCR, achieves the best registration performance while maintaining competitive inference efficiency, outperforming recent state-of-the-art methods such as CAST (NIPS'25) and SGU-PCR (TIM'25).
  • Figure 2: FLOPs comparison under varying point token lengths. MT-PCR scales significantly better than GeoTransformer and CAST, maintaining low computational overhead even as the input size increases. Notably, GeoTransformer suffers from out-of-memory (OOM) issues at large resolutions, while MT-PCR remains efficient, achieving up to $6.9 \times$ lower FLOPs at 1536 tokens.
  • Figure 3: Overview of the MT-PCR Framework. The proposed pipeline consists of four stages: multi-scale feature extraction, coarse matching, sparse correspondence refinement, and fine registration. Notably, the coarse matching stage incorporates Mamba encoders with spatial serialization to model global geometric context efficiently.
  • Figure 4: Architecture of the Mamba Encoder and Block. The left diagram illustrates the Mamba Encoder with residual connections and feed-forward networks (FFNs). The right diagram shows the internal structure of a Mamba Block, which centers on the SelectiveSSM.
  • Figure 5: Qualitative registration results of CAST and MT-PCR compared with the ground-truth alignment on the 3DMatch dataset. We present four examples, one per row, which demonstrate the robustness and accuracy of our method.