MambaNetLK: Enhancing Colonoscopy Point Cloud Registration with Mamba

Linzhe Jiang, Jiayuan Huang, Sophia Bano, Matthew J. Clarkson, Zhehua Mao, Mobarak I. Hoque

TL;DR

This work tackles the challenge of cross-modal, real-time 3D point cloud registration in image-guided colonoscopy by introducing MambaNetLK, a correspondence-free registration framework that leverages a global Mamba State Space Model encoder and an inverse-compositional Lucas–Kanade alignment. It is complemented by the C3VD-Raycasting-10k dataset, a large-scale clinical benchmark of 10,014 geometrically aligned point-cloud pairs generated from CT data via ray casting, enabling standardized evaluation of partial-to-partial alignment. Empirical results show state-of-the-art performance on the clinical dataset, strong generalization to ModelNet40, and robustness to substantial initial pose perturbations, with significant reductions in rotation and translation errors compared to baselines. Together, these contributions provide a robust foundation for accurate, reliable guidance in minimally invasive procedures like colonoscopy and advance cross-modal 3D registration toward clinically deployable navigation systems.

Abstract

Accurate 3D point cloud registration underpins reliable image-guided colonoscopy, directly affecting lesion localization, margin assessment, and navigation safety. However, biological tissue exhibits repetitive textures and locally homogeneous geometry that cause feature degeneracy, while substantial domain shifts between pre-operative anatomy and intra-operative observations further degrade alignment stability. To address these clinically critical challenges, we introduce a novel 3D registration method tailored for endoscopic navigation and a high-quality, clinically grounded dataset to support rigorous and reproducible benchmarking. The dataset, C3VD-Raycasting-10k, is a large-scale benchmark with 10,014 geometrically aligned point cloud pairs derived from clinical CT data. The method, MambaNetLK, is a novel correspondence-free registration framework that enhances the PointNetLK architecture by integrating a Mamba State Space Model (SSM) as a cross-modal feature extractor, which lets it capture long-range dependencies with linear-time complexity. Alignment is then computed iteratively with the Lucas–Kanade algorithm. On C3VD-Raycasting-10k, MambaNetLK outperforms state-of-the-art methods, reducing median rotation error by 56.04% and RMSE translation error by 26.19% relative to the second-best method. The model also demonstrates strong generalization on ModelNet40 and superior robustness to initial pose perturbations. MambaNetLK thus provides a robust foundation for 3D registration in surgical navigation: the combination of a globally expressive SSM-based feature extractor and a large-scale clinical dataset enables more accurate and reliable guidance systems in minimally invasive procedures such as colonoscopy.
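
To make the reported quantities concrete, the following is a minimal sketch of how rotation error, translation RMSE, and a relative reduction could be computed. The exact metric definitions are not given in this summary, so everything here is an assumption: rotation error is taken as the geodesic angle between estimated and ground-truth rotations, translation RMSE as the root mean square of per-sample Euclidean translation differences, and the function names are hypothetical.

```python
import numpy as np

def rotation_error_deg(R_est, R_gt):
    """Geodesic angle (degrees) between two 3x3 rotation matrices (assumed metric)."""
    cos_theta = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    # Clip guards against values slightly outside [-1, 1] from round-off.
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

def translation_rmse(t_est, t_gt):
    """RMSE over samples of the Euclidean translation difference (assumed metric).

    t_est, t_gt: arrays of shape (num_samples, 3).
    """
    diff = np.asarray(t_est, dtype=float) - np.asarray(t_gt, dtype=float)
    return np.sqrt(np.mean(np.sum(diff**2, axis=1)))

def relative_reduction(baseline, ours):
    """Percentage by which `ours` is lower than `baseline`."""
    return 100.0 * (baseline - ours) / baseline

# Example: a 30-degree rotation about the z-axis compared against the identity.
angle = np.radians(30.0)
R_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                [np.sin(angle),  np.cos(angle), 0.0],
                [0.0,            0.0,           1.0]])
err_deg = rotation_error_deg(R_z, np.eye(3))
```

Under these assumptions, a per-dataset median rotation error would be `np.median` over `rotation_error_deg` values for all test pairs, and a figure such as 56.04% would correspond to `relative_reduction(second_best_error, mambanetlk_error)`.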

Paper Structure

This paper contains 13 sections, 3 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: An overview of the MambaNetLK framework. The blue arrow indicates a one-time pre-computation: the Jacobian Solver uses the target's feature vector $\phi(P_T)$ to generate the Jacobian $J$. The brown arrows depict the iterative loop: the Transformations Calculator uses the feature residual between $\phi(P_T)$ and $\phi(P_S)$ and the pre-computed Jacobian $J$ to solve for an incremental transformation, which repeatedly updates the source point cloud's pose until convergence.
  • Figure 2: Frame-wise visible point extraction workflow. Using camera poses as the key linkage, the pipeline generates geometrically aligned point cloud pairs: (left) ray-casting extracts visible surfaces from the CT mesh to produce the target point cloud, while (right) depth map reprojection from video frames produces the source point cloud, ensuring both share identical viewpoints.
  • Figure 3: Performance comparison under initial rotational perturbations from $0^{\circ}$ to $90^{\circ}$. The plots show (a) average rotation error on ModelNet40, (b) average rotation error on C3VD-Raycasting-10k, (c) average translation error on ModelNet40, and (d) average translation error on C3VD-Raycasting-10k.
  • Figure 4: Qualitative comparison of registration results on the C3VD-Raycasting-10k dataset. (a) Initial perturbed state. (b) ICP and (c) DCP fail to converge correctly. (d) PointNetLK shows partial alignment. (e) PointNetLK Revisited demonstrates catastrophic failure. (f) MambaNetLK achieves near-perfect alignment.
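
The loop described in the Figure 1 caption (a one-time Jacobian computed from the target, then repeated pose updates driven by the feature residual) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: `toy_encoder` is a hypothetical stand-in for $\phi$ (centroid plus covariance, not a Mamba SSM), the Jacobian is approximated with central finite differences, and the source is a full rigidly moved copy of the target rather than a partial, cross-modal observation.

```python
import numpy as np

def skew(w):
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def se3_exp(xi):
    """Map a twist xi = (rotation w, translation v) to a 4x4 rigid transform."""
    w, v = xi[:3], xi[3:]
    theta = np.linalg.norm(w)
    K = skew(w)
    if theta < 1e-9:  # first-order approximation near zero rotation
        R, V = np.eye(3) + K, np.eye(3) + 0.5 * K
    else:
        a = np.sin(theta) / theta
        b = (1.0 - np.cos(theta)) / theta**2
        c = (theta - np.sin(theta)) / theta**3
        R = np.eye(3) + a * K + b * (K @ K)
        V = np.eye(3) + b * K + c * (K @ K)
    G = np.eye(4)
    G[:3, :3], G[:3, 3] = R, V @ v
    return G

def transform(G, P):
    """Apply a 4x4 rigid transform to an (N, 3) point cloud."""
    return P @ G[:3, :3].T + G[:3, 3]

def toy_encoder(P):
    """Hypothetical stand-in for phi: centroid plus flattened covariance."""
    mu = P.mean(axis=0)
    C = (P - mu).T @ (P - mu) / len(P)
    return np.concatenate([mu, C.ravel()])

def precompute_jacobian(phi, P_T, step=1e-4):
    """One-time step: d phi(exp(xi) . P_T) / d xi at xi = 0, by central differences."""
    cols = []
    for i in range(6):
        d = np.zeros(6)
        d[i] = step
        f_plus = phi(transform(se3_exp(d), P_T))
        f_minus = phi(transform(se3_exp(-d), P_T))
        cols.append((f_plus - f_minus) / (2.0 * step))
    return np.stack(cols, axis=1)  # shape (feature_dim, 6)

def register(phi, P_S, P_T, max_iter=30, tol=1e-10):
    """Estimate G such that G . P_S aligns with P_T in feature space."""
    f_T = phi(P_T)
    J_pinv = np.linalg.pinv(precompute_jacobian(phi, P_T))  # reused every iteration
    G = np.eye(4)
    for _ in range(max_iter):
        residual = phi(transform(G, P_S)) - f_T  # feature residual
        xi = J_pinv @ residual                   # twist that explains the residual
        G = se3_exp(-xi) @ G                     # undo it on the source pose
        if np.linalg.norm(xi) < tol:
            break
    return G

# Synthetic check: the source is the target moved by a known rigid transform.
rng = np.random.default_rng(0)
P_T = rng.normal(size=(500, 3)) * np.array([1.0, 0.6, 0.3]) + np.array([0.5, -0.2, 0.1])
G_true = se3_exp(np.array([0.1, -0.15, 0.2, 0.1, 0.05, -0.1]))
P_S = transform(G_true, P_T)
G_est = register(toy_encoder, P_S, P_T)
```

Because the Jacobian is evaluated on the fixed target, it and its pseudo-inverse are computed once and reused in every iteration; only the source features are re-encoded. In this sketch, `G_est` should approximate the inverse of `G_true`, which is easy to confirm on synthetic data but says nothing about behaviour on partial clinical scans.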