MambaNetLK: Enhancing Colonoscopy Point Cloud Registration with Mamba
Linzhe Jiang, Jiayuan Huang, Sophia Bano, Matthew J. Clarkson, Zhehua Mao, Mobarak I. Hoque
TL;DR
This work tackles cross-modal, real-time 3D point cloud registration for image-guided colonoscopy. It introduces MambaNetLK, a correspondence-free registration framework that pairs a global Mamba State Space Model encoder with inverse-compositional Lucas-Kanade alignment. It is complemented by C3VD-Raycasting-10k, a large-scale clinical benchmark of 10,014 geometrically aligned point cloud pairs generated from CT data via ray casting, which enables standardized evaluation of partial-to-partial alignment. On the clinical dataset, MambaNetLK achieves state-of-the-art performance, reducing median rotation error by 56.04% and RMSE translation error by 26.19% relative to the second-best method. It also generalizes well to ModelNet40 and remains robust under substantial initial pose perturbations. Together, these contributions provide a foundation for accurate, reliable guidance in minimally invasive procedures such as colonoscopy and move cross-modal 3D registration toward clinically deployable navigation systems.
Abstract
Accurate 3D point cloud registration underpins reliable image-guided colonoscopy, directly affecting lesion localization, margin assessment, and navigation safety. However, biological tissue exhibits repetitive textures and locally homogeneous geometry that cause feature degeneracy, while substantial domain shifts between pre-operative anatomy and intra-operative observations further degrade alignment stability. To address these clinically critical challenges, we contribute a 3D registration method tailored for endoscopic navigation and a clinically grounded dataset that supports rigorous, reproducible benchmarking. The dataset, C3VD-Raycasting-10k, is a large-scale benchmark of 10,014 geometrically aligned point cloud pairs derived from clinical CT data. The method, MambaNetLK, is a correspondence-free registration framework that enhances the PointNetLK architecture by integrating a Mamba State Space Model (SSM) as a cross-modal feature extractor, which allows it to capture long-range dependencies with linear-time complexity. Alignment is then computed iteratively with the Lucas-Kanade algorithm. On C3VD-Raycasting-10k, MambaNetLK outperforms state-of-the-art methods, reducing median rotation error by 56.04% and RMSE translation error by 26.19% relative to the second-best method. The model also generalizes well to ModelNet40 and is more robust to initial pose perturbations. MambaNetLK thus provides a robust foundation for 3D registration in surgical navigation: the combination of a globally expressive SSM-based feature extractor and a large-scale clinical dataset enables more accurate and reliable guidance systems for minimally invasive procedures such as colonoscopy.
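The abstract reports a median rotation error and an RMSE translation error but does not define them here. The sketch below shows one common way such registration metrics are computed, assuming the rotation error is the geodesic angle between the predicted and ground-truth rotation matrices and the translation RMSE is taken over per-sample Euclidean translation errors. These definitions, the function names, and the toy values are all illustrative assumptions, not the paper's evaluation code or results.

```python
import math
import statistics


def rotation_error_deg(R_pred, R_gt):
    """Geodesic angle in degrees between two 3x3 rotation matrices.

    Assumed convention: angle = arccos((trace(R_gt^T R_pred) - 1) / 2).
    """
    # trace(R_gt^T R_pred) equals the element-wise (Frobenius) inner product.
    trace = sum(R_gt[i][j] * R_pred[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


def translation_rmse(t_preds, t_gts):
    """Root mean square of per-sample Euclidean translation errors."""
    squared = [
        sum((p - g) ** 2 for p, g in zip(t_pred, t_gt))
        for t_pred, t_gt in zip(t_preds, t_gts)
    ]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction_pct(baseline, ours):
    """Percentage by which `ours` is lower than `baseline`."""
    return 100.0 * (baseline - ours) / baseline


def rot_z(deg):
    """Hypothetical helper: rotation about the z-axis by `deg` degrees."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


if __name__ == "__main__":
    # Toy predictions and ground truth (made-up values, not from the paper).
    pairs = [(rot_z(12.0), rot_z(10.0)), (rot_z(35.0), rot_z(30.0)),
             (rot_z(-1.0), rot_z(0.0))]
    rot_errors = [rotation_error_deg(p, g) for p, g in pairs]
    print("median rotation error (deg):", statistics.median(rot_errors))

    t_preds = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
    t_gts = [(0.0, 0.0, 0.0), (0.0, 3.0, 4.0)]
    print("translation RMSE:", translation_rmse(t_preds, t_gts))
```

Under these assumptions, a statement such as "reducing median rotation error by 56.04% over the second-best method" corresponds to `relative_reduction_pct(second_best_median, proposed_median)`. The median is insensitive to a few badly failed registrations, whereas RMSE penalizes them heavily, which is one reason the two statistics are often reported side by side.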
