CudaSIFT-SLAM: multiple-map visual SLAM for full procedure mapping in real human endoscopy

Richard Elvira; Juan D. Tardós; José M. M. Montiel

TL;DR

CudaSIFT-SLAM tackles real-time monocular V-SLAM in human colonoscopy by replacing ORB-based place recognition with GPU-accelerated SIFT features and brute-force matching, which enables reliable map merging and relocalization. It builds an atlas of maps, uses Sim(3) alignment for cross-map verification, and introduces an affine deformation-noise model to handle quasi-rigid tissue movement. On the C3VD phantom dataset and a real colonoscopy from the Endomapper dataset, it produces longer maps with multiple merges and relocalizations, maps 88 % of the C3VD frames and 38 % of the full real sequence (a 70 % improvement over ORB-SLAM3), and runs in real time on GPU hardware. The work shows the potential for autonomous navigation, augmented reality, and improved screening analysis inside the colon, with avenues for cross-procedure map reuse and deeper metric-topological mapping.

Abstract

Monocular visual simultaneous localization and mapping (V-SLAM) is nowadays an irreplaceable tool in mobile robotics and augmented reality, where it performs robustly. However, human colonoscopies pose formidable challenges such as occlusions, blur, light changes, lack of texture, deformation, water jets, and tool interaction, which result in very frequent tracking losses. ORB-SLAM3, the top-performing multiple-map V-SLAM system, is unable to recover from them by merging sub-maps or relocalizing the camera, due to the poor performance of its place recognition algorithm based on ORB features and DBoW2 bag-of-words. We present CudaSIFT-SLAM, the first V-SLAM system able to process complete human colonoscopies in real time. To overcome the limitations of ORB-SLAM3, we use SIFT instead of ORB features and replace the DBoW2 direct index with the more computationally demanding brute-force matching, which lets us successfully match images separated in time for relocation and map merging. Real-time performance is achieved thanks to CudaSIFT, a GPU implementation of SIFT extraction and brute-force matching. We benchmark our system on the C3VD phantom colon dataset and on a full real colonoscopy from the Endomapper dataset, demonstrating its ability to merge sub-maps and relocate in them, and obtaining significantly longer sub-maps. Our system successfully maps in real time 88 % of the frames in the C3VD dataset. In a real screening colonoscopy, despite the much higher prevalence of occluded and blurred frames, the mapping coverage is 53 % in carefully explored areas and 38 % in the full sequence, a 70 % improvement over ORB-SLAM3.
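
To illustrate what brute-force descriptor matching means in contrast to an indexed bag-of-words lookup, the following is a minimal CPU sketch in NumPy. It is not the CudaSIFT API or the authors' implementation: the function name `brute_force_match`, the ratio threshold of 0.8, and the mutual-consistency check are assumptions chosen for illustration, and the descriptors are synthetic 128-dimensional vectors rather than real SIFT output.

```python
import numpy as np


def brute_force_match(desc_a, desc_b, ratio=0.8):
    """Match every descriptor in desc_a against every descriptor in desc_b.

    Hypothetical helper: keeps a pair only if it passes a nearest/second-nearest
    ratio test and is a mutual nearest neighbour. Requires at least two rows
    in desc_b. Returns a list of (index_a, index_b) pairs.
    """
    # Squared Euclidean distance between all pairs, shape (Na, Nb).
    a2 = np.sum(desc_a ** 2, axis=1)[:, None]
    b2 = np.sum(desc_b ** 2, axis=1)[None, :]
    d2 = np.maximum(a2 + b2 - 2.0 * desc_a @ desc_b.T, 0.0)

    # Nearest and second-nearest neighbour in desc_b for each row of desc_a.
    order = np.argsort(d2, axis=1)
    best, second = order[:, 0], order[:, 1]
    rows = np.arange(len(desc_a))

    # Ratio test on squared distances, hence ratio ** 2.
    passes_ratio = d2[rows, best] < (ratio ** 2) * d2[rows, second]

    # Mutual check: the best match in desc_b must point back to the same row.
    reverse_best = np.argmin(d2, axis=0)
    mutual = reverse_best[best] == rows

    return [(int(i), int(best[i])) for i in rows if passes_ratio[i] and mutual[i]]


# Synthetic example: desc_a holds noisy copies of 20 rows of desc_b.
rng = np.random.default_rng(0)
desc_b = rng.random((50, 128))
picked = rng.permutation(50)[:20]
desc_a = desc_b[picked] + 0.01 * rng.standard_normal((20, 128))
matches = brute_force_match(desc_a, desc_b)
```

The all-pairs distance matrix costs O(Na x Nb) per image pair, which is why the abstract calls this approach more computationally demanding than the DBoW2 direct index and why a GPU implementation is used to keep it real-time.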

Paper Structure (17 sections, 2 equations, 10 figures, 6 tables)


Introduction
Related Work
System overview
CudaSIFT features in colonoscopy frames
Place recognition and map merging in colonoscopy
Relocalization in colonoscopy
Map Initialization
Quasi-rigid deformation model
Experiments
C3VD Phantom dataset
Short sequences
Screening sequences
Computing time
Endomapper Dataset
Cecum segment
...and 2 more sections

Figures (10)

Figure 1: Typical images from a real colonoscopy: (a) ideal clean frame, (b) Narrow Band Imaging (NBI), (c) collapsed section, (d) debris, (e) water cleaning the lens, (f) water drops on the lens, (g) water jet cleaning mucosa, (h) motion blur and (i) tool interacting with mucosa.
Figure 2: CudaSIFT-SLAM three-thread structure, based on ORB-SLAM3 (Campos et al., 2021).
Figure 3: Removal of features extracted on reflections and image borders.
Figure 4: CudaSIFT vs. ORB place recognition comparison. First row: inliers after RANSAC between the query KF $\hbox{K}_a$ and three covisible KFs. Second row: 3D-3D final matches after guided matching. The KFs used by CudaSIFT and ORB are different because they correspond to different executions. The selection of KFs to compare is based on similarity with $\hbox{K}_a$.
Figure 5: Verification of points distribution for map initialization. Green: reprojection of map points at initialization. Blue: fitted ellipse.
...and 5 more figures
