Topological SLAM in colonoscopies leveraging deep features and topological priors
Javier Morlana, Juan D. Tardós, José M. M. Montiel
TL;DR
ColonSLAM addresses the challenge of mapping the entire colon during colonoscopy by grouping metric submaps into a topological graph. It pairs a localization network based on deep visual place recognition with a transformer-based matcher, both guided by topological priors, to link covisible submaps across time and distance. The method yields richer topological maps than previous approaches and performs robustly on real Endomapper data, reaching high precision and solid recall when topological priors are combined with LightGlue. This could enable personalized patient maps and improved navigation or monitoring in colonoscopy. Code and models are publicly available.
Abstract
We introduce ColonSLAM, a system that combines classical multiple-map metric SLAM with deep features and topological priors to create topological maps of the whole colon. The SLAM pipeline by itself can create disconnected individual metric submaps, each representing locations from a short video subsection of the colon, but it cannot merge covisible submaps because of deformations and the limited performance of the SIFT descriptor in the medical domain. ColonSLAM is guided by topological priors and combines a deep localization network, trained to distinguish whether two images come from the same place, with the soft verification of a transformer-based matching network. This lets it relate submaps that are far apart in time within an exploration and group them into nodes that image the same colon place, building more complex maps than any other approach in the literature. We demonstrate our approach on the Endomapper dataset, showing its potential for producing maps of the whole colon in real human explorations. Code and models are available at: https://github.com/endomapper/ColonSLAM.
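
The grouping idea described in the abstract can be illustrated with a toy sketch. This is not the ColonSLAM implementation: the abstract does not specify how the priors, thresholds, or networks are formulated, so everything here is an assumption made for illustration. Cosine similarity over one descriptor per submap stands in for the deep localization network, a user-supplied `verify` callback stands in for the soft verification of the transformer-based matcher, and the topological prior is modeled, hypothetically, as "only nodes within a few hops of the current node are candidates". The names `build_topological_map`, `nodes_within`, `sim_thresh`, and `hops` are all invented for this example.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity between two equal-length descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def nodes_within(adj, start, hops):
    """Node ids reachable from `start` in at most `hops` edges (BFS)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        n = queue.popleft()
        if dist[n] == hops:
            continue
        for m in adj[n]:
            if m not in dist:
                dist[m] = dist[n] + 1
                queue.append(m)
    return set(dist)


def build_topological_map(descriptors, verify, sim_thresh=0.8, hops=2):
    """Group submaps (processed in temporal order) into place nodes.

    descriptors: one vector per submap (stand-in for deep features).
    verify(i, j): True if submaps i and j pass matching verification
                  (stand-in for a transformer-based matcher).
    """
    nodes = []      # nodes[k] lists the submap indices grouped in node k
    adj = {}        # node id -> set of neighbouring node ids
    current = None  # node holding the previously processed submap
    for i, desc in enumerate(descriptors):
        # Assumed topological prior: search only near the current node.
        candidates = nodes_within(adj, current, hops) if current is not None else set()
        best, best_sim = None, sim_thresh
        for k in candidates:
            sim = max(cosine(desc, descriptors[j]) for j in nodes[k])
            if sim >= best_sim and any(verify(i, j) for j in nodes[k]):
                best, best_sim = k, sim
        if best is None:  # no confirmed revisit: open a new node
            best = len(nodes)
            nodes.append([])
            adj[best] = set()
        nodes[best].append(i)
        if current is not None and current != best:
            adj[current].add(best)  # traversal edge between places
            adj[best].add(current)
        current = best
    return nodes, adj
```

In this sketch, a submap joins an existing node only if it is both similar enough to a nearby node and confirmed by `verify`; otherwise it opens a new node connected to the previous one. Restricting candidates to the graph neighbourhood is one simple way a prior could suppress false matches between visually similar but distant colon regions.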
