Topological SLAM in colonoscopies leveraging deep features and topological priors

Javier Morlana, Juan D. Tardós, José M. M. Montiel

TL;DR

ColonSLAM addresses the challenge of mapping the entire colon in colonoscopy by fusing metric submaps into a topological graph. It introduces a localization network based on deep visual place recognition and a transformer-based matcher, both guided by topological priors, to link covisible submaps that are far apart in time. The method yields richer topological maps than previous approaches and demonstrates robust performance on real Endomapper data, with high precision and solid recall when combined with topological priors and LightGlue. This could enable personalized patient maps and improved navigation or monitoring in colonoscopy. Code and models are publicly available.

Abstract

We introduce ColonSLAM, a system that combines classical multiple-map metric SLAM with deep features and topological priors to create topological maps of the whole colon. The SLAM pipeline by itself is able to create disconnected individual metric submaps representing locations from short video subsections of the colon, but is not able to merge covisible submaps due to deformations and the limited performance of the SIFT descriptor in the medical domain. ColonSLAM is guided by topological priors and combines a deep localization network, trained to decide whether two images come from the same place, with the soft verification of a transformer-based matching network. This lets it relate submaps that are far apart in time during an exploration and group them into nodes imaging the same colon place, building more complex maps than any other approach in the literature. We demonstrate our approach on the Endomapper dataset, showing its potential for producing maps of the whole colon in real human explorations. Code and models are available at: https://github.com/endomapper/ColonSLAM.
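
The grouping idea in the abstract can be illustrated with a toy sketch. This is not the ColonSLAM implementation: the function `group_submaps` and its callables `prior`, `similarity`, and `verify` are hypothetical stand-ins for the topological prior, the localization network score, and the matcher verification, and the threshold value is an arbitrary assumption. The sketch only shows how a linear sequence of submaps could be merged into nodes once two submaps are judged to image the same place.

```python
from typing import Callable, Dict, List


def group_submaps(
    num_submaps: int,
    prior: Callable[[int, int], bool],        # hypothetical: is this pair plausible at all?
    similarity: Callable[[int, int], float],  # hypothetical: stand-in for a learned sim score
    verify: Callable[[int, int], bool],       # hypothetical: stand-in for matcher verification
    sim_threshold: float = 0.5,               # assumed value, not taken from the paper
) -> List[List[int]]:
    """Merge submaps into nodes using a union-find structure."""
    parent = list(range(num_submaps))

    def find(i: int) -> int:
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Submaps arrive in temporal order; each one is compared with earlier ones.
    for j in range(num_submaps):
        for i in range(j):
            if find(i) == find(j):
                continue  # already in the same node
            if not prior(i, j):
                continue  # candidate pruned by the prior
            if similarity(i, j) >= sim_threshold and verify(i, j):
                parent[find(j)] = find(i)  # merge the two nodes

    groups: Dict[int, List[int]] = {}
    for i in range(num_submaps):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy data: five submaps whose (made-up) place labels revisit earlier places.
places = [0, 1, 2, 1, 0]
nodes = group_submaps(
    num_submaps=len(places),
    prior=lambda i, j: True,
    similarity=lambda i, j: 1.0 if places[i] == places[j] else 0.0,
    verify=lambda i, j: True,
)
print(nodes)
```

In this toy run, submaps 0 and 4 and submaps 1 and 3 end up in the same node because their made-up place labels agree. In a real system the three callables would be replaced by learned components and a graph-based prior.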

Paper Structure (12 sections, 2 equations, 3 figures, 1 table)

Figures (3)

  • Figure 1: ColonSLAM. From a linear graph of metric submaps, ColonSLAM is able to obtain a topological graph with rich connections by leveraging a novel localization network, topological priors and LightGlue matching.
  • Figure 2: Localization network $\mathbb{L}$. It obtains a $\mathit{sim}$ score, deciding if two images come from the same place. The backbone green blocks and the MLP are fine-tuned.
  • Figure 3: Seq_027 topological map. CudaSIFT-SLAM (left) and ColonSLAM (right).