SuperPoint-E: local features for 3D reconstruction via tracking adaptation in endoscopy
O. Leon Barbed, José M. M. Montiel, Pascal Fua, Ana C. Murillo
TL;DR
This work introduces SuperPoint-E (SP-E), a local feature extractor tailored to endoscopy and trained with Tracking Adaptation, a supervision strategy derived from Structure-from-Motion (SfM) reconstructions. By using reliable COLMAP tracks as ground-truth correspondences, SP-E learns a detector and descriptor that are more repeatable and discriminative on endoscopic imagery, which improves the density and coverage of downstream SfM. SP-E outperforms SIFT and the standard SuperPoint across multiple endoscopic datasets, is robust to specularities, and reduces the need for complex matching pipelines. These advances enable denser 3D reconstructions and longer sequence coverage, and could extend to SLAM and mixed-reality applications in endoscopy and related modalities.
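As a rough illustration of how SfM tracks can serve as correspondence supervision, the sketch below turns a set of tracks into per-image-pair keypoint correspondences. It is a minimal sketch and not the authors' implementation: the input layout (a dict from track id to `(image_id, x, y)` observations), the function name `tracks_to_correspondences`, and the `min_track_length` threshold used as a proxy for "reliable" tracks are all assumptions made for this example.

```python
from itertools import combinations


def tracks_to_correspondences(tracks, min_track_length=3):
    """Convert SfM tracks into pairwise keypoint correspondences.

    tracks: dict mapping a track id to a list of (image_id, x, y)
            observations of the same 3D point (hypothetical layout).
    min_track_length: assumed reliability filter; shorter tracks are skipped.
    Returns a dict mapping (image_a, image_b) to a list of
    ((xa, ya), (xb, yb)) matched keypoint locations.
    """
    pairs = {}
    for observations in tracks.values():
        # Skip tracks observed in too few images to be considered reliable.
        if len(observations) < min_track_length:
            continue
        # Every pair of observations of one 3D point is a correspondence.
        for (img_a, xa, ya), (img_b, xb, yb) in combinations(sorted(observations), 2):
            if img_a == img_b:
                continue  # ignore duplicate observations within one image
            pairs.setdefault((img_a, img_b), []).append(((xa, ya), (xb, yb)))
    return pairs


example_tracks = {
    1: [(0, 10.0, 20.0), (1, 12.0, 21.0), (2, 14.0, 22.0)],
    2: [(0, 5.0, 5.0), (1, 6.0, 6.0)],  # too short, filtered out
}
correspondences = tracks_to_correspondences(example_tracks)
```

With the example above, only track 1 passes the length filter, so each of the three image pairs receives a single correspondence. A real pipeline would read the tracks from the reconstruction output and would likely apply further reliability checks.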
Abstract
In this work, we focus on improving feature extraction to boost the performance of Structure-from-Motion (SfM) on endoscopy videos. We present SuperPoint-E, a new local feature extraction method that, using our proposed Tracking Adaptation supervision strategy, significantly improves the quality of feature detection and description in endoscopy. Extensive experiments on real endoscopy recordings identify the most suitable configuration of our approach and evaluate the quality of SuperPoint-E features. Comparison with other baselines also shows that our 3D reconstructions are denser and cover more and longer video segments, because our detector fires more densely and our features are more likely to survive (i.e., higher detection precision). In addition, our descriptor is more discriminative, making the guided matching step almost redundant. The presented approach brings significant improvements in the 3D reconstructions obtained via SfM on endoscopy videos, compared to the original SuperPoint and the gold-standard COLMAP SfM pipeline.
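To make the notion of feature survival concrete, the sketch below computes a detection precision as the fraction of detected features that end up in the final reconstruction. This reading of "detection precision" is an assumption based on the wording above, and the function name and arguments are hypothetical.

```python
def detection_precision(num_detected, num_survived):
    """Fraction of detected features that survive into the reconstruction.

    num_detected: total features fired by the detector (assumed count).
    num_survived: how many of them became part of a reconstructed 3D point.
    """
    if num_survived > num_detected:
        raise ValueError("survived features cannot exceed detected features")
    if num_detected == 0:
        return 0.0  # no detections: define precision as zero
    return num_survived / num_detected


precision = detection_precision(num_detected=2000, num_survived=500)
```

Under this definition, a detector that fires more densely while keeping the same survival fraction yields proportionally more reconstructed points.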
