SimCol3D -- 3D Reconstruction during Colonoscopy Challenge
Anita Rau, Sophia Bano, Yueming Jin, Pablo Azagra, Javier Morlana, Rawen Kader, Edward Sanderson, Bogdan J. Matuszewski, Jae Young Lee, Dong-Jae Lee, Erez Posner, Netanel Frank, Varshini Elangovan, Sista Raviteja, Zhengwen Li, Jiquan Liu, Seenivasan Lalithkumar, Mobarakol Islam, Hongliang Ren, Laurence B. Lovat, José M. M. Montiel, Danail Stoyanov
TL;DR
SimCol3D presents the first EndoVis benchmark for depth and 6D pose estimation in colonoscopy, combining synthetic and real sequences to study monocular 3D reconstruction. The challenge comprises three tasks (synthetic depth prediction, synthetic pose prediction, and real pose prediction), each with dedicated data, ground-truth annotations, and its own metrics: $L_1$, $L_{rel}$, and $L_{RMSE}$ for depth, and $ATE$, $RTE$, and $ROT$ for pose. Across six depth methods and three pose methods from six teams, depth prediction on synthetic data reached sub-millimeter error ($<1\,\mathrm{mm}$) on unseen anatomies, while pose prediction remained more challenging, showing drift and sensitivity to the domain gap between synthetic and real colonoscopic scenes. The study highlights the promise of synthetic data for depth estimation, the value of hybrid transformer-CNN architectures, and the need for better ground-truth data for real sequences and for domain-robust pose models to translate these gains into clinical practice.
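The depth metrics named above can be illustrated with a minimal sketch. It assumes the common definitions (mean absolute error for $L_1$, mean relative error for $L_{rel}$, and root mean squared error for $L_{RMSE}$); the challenge's exact definitions are not given here and may differ, for example by using a median instead of a mean. The function name `depth_errors` and the sample depths are hypothetical.

```python
import math


def depth_errors(pred, gt):
    """Return (l1, l_rel, l_rmse) for two equal-length sequences of depths.

    Assumed definitions (not taken from the challenge text):
      l1     = mean |pred - gt|
      l_rel  = mean |pred - gt| / gt
      l_rmse = sqrt(mean (pred - gt)^2)
    Pixels with non-positive ground truth are skipped.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth depths")
    n = len(pairs)
    abs_err = [abs(p - g) for p, g in pairs]
    l1 = sum(abs_err) / n
    l_rel = sum(e / g for e, (_, g) in zip(abs_err, pairs)) / n
    l_rmse = math.sqrt(sum(e * e for e in abs_err) / n)
    return l1, l_rel, l_rmse


# Hypothetical per-pixel depths in millimetres, for illustration only.
pred = [10.0, 20.0, 40.0]
gt = [10.0, 25.0, 40.0]
l1, l_rel, l_rmse = depth_errors(pred, gt)
print(f"L1={l1:.3f} mm, Lrel={l_rel:.3f}, RMSE={l_rmse:.3f} mm")
```

Under these assumptions $L_1$ and $L_{RMSE}$ carry the unit of the depth maps, while $L_{rel}$ is dimensionless, which is why a sub-millimeter claim refers to the absolute errors.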
Abstract
Colorectal cancer is one of the most common cancers in the world. While colonoscopy is an effective screening technique, navigating an endoscope through the colon to detect polyps is challenging. A 3D map of the observed surfaces could enhance the identification of unscreened colon tissue and serve as a training platform. However, reconstructing the colon from video footage remains difficult. Learning-based approaches hold promise as robust alternatives, but necessitate extensive datasets. By establishing a benchmark dataset, the 2022 EndoVis sub-challenge SimCol3D aimed to facilitate data-driven depth and pose prediction during colonoscopy. The challenge was hosted as part of MICCAI 2022 in Singapore. Six teams from around the world, with representatives from academia and industry, participated in the three sub-challenges: synthetic depth prediction, synthetic pose prediction, and real pose prediction. This paper describes the challenge, the submitted methods, and their results. We show that depth prediction from synthetic colonoscopy images is robustly solvable, while pose estimation remains an open research question.
