
SimCol3D -- 3D Reconstruction during Colonoscopy Challenge

Anita Rau, Sophia Bano, Yueming Jin, Pablo Azagra, Javier Morlana, Rawen Kader, Edward Sanderson, Bogdan J. Matuszewski, Jae Young Lee, Dong-Jae Lee, Erez Posner, Netanel Frank, Varshini Elangovan, Sista Raviteja, Zhengwen Li, Jiquan Liu, Seenivasan Lalithkumar, Mobarakol Islam, Hongliang Ren, Laurence B. Lovat, José M. M. Montiel, Danail Stoyanov

TL;DR

SimCol3D presents the first EndoVis benchmark for depth and 6D pose estimation in colonoscopy, combining synthetic and real sequences to study monocular 3D reconstruction. The challenge comprises three tasks (synthetic depth prediction, synthetic pose prediction, and real pose prediction), each with dedicated data, ground-truth annotations, and its own metrics: $L_1$, $L_{rel}$, and $L_{RMSE}$ for depth, and $ATE$, $RTE$, and $ROT$ for pose. Six teams submitted six depth methods and three pose methods. Depth prediction on synthetic data reached sub-millimeter accuracy ($<1\,\mathrm{mm}$) on unseen anatomies, whereas pose prediction remained more challenging, showing drift and sensitivity to the domain gap between synthetic and real colonoscopic scenes. The study highlights the promise of synthetic data for depth estimation, the value of hybrid transformer-CNN architectures, and the need for better real ground-truth data and domain-robust pose models to translate these gains into clinical practice.
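
As a rough illustration of the depth metrics named above, the sketch below computes an $L_1$ error, a relative error, and an RMSE between a predicted and a ground-truth depth map. The function name `depth_errors`, the `eps` guard, and the use of a plain mean over all pixels are assumptions made for this example; the challenge's exact aggregation (for instance mean versus median, or per-trajectory averaging) is not stated in this summary.

```python
import numpy as np


def depth_errors(pred, gt, eps=1e-8):
    """Return L1, relative, and RMSE errors between two depth maps.

    Hypothetical helper for illustration; averages over all pixels.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if pred.shape != gt.shape:
        raise ValueError("pred and gt must have the same shape")

    abs_err = np.abs(pred - gt)
    return {
        # Mean absolute difference, in the units of the depth maps.
        "l1": float(abs_err.mean()),
        # Absolute difference divided by the true depth (unitless).
        # eps avoids division by zero where the ground truth is 0.
        "rel": float((abs_err / np.maximum(gt, eps)).mean()),
        # Root of the mean squared difference.
        "rmse": float(np.sqrt((abs_err ** 2).mean())),
    }


if __name__ == "__main__":
    gt = np.array([[1.0, 2.0], [4.0, 4.0]])
    pred = np.array([[1.5, 2.0], [4.0, 3.0]])
    print(depth_errors(pred, gt))
```

The errors carry the units of the input depth maps, so a sub-millimeter claim only makes sense once the maps are expressed in a metric unit.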

Abstract

Colorectal cancer is one of the most common cancers in the world. While colonoscopy is an effective screening technique, navigating an endoscope through the colon to detect polyps is challenging. A 3D map of the observed surfaces could enhance the identification of unscreened colon tissue and serve as a training platform. However, reconstructing the colon from video footage remains difficult. Learning-based approaches hold promise as robust alternatives, but necessitate extensive datasets. Establishing a benchmark dataset, the 2022 EndoVis sub-challenge SimCol3D aimed to facilitate data-driven depth and pose prediction during colonoscopy. The challenge was hosted as part of MICCAI 2022 in Singapore. Six teams from around the world and representatives from academia and industry participated in the three sub-challenges: synthetic depth prediction, synthetic pose prediction, and real pose prediction. This paper describes the challenge, the submitted methods, and their results. We show that depth prediction from synthetic colonoscopy images is robustly solvable, while pose estimation remains an open research question.
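
To make the pose tasks more concrete, the following sketch shows one common way to score predicted camera motion against ground truth, reading $ATE$, $RTE$, and $ROT$ as absolute translation error, relative translation error, and rotation error. Everything here is an assumption for illustration: the function names, the representation of poses as 4x4 homogeneous relative transforms, the absence of any scale or trajectory alignment, and the averaging. The challenge's official definitions may differ.

```python
import numpy as np


def rotation_angle_deg(R):
    """Angle in degrees of a 3x3 rotation matrix, from its trace."""
    cos = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos)))


def accumulate(rel_poses):
    """Chain relative 4x4 transforms into absolute poses, starting at identity."""
    pose = np.eye(4)
    trajectory = [pose.copy()]
    for T in rel_poses:
        pose = pose @ T
        trajectory.append(pose.copy())
    return np.stack(trajectory)


def pose_errors(pred_rel, gt_rel):
    """Hypothetical ATE / RTE / ROT for two sequences of relative poses."""
    pred_rel = np.asarray(pred_rel, dtype=np.float64)
    gt_rel = np.asarray(gt_rel, dtype=np.float64)
    if pred_rel.shape != gt_rel.shape:
        raise ValueError("pred_rel and gt_rel must have the same shape")

    # Residual motion per frame pair: identity if the prediction is perfect.
    residuals = [np.linalg.inv(g) @ p for g, p in zip(gt_rel, pred_rel)]
    rte = float(np.mean([np.linalg.norm(E[:3, 3]) for E in residuals]))
    rot = float(np.mean([rotation_angle_deg(E[:3, :3]) for E in residuals]))

    # Absolute error on the accumulated camera positions; the shared
    # identity start pose is skipped. No scale or rigid alignment is applied.
    pred_abs = accumulate(pred_rel)[1:, :3, 3]
    gt_abs = accumulate(gt_rel)[1:, :3, 3]
    ate = float(np.mean(np.linalg.norm(pred_abs - gt_abs, axis=1)))

    return {"ate": ate, "rte": rte, "rot": rot}
```

Because the absolute error is measured on accumulated poses, a small mistake in an early frame pair propagates to every later position, which is one way to picture the drift mentioned for the pose tasks.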

Paper Structure (29 sections, 12 equations, 7 figures, 6 tables)

Figures (7)

  • Figure 1: Overview of the real images (top), synthetic images (center), and synthetic depth maps (bottom) used in the challenge.
  • Figure 2: Synthetic Colons I, II, and III in Unity environment with camera paths along the center of the mesh. Synthetic Colons I and II include training and test trajectories. Synthetic Colon III provides test trajectories only.
  • Figure 3: Architecture overview for Task 1 (depth prediction) of the six participating teams. (I) Team CVML adapted FCBFormer [sanderson2022fcn], (II) Team EndoAI utilized GLPDepth [kim2022global] with a Segformer encoder [xie2021segformer], (III) Team IntuitiveIL applied multiple DoG filters with varying scales as preprocessing and used a NeW CRF network for depth prediction, (IV) Team KLIV utilized SUMNet [nandamuri2019sumnet], (V) Team MIVA utilized DenseDepth [alhashim2018high] as an encoder-decoder network with skip connections, (VI) Team MMLab utilized Swin-UNet [cao2023swin].
  • Figure 4: Architecture overview for Task 2 (pose prediction from synthetic images) and Task 3 (pose prediction from real images) of the participating teams. For Task 2, (I) Team EndoAI utilized MonoDepthv2 [godard2019digging], (II) Team MIVA utilized SC-SfMLearner [bian2021unsupervised], and (III) Team MMLab implemented curriculum learning with linear regression. For Task 3, (IV) Team EndoAI and Team MIVA utilized the CycleGAN model for Sim2Real image generation.
  • Figure 5: Comparison of depth predictions generated by the participating teams. For Synthetic Colons I and II, we show one example from one test trajectory each; for Synthetic Colon III, we show one example from each of the three test trajectories. The average L1 error is shown above each error map, and the colorbar scale is in cm. Visually, the results of CVML, EndoAI, IntuitiveIL, and MIVA are barely distinguishable from the ground truth. By L1 error, however, CVML performs best, closely followed by MIVA.
  • ...and 2 more figures