From Monocular Vision to Autonomous Action: Guiding Tumor Resection via 3D Reconstruction

Ayberk Acar, Mariana Smith, Lidia Al-Zogbi, Tanner Watts, Fangjie Li, Hao Li, Nural Yilmaz, Paul Maria Scheikl, Jesse F. d'Almeida, Susheela Sharma, Lauren Branscombe, Tayfun Efe Ertop, Robert J. Webster, Ipek Oguz, Alan Kuntz, Axel Krieger, Jie Ying Wu

TL;DR

This work demonstrates that a monocular RGB-based 3D reconstruction pipeline can provide effective scene understanding to guide autonomous central airway obstruction (CAO) tumor resections. By systematically evaluating six structure-from-motion (SfM) pipelines on ex vivo CAO data, selecting a dense and robust reconstruction method, and generating segmented point clouds via self-supervised labeling, the authors enable downstream robotic resection without RGB-D data. The approach yields high segmentation precision, accurate fiducial-based registration (measured with $L_2$ norm metrics), and tissue-sparing outcomes comparable to or better than those of RGB-D guidance, despite longer in-scene processing times. These results suggest monocular vision can support autonomous surgical guidance in space-constrained endoscopic workflows and motivate future real-time integration and broader clinical validation.
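
As an illustration of the $L_2$ norm registration metric mentioned above, the following minimal sketch computes per-fiducial errors after applying a rigid transform. It is not the authors' implementation: the function names, the transform `T`, and the fiducial coordinates are all hypothetical, and the sketch assumes the registration transform has already been estimated by some other means.

```python
import math

def apply_transform(T, p):
    """Apply a 4x4 homogeneous transform T (nested lists, row-major) to a 3D point p."""
    x, y, z = p
    return tuple(T[i][0] * x + T[i][1] * y + T[i][2] * z + T[i][3] for i in range(3))

def fiducial_errors(T, source_fiducials, target_fiducials):
    """Per-fiducial L2 distance between transformed source points and their targets."""
    return [
        math.dist(apply_transform(T, s), t)
        for s, t in zip(source_fiducials, target_fiducials)
    ]

# Hypothetical registration: 90 degree rotation about z plus a 10-unit shift in x.
T = [
    [0.0, -1.0, 0.0, 10.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

# Hypothetical fiducials in the reconstruction frame (source) and robot frame (target).
source = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
target = [(10.0, 1.0, 0.0), (8.0, 0.0, 0.0), (10.0, 0.0, 4.0)]

errors = fiducial_errors(T, source, target)
mean_error = sum(errors) / len(errors)
print(errors, mean_error)
```

In this toy case the first two fiducials align exactly and the third is off by one unit along z, so the mean error is one third of a unit. A real evaluation would use measured fiducial positions and a transform estimated from them.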

Abstract

Surgical automation requires precise guidance and understanding of the scene. Current methods in the literature rely on bulky depth cameras to create maps of the anatomy; however, this does not translate well to space-limited clinical applications. Monocular cameras are small and allow minimally invasive surgeries in tight spaces, but additional processing is required to generate 3D scene understanding. We propose a 3D mapping pipeline that uses only RGB images to create segmented point clouds of the target anatomy. To ensure the most precise reconstruction, we compare the performance of different structure-from-motion algorithms on mapping central airway obstructions, and test the pipeline on a downstream task of tumor resection. In several metrics, including post-procedure tissue model evaluation, our pipeline performs comparably to RGB-D cameras and, in some cases, even surpasses their performance. These promising results demonstrate that automation guidance can be achieved in minimally invasive procedures with monocular cameras. This study is a step toward the complete autonomy of surgical robots.

Paper Structure

This paper contains 21 sections, 3 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: (a) Robotic electrocautery tool positioned on an open-surgery CAO tissue model. (b) Corresponding RViz visualization of the segmented point cloud generated by SfM reconstruction and registration. (c) Snapshots taken during the robotic CAO resection of this tissue model using the SfM reconstruction.
  • Figure 2: Example images (left) and corresponding reconstructions (right) from the CAO dataset. The upper row shows unsegmented reconstructions, while the lower row shows the same model with segmentation applied.
  • Figure 3: Full robotic system for downstream CAO resection experiments.
  • Figure 4: Example images from resection experiment setup (left) and resulting segmented point cloud (right).
  • Figure 5: Qualitative comparison of six structure-from-motion pipelines. (a) SuperPoint [detone2018superpoint] + SuperGlue [sarlin2020superglue], (b) SuperPoint + LightGlue [lindenberger2023lightglue], (c) SuperPointInLoc [taira2018inloc] + LightGlue, (d) ALIKED [zhao2023aliked] + LightGlue, (e) SIFT NN [lowe1999object], (f) DISK [tyszkiewicz2020disk] + LightGlue.
  • ...and 3 more figures