Learning Spatial Awareness for Laparoscopic Surgery with AI Assisted Visual Feedback
Songyang Liu, Yunpeng Tan, Shuai Li
TL;DR
The paper addresses the limited depth perception of monocular, 2D laparoscopic training by introducing an AI-assisted dual-view framework, built in NVIDIA Isaac Sim, that pairs the standard 2D endoscopic feed with synchronized 3D visualizations delivered through a mixed-reality (MR) headset. Real-time AI modules perform tool localization and instrument-tissue interaction detection, and generate corrective 3D cues without changing the clinical 2D feed. Key contributions include a heatmap-based tool localization module, ITID-Net-based interaction detection, and an Action Graph-driven feedback pipeline spanning navigation, manipulation, transfer, cutting, and suturing. Together these show that 3D context can disambiguate visually similar 2D scenes and enhance depth perception. The simulated results suggest potential improvements in depth reasoning, occlusion handling, and adherence to planned trajectories, offering a scalable path to faster, safer skill development in minimally invasive surgery (MIS).
Abstract
Laparoscopic surgery constrains surgeons' spatial awareness because procedures are performed through a monocular, two-dimensional (2D) endoscopic view. Conventional training methods using dry-lab models or recorded videos provide limited depth cues, often leading trainees to misjudge instrument position and perform ineffective or unsafe maneuvers. To address this limitation, we present an AI-assisted training framework developed in NVIDIA Isaac Sim that couples the standard 2D laparoscopic feed with synchronized three-dimensional (3D) visual feedback delivered through a mixed-reality (MR) interface. While trainees operate using the clinical 2D view, validated AI modules continuously localize surgical instruments and detect instrument-tissue interactions in the background. When a spatial misjudgment is detected, 3D visual feedback is displayed to the trainee while the original operative perspective is preserved. The framework covers surgical tasks including navigation, manipulation, transfer, cutting, and suturing. Visually similar 2D cases can be disambiguated through the added 3D context, improving depth perception, contact awareness, and understanding of tool orientation.
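As a rough illustration of what heatmap-based instrument localization involves, the sketch below decodes a 2D response map into a single tool-tip pixel by taking its strongest peak. This is not the authors' module: the function name `decode_heatmap_peak`, the `min_score` threshold, and the toy 3x3 heatmap are all hypothetical, and the summary does not say how the actual network output is decoded.

```python
from typing import List, Optional, Tuple


def decode_heatmap_peak(
    heatmap: List[List[float]],
    min_score: float = 0.5,  # hypothetical confidence threshold, not from the paper
) -> Optional[Tuple[int, int, float]]:
    """Return (row, col, score) of the strongest response in the heatmap.

    Returns None when the heatmap is empty or the peak falls below min_score,
    which here stands in for "no instrument visible in this frame".
    """
    best: Optional[Tuple[int, int, float]] = None
    for r, row in enumerate(heatmap):
        for c, score in enumerate(row):
            if best is None or score > best[2]:
                best = (r, c, score)
    if best is None or best[2] < min_score:
        return None
    return best


# Toy heatmap with a single clear peak at row 1, column 1.
heatmap = [
    [0.01, 0.02, 0.01],
    [0.05, 0.91, 0.10],
    [0.02, 0.08, 0.03],
]
print(decode_heatmap_peak(heatmap))  # (1, 1, 0.91)
```

A real pipeline would likely refine the peak to sub-pixel precision and track it across frames; the point here is only that a heatmap output reduces to an image-plane coordinate, which by itself carries no depth and is why the added 3D view is useful.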
