Learning Spatial Awareness for Laparoscopic Surgery with AI Assisted Visual Feedback

Songyang Liu, Yunpeng Tan, Shuai Li

TL;DR

The paper addresses the depth-perception limitations of monocular, 2D laparoscopic training by introducing an AI-assisted dual-view framework, built in NVIDIA Isaac Sim, that pairs the standard 2D endoscopic feed with synchronized 3D visualizations delivered through a mixed-reality (MR) headset. Real-time AI modules perform tool localization and instrument-tissue interaction detection to generate corrective 3D cues without changing the clinical 2D feed. Key contributions include a heatmap-based tool localization module, ITID-Net-based interaction detection, and an Action Graph-driven feedback pipeline spanning navigation, manipulation, transfer, cutting, and suturing. The authors show that 3D context disambiguates visually similar 2D scenes and enhances depth perception. The simulated results suggest potential improvements in depth reasoning, occlusion handling, and adherence to planned trajectories, offering a scalable path to faster, safer skill development in minimally invasive surgery (MIS).

Abstract

Laparoscopic surgery constrains surgeons' spatial awareness because procedures are performed through a monocular, two-dimensional (2D) endoscopic view. Conventional training methods using dry-lab models or recorded videos provide limited depth cues, often leading trainees to misjudge instrument position and perform ineffective or unsafe maneuvers. To address this limitation, we present an AI-assisted training framework developed in NVIDIA Isaac Sim that couples the standard 2D laparoscopic feed with synchronized three-dimensional (3D) visual feedback delivered through a mixed-reality (MR) interface. While trainees operate using the clinical 2D view, validated AI modules continuously localize surgical instruments and detect instrument-tissue interactions in the background. When spatial misjudgments are detected, 3D visual feedback is displayed to trainees while preserving the original operative perspective. Our framework considers various surgical tasks, including navigation, manipulation, transfer, cutting, and suturing. Visually similar 2D cases can be disambiguated through the added 3D context, improving depth perception, contact awareness, and tool orientation understanding.

Paper Structure

This paper contains 14 sections, 8 figures.

Figures (8)

  • Figure 1: High-fidelity virtual surgical environment used in our simulation. The left panel shows a photorealistic operating room with daVinci research kit Patient Side Manipulators (PSMs), while the right panel presents detailed 3D models of human anatomical organs designed for realistic surgical training.
  • Figure 2: Overview of our proposed AI-assisted surgical training pipeline. The pipeline consists of three integrated modules. (1) Surgical Tool Localization (green box): Endoscopic video frames are resized and processed through an Hourglass convolutional network to generate heatmaps indicating tool tip locations. (2) Instrument-Tissue Interaction Detection (red box): A detection network identifies bounding boxes and classes of instruments and tissues, then predicts their interactions using temporal and spatial reasoning. (3) AI-Assisted Surgeon Training with Mixed Reality (blue box): A surgeon operates within a simulated environment while viewing standard 2D laparoscopic video. Real-time AI analysis provides 3D feedback via a mixed-reality headset, highlighting correct and incorrect tool-tissue interactions to improve spatial awareness and surgical precision.
  • Figure 3: Action graph-based real-time visual feedback mechanism implemented in NVIDIA Isaac Sim. The trainee controls the daVinci medical robotic arm (Trainee Control), driving the end-effector toward the target while its trajectory is visualized. When a contact event is detected (On Contact), the system provides multiple feedback channels to improve spatial awareness and guide correct surgical operations: color-coded material updates (Write Prim Material), screen-space text cues (Display ‘Correct’ or ‘Wrong’), and textual logging.
  • Figure 4: Trainee mixed-reality (MR) views during the laparoscopic training process implemented in NVIDIA Isaac Sim. (a) Surgical room environment setup with controller options displayed. (b) MR configuration interface for environment initialization and parameter adjustment. (c) Trainee navigation within the MR scene using handheld controllers; the blue line visualizes the navigation trajectory. (d) Object interaction and selection using controllers; the selected organ is highlighted with yellow dashed contours.
  • Figure 5: Multi-view visualization during a surgical task. (a) Standard photorealistic rendering showing the daVinci robotic arm holding a needle. (b) Semantic segmentation view illustrating class-level pixel labeling for instruments and organs. (c) 2D bounding box visualization for key objects, including the robotic arm, needle, and anatomical structures. (d) 3D bounding box representation highlighting object geometry and spatial relationships within the simulation environment.
  • ...and 3 more figures
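
The captions for Figures 2 and 3 describe two steps that a short example can make concrete: turning a tool-tip heatmap into a pixel location, and mapping a contact event to a visual cue. The sketch below is illustrative only and is not the authors' implementation. It assumes the heatmap is a plain 2D grid of scores and that the tip is taken as the peak cell. The function names (`decode_tool_tip`, `contact_feedback`), the `min_score` threshold, and the green/red colors are hypothetical, and no Isaac Sim or Action Graph API is used.

```python
from typing import List, Optional, Tuple

Heatmap = List[List[float]]


def decode_tool_tip(
    heatmap: Heatmap,
    frame_size: Tuple[int, int],
    min_score: float = 0.5,  # assumed confidence threshold, not from the paper
) -> Optional[Tuple[float, float, float]]:
    """Return (x, y, score) of the heatmap peak in original-frame pixels.

    frame_size is (width, height) of the endoscopic frame before resizing.
    Returns None when the peak score is below min_score.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    # Find the highest-scoring cell together with its row and column.
    best_score, best_r, best_c = max(
        (heatmap[r][c], r, c) for r in range(rows) for c in range(cols)
    )
    if best_score < min_score:
        return None
    width, height = frame_size
    # Map the centre of the winning cell back to frame coordinates.
    x = (best_c + 0.5) * width / cols
    y = (best_r + 0.5) * height / rows
    return x, y, best_score


def contact_feedback(
    touched: str, target: str
) -> Tuple[str, Tuple[float, float, float]]:
    """Map a contact event to a text cue and an RGB highlight color.

    The 'Correct'/'Wrong' labels follow the Figure 3 caption; the
    green/red colors are an assumption made for this sketch.
    """
    if touched == target:
        return "Correct", (0.0, 1.0, 0.0)
    return "Wrong", (1.0, 0.0, 0.0)


if __name__ == "__main__":
    # Toy 4x4 heatmap with a single confident peak at row 1, column 2.
    toy = [[0.0] * 4 for _ in range(4)]
    toy[1][2] = 0.9
    print(decode_tool_tip(toy, frame_size=(640, 480)))
    print(contact_feedback("liver", "gallbladder"))
```

Taking the single highest-scoring cell is the simplest decoding choice. A system tracking several instruments would need one heatmap channel per tool or a local-maximum search, which this sketch does not attempt.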