Setup-Invariant Augmented Reality for Teaching by Demonstration with Surgical Robots

Alexandre Banks, Richard Cook, Septimiu E. Salcudean

TL;DR

The paper tackles the lack of setup-invariant AR guidance for robot-assisted surgery training by introducing dV-STEAR, an open-source system that records expert motions relative to a training task and replays them in any robot configuration. It achieves this through markerless scene registration, end-effector pose estimation, back-projected rendering of articulated ghost instruments, and API-corrected kinematics using a Kabsch-Umeyama-based calibration, along with dual-quaternion hand-eye calibration. In a 24-participant study across two fundamental tasks, dV-STEAR improved novice performance (e.g., faster task completion, higher success rates) and reduced frustration and mental workload, with a pose registration error of $3.86\pm2.01$ mm. The work demonstrates the viability of asynchronous expert demonstration playback in AR-RAST, enabling self-directed training outside the OR and reducing reliance on direct supervision, with open-source access for broader adoption and development.
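For readers unfamiliar with the Kabsch-Umeyama step named above, the following is a minimal sketch of the general algorithm: a least-squares rigid (optionally scaled) alignment of two corresponding 3D point sets. The function name `kabsch_umeyama`, its parameters, and the idea of aligning API-reported positions to reference positions are illustrative assumptions; this summary does not state how the paper collects its correspondences or applies the resulting correction.

```python
import numpy as np


def kabsch_umeyama(source, target, with_scale=False):
    """Least-squares alignment of corresponding (N, 3) point sets.

    Returns (R, t, scale) such that target ~= scale * source @ R.T + t.
    Hypothetical helper for illustration; not the paper's implementation.
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    n = source.shape[0]

    # Centre both point sets on their centroids.
    mu_s = source.mean(axis=0)
    mu_t = target.mean(axis=0)
    src_c = source - mu_s
    tgt_c = target - mu_t

    # Cross-covariance between target and source, then its SVD.
    cov = tgt_c.T @ src_c / n
    U, D, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so R is a proper rotation (det = +1).
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0
    R = U @ S @ Vt

    if with_scale:
        var_s = (src_c ** 2).sum() / n
        scale = float(np.trace(np.diag(D) @ S) / var_s)
    else:
        scale = 1.0

    t = mu_t - scale * R @ mu_s
    return R, t, scale
```

With noise-free correspondences the function recovers the generating rotation and translation exactly (up to floating-point error); with noisy data it returns the least-squares estimate.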

Abstract

Augmented reality (AR) is an effective tool in robotic surgery education as it combines exploratory learning with three-dimensional guidance. However, existing AR systems require expert supervision and do not account for differences in the mentor and mentee robot configurations. To enable novices to train outside the operating room while receiving expert-informed guidance, we present dV-STEAR: an open-source system that plays back task-aligned expert demonstrations without assuming identical setup joint positions between expert and novice. Pose estimation was rigorously quantified, showing a registration error of 3.86 mm (SD = 2.01 mm). In a user study (N=24), dV-STEAR significantly improved novice performance on tasks from the Fundamentals of Laparoscopic Surgery. In a single-handed ring-over-wire task, dV-STEAR increased completion speed (p=0.03) and reduced collision time (p=0.01) compared to dry-lab training alone. During a pick-and-place task, it improved success rates (p=0.004). Across both tasks, participants using dV-STEAR exhibited significantly more balanced hand use and reported lower frustration levels. This work presents a novel educational tool implemented on the da Vinci Research Kit, demonstrates its effectiveness in teaching novices, and builds the foundation for further AR integration into robot-assisted surgery.

Paper Structure

This paper contains 16 sections, 4 equations, 10 figures, 5 tables, 1 algorithm.

Figures (10)

  • Figure 1: Overview of dV-STEAR for setup-invariant record and playback of AR tools in RAS training. Expert surgeon motions are recorded as PSM end-effector pose relative to the training task, ${}^{s}\boldsymbol{T}_{psm}$, and the instrument's final three joint parameters, $q_6,q_7,\theta_J$. These are played back to the novice at a later time. The view during AR rendering corresponds to the left/right ECM camera poses, ${}^{lc}\boldsymbol{T}_{s},{}^{rc}\boldsymbol{T}_{s}$, relative to the training task.
  • Figure 2: dV-STEAR UI, showing the four components of the articulated instruments: left jaw, right jaw, body, and shaft.
  • Figure 3: Overview of the transforms to estimate PSM pose during expert surgeon motion recording. ${}^{lc,i}\boldsymbol{T}_{s}$ is the registration from the initial camera pose to the dry-lab surgical scene. ${}^{ecm}\boldsymbol{T}_{lc}$ and ${}^{ecm,i}\boldsymbol{T}_{ecm}$ are the hand-eye and ECM motion transforms, respectively. ${}^{ecm}\boldsymbol{T}_{psm}$ is the uncorrected transform from the ECM to the PSM given by the dVRK API. ${}^{s}\boldsymbol{T}_{psm}$ is the resulting transform recorded when the surgeon completes tasks.
  • Figure 4: Pick-and-Place dry-lab surgical task with mounts for ArUco markers. Depicted are visible 3D corner positions in the object's coordinate frame $\{\underline{\boldsymbol{C}}_{s},\boldsymbol{o}_{s}\}$. The registration algorithm computes the ${}^{c,i}\boldsymbol{T}_{s}$ transform under occlusion as long as at least one ArUco marker is visible.
  • Figure 5: a) Transforms during playback of the AR instruments. ${}^{lc,i}\boldsymbol{T}_{s}$ and ${}^{rc,i}\boldsymbol{T}_{s}$ are registrations from the initial left/right camera frames to the dry-lab surgical scene. ${}^{ecm}\boldsymbol{T}_{lc}$ and ${}^{ecm}\boldsymbol{T}_{rc}$ are the rigid hand-eye transforms. ECM motion is given by ${}^{ecm,i}\boldsymbol{T}_{ecm}$. The transform ${}^{s}\boldsymbol{T}_{psm}$ in purple is the pose of the PSM recorded from the expert surgeon and overlaid as an AR tool; b) Transformations used to move backwards through the PSM's kinematic chain and place the four rigid AR components: left jaw ($J_L,loc$), right jaw ($J_R,loc$), body ($b,loc$), and shaft ($sh,loc$).
  • ...and 5 more figures
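
The recording chain described in the Figure 3 caption can be sketched as a product of 4x4 homogeneous transforms. This is a hedged illustration, not the paper's code: it assumes the convention that ${}^{a}\boldsymbol{T}_{b}$ maps coordinates in frame $b$ to frame $a$, assumes the hand-eye transform is rigid (so it also holds at the initial ECM pose), and omits the API correction of ${}^{ecm}\boldsymbol{T}_{psm}$. Under those assumptions, ${}^{s}\boldsymbol{T}_{psm} = ({}^{lc,i}\boldsymbol{T}_{s})^{-1}\,({}^{ecm}\boldsymbol{T}_{lc})^{-1}\,{}^{ecm,i}\boldsymbol{T}_{ecm}\,{}^{ecm}\boldsymbol{T}_{psm}$. All function and variable names below are hypothetical.

```python
import numpy as np


def make_transform(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T


def invert_transform(T):
    """Invert a rigid 4x4 transform using the rotation transpose."""
    R = T[:3, :3]
    t = T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv


def rot_z(angle_rad):
    """Rotation about the z-axis, used only to build example poses."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def scene_T_psm(lci_T_s, ecm_T_lc, ecmi_T_ecm, ecm_T_psm):
    """Compose the PSM pose relative to the training scene.

    Assumed chain: scene <- initial left camera <- initial ECM
                   <- current ECM <- PSM.
    """
    s_T_lci = invert_transform(lci_T_s)    # scene from initial left camera
    lc_T_ecm = invert_transform(ecm_T_lc)  # camera from ECM (rigid hand-eye)
    return s_T_lci @ lc_T_ecm @ ecmi_T_ecm @ ecm_T_psm
```

Because the result is expressed in the scene frame, it does not depend on where the ECM or setup joints were when the expert was recorded, which is the property the playback side relies on when it re-projects the pose through the novice's own camera registration.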