Voice control interface for surgical robot assistants

Ana Davila, Jacinto Colan, Yasuhisa Hasegawa

TL;DR

This work presents a voice control interface for surgical robot assistants that uses the Whisper automatic speech recognition (ASR) model within the ROS framework to translate spoken commands into real-time robotic actions. The system comprises a Speech Recognition Module, a Mapping Module that selects the predefined command with the lowest word error rate (WER) relative to the transcript, and a ROS-based robot-control layer that executes the selected action under remote center of motion (RCM) constraints. Experimental validation on a 7-DoF Kinova setup demonstrates high command recognition accuracy and a mean latency of about 1.7 seconds, and a tissue triangulation demonstration illustrates practical feasibility. The modular architecture and the use of a state-of-the-art ASR model point toward reduced cognitive load for surgeons and adaptable integration into diverse surgical workflows; possible future enhancements include personalized training and more robust learning strategies.
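The lowest-WER selection mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the command list, the text normalization, the `max_wer` rejection threshold, and the function names are all assumptions made for illustration. WER is computed here as the word-level edit distance divided by the number of words in the reference command.

```python
import re
from typing import List, Optional, Sequence, Tuple


def _words(text: str) -> List[str]:
    # Assumed normalization: lowercase and drop punctuation, since ASR
    # transcripts often carry capitalization and trailing periods.
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = _words(reference), _words(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed prefix of
    # ref and the first j words of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1] / len(ref)


def map_to_command(
    transcript: str,
    commands: Sequence[str],
    max_wer: float = 0.5,  # hypothetical rejection threshold, not from the paper
) -> Tuple[Optional[str], float]:
    """Return the predefined command with the lowest WER, or None if too far."""
    scored = [(word_error_rate(c, transcript), c) for c in commands]
    best_wer, best = min(scored, key=lambda pair: pair[0])
    if best_wer > max_wer:
        return None, best_wer
    return best, best_wer


# Hypothetical command vocabulary, for illustration only.
COMMANDS = ("move left", "move right", "open gripper", "close gripper")
```

For example, `map_to_command("Move left.", COMMANDS)` selects `"move left"` with a WER of 0, while an unrelated transcript such as `"hello world"` exceeds the assumed threshold and yields `None`. Rejecting distant matches is a design choice added here so that unrelated speech is not forced onto a robot action; the source only states that the lowest-WER command is selected.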

Abstract

Traditional control interfaces for robotic-assisted minimally invasive surgery impose a significant cognitive load on surgeons. To improve surgical efficiency and surgeon-robot collaboration while reducing surgeon burden, we present a novel voice control interface for surgical robotic assistants. Our system integrates Whisper, a state-of-the-art speech recognition model, within the ROS framework to enable real-time interpretation and execution of voice commands for surgical manipulator control. The proposed system consists of a speech recognition module, an action mapping module, and a robot control module. Experimental results demonstrate the system's high accuracy and inference speed, and a tissue triangulation task demonstrates its feasibility for surgical applications. Future work will focus on further improving its robustness and clinical applicability.

Paper Structure (12 sections, 5 figures, 1 table)

Figures (5)

  • Figure 1: Voice commands can be used for commanding robotic surgical assistants.
  • Figure 2: Overview of the proposed voice control.
  • Figure 3: Robotic surgical assistant. (a) Robotic manipulator comprising a 7-DoF manipulator and a 3-DoF robotic surgical tool (RST). (b) Kinematic description.
  • Figure 4: Results for speech command recognition.
  • Figure 5: Snapshots of control of a robotic assistant for tissue manipulation.