Voice control interface for surgical robot assistants

Ana Davila; Jacinto Colan; Yasuhisa Hasegawa

TL;DR

This work presents a voice control interface for surgical robot assistants that uses the Whisper automatic speech recognition (ASR) model within the ROS framework to translate spoken commands into real-time robotic actions. The system comprises a Speech Recognition Module, a Mapping Module that uses a lowest word error rate (WER) strategy to select among predefined commands, and a ROS-based robot-control layer, enabling safe and responsive manipulation under remote center of motion (RCM) constraints. Experimental validation on a 7-DoF Kinova setup shows high command recognition accuracy and a mean latency of about 1.7 seconds, and a tissue triangulation demonstration illustrates practical feasibility. The modular architecture and the use of a state-of-the-art ASR model point toward reduced cognitive load for surgeons and adaptable integration into diverse surgical workflows; personalized training and more robust learning strategies are possible future enhancements.
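The lowest-WER selection performed by the Mapping Module can be illustrated with a minimal sketch. It is not the authors' implementation: the command list, the text normalization, and the `max_wer` rejection threshold are all hypothetical choices made for this example. The sketch computes the word-level edit distance between the transcript and each predefined command, divides by the command length to obtain a WER, and returns the command with the lowest value.

```python
import string
from typing import Optional, Sequence

# Hypothetical command vocabulary; the paper's actual command set is not listed here.
COMMANDS = ["move left", "move right", "open gripper", "close gripper", "stop"]


def normalize(text: str) -> list:
    """Lowercase, strip ASCII punctuation, and split into words (assumed preprocessing)."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance counted in words rather than characters."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from hypothesis
                            curr[j - 1] + 1,      # extra word in hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate of `hypothesis` measured against `reference`."""
    ref = normalize(reference)
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref, normalize(hypothesis)) / len(ref)


def map_to_command(transcript: str,
                   commands: Sequence[str] = COMMANDS,
                   max_wer: float = 0.5) -> Optional[str]:
    """Return the predefined command with the lowest WER, or None if none is close.

    The rejection threshold `max_wer` is an assumption added for this sketch.
    Ties are resolved in favor of the command listed first.
    """
    best = min(commands, key=lambda command: wer(command, transcript))
    return best if wer(best, transcript) <= max_wer else None


if __name__ == "__main__":
    for spoken in ["Move left.", "open the gripper", "hello world"]:
        print(repr(spoken), "->", map_to_command(spoken))
```

Rejecting transcripts whose best WER is still high is one way to keep unrelated speech from triggering a motion; whether and how the original system rejects such input is not stated in this summary.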

Abstract

Traditional control interfaces for robotic-assisted minimally invasive surgery impose a significant cognitive load on surgeons. To improve surgical efficiency and surgeon-robot collaboration while reducing surgeon burden, we present a novel voice control interface for surgical robotic assistants. Our system integrates Whisper, a state-of-the-art speech recognition model, within the ROS framework to enable real-time interpretation and execution of voice commands for surgical manipulator control. The proposed system consists of a speech recognition module, an action mapping module, and a robot control module. Experimental results demonstrate the system's high accuracy and inference speed, and a tissue triangulation task demonstrates its feasibility for surgical applications. Future work will focus on further improving its robustness and clinical applicability.

Paper Structure (12 sections, 5 figures, 1 table)

Introduction
Related works
Methodology
Speech recognition unit
Mapping module
Robot control
Experimental validation
Experimental setup
Speech recognition accuracy
Inference time
Demonstration in a tissue triangulation task
Conclusions

Figures (5)

Figure 1: Voice commands can be used for commanding robotic surgical assistants.
Figure 2: Overview of the proposed voice control.
Figure 3: Robotic surgical assistant. a. Robotic manipulator comprising a 7-DOF manipulator and a 3-DOF robotic surgical tool (RST). b. Kinematic description.
Figure 4: Results for speech command recognition.
Figure 5: Snapshots of control of a robotic assistant for tissue manipulation.
