Voice control interface for surgical robot assistants
Ana Davila, Jacinto Colan, Yasuhisa Hasegawa
TL;DR
This work presents a voice control interface for surgical robot assistants that uses the Whisper automatic speech recognition (ASR) model within the ROS framework to translate spoken commands into real-time robotic actions. The system comprises a Speech Recognition Module, a Mapping Module that selects the predefined command with the lowest word error rate (WER) relative to the transcript, and a ROS-based robot-control layer, enabling safe and responsive manipulation under remote center of motion (RCM) constraints. Experimental validation on a 7-DoF Kinova setup demonstrates high command recognition accuracy and a mean latency of about 1.7 seconds, and a tissue triangulation demonstration illustrates practical feasibility. The modular architecture and the use of a state-of-the-art ASR model point toward reduced cognitive load for surgeons and adaptable integration into diverse surgical workflows. Possible future enhancements include personalized training and more robust learning strategies.
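The lowest-WER selection in the Mapping Module can be illustrated with a minimal sketch. It is not the authors' implementation: the command list `COMMANDS`, the function names, the text normalization, and the rejection threshold `max_wer` are all assumptions introduced here for illustration. WER is computed in the standard way, as the word-level edit distance between a predefined command (the reference) and the transcript (the hypothesis), divided by the number of words in the reference.

```python
import string

# Hypothetical command vocabulary; the paper's actual command set may differ.
COMMANDS = ["move left", "move right", "open gripper", "close gripper", "stop"]


def normalize(text: str) -> list[str]:
    """Lowercase, strip punctuation, and split into words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def map_to_command(transcript: str, commands=COMMANDS, max_wer: float = 0.5):
    """Return the command with the lowest WER, or None if no command is close.

    The max_wer rejection threshold is an assumption added for safety;
    it is not taken from the paper.
    """
    best = min(commands, key=lambda cmd: word_error_rate(cmd, transcript))
    if word_error_rate(best, transcript) > max_wer:
        return None
    return best
```

With this sketch, `map_to_command("Please open gripper.")` returns `"open gripper"`, while an unrelated utterance such as `"what time is it"` returns `None` and would leave the robot idle. Ties between equally distant commands are resolved by list order here; a deployed system would likely reject ambiguous transcripts instead.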
Abstract
Traditional control interfaces for robotic-assisted minimally invasive surgery impose a significant cognitive load on surgeons. To improve surgical efficiency and surgeon-robot collaboration while reducing surgeon burden, we present a novel voice control interface for surgical robotic assistants. Our system integrates Whisper, a state-of-the-art speech recognition model, within the ROS framework to enable real-time interpretation and execution of voice commands for surgical manipulator control. The proposed system consists of a speech recognition module, an action mapping module, and a robot control module. Experimental results demonstrate the system's high accuracy and inference speed, and a tissue triangulation task demonstrates its feasibility for surgical applications. Future work will focus on further improving its robustness and clinical applicability.
