LLM Granularity for On-the-Fly Robot Control
Peng Wang, Mattia Robbiani, Zhihao Guo
TL;DR
The paper asks whether language alone can control assistive robots when visual input is unreliable, by evaluating language prompts of varying granularity and on-the-fly control. It presents an end-to-end linguomotor pipeline tested on a Sawyer cobot and a TurtleBot, using an LLM to interpret prompts and issue ROS actions without fine-tuning. Key findings show that quantitative prompts yield higher control accuracy and safer, more reliable behavior than qualitative prompts, although some errors remain and the LLM occasionally issues invalid actions. The work is a foundational step toward linguomotor assistive robotics and highlights the trade-offs among prompt granularity, responsiveness, and safety in dynamic assistance tasks.
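Because the LLM can occasionally return an invalid action, a language-only pipeline plausibly needs a check between the model's reply and the robot. The following is a minimal sketch of such a check, not the authors' implementation: it assumes the LLM is asked to reply with a JSON object holding `linear` and `angular` velocities, and the names `VelocityCommand`, `parse_llm_action`, `MAX_LINEAR`, and `MAX_ANGULAR`, along with the limit values, are hypothetical.

```python
import json
import math
from dataclasses import dataclass
from typing import Optional

# Hypothetical safety limits; the paper does not specify these values.
MAX_LINEAR = 0.5   # m/s
MAX_ANGULAR = 1.0  # rad/s


@dataclass(frozen=True)
class VelocityCommand:
    linear: float   # forward speed in m/s
    angular: float  # turn rate in rad/s


def parse_llm_action(reply: str) -> Optional[VelocityCommand]:
    """Return a validated command, or None if the reply is not a usable action."""
    try:
        data = json.loads(reply)
        linear = float(data["linear"])
        angular = float(data["angular"])
    except (ValueError, KeyError, TypeError):
        # Not JSON, missing fields, or non-numeric values.
        return None
    # Reject NaN/Infinity, which json.loads accepts by default.
    if not (math.isfinite(linear) and math.isfinite(angular)):
        return None
    # Reject commands outside the assumed safety envelope.
    if abs(linear) > MAX_LINEAR or abs(angular) > MAX_ANGULAR:
        return None
    return VelocityCommand(linear, angular)
```

A caller would forward only non-`None` results to the robot (for example, by filling a ROS velocity message) and could hold position or re-prompt otherwise. A reply in free text, such as "move forward a bit", fails parsing and is rejected, which is one way a qualitative answer might be kept from reaching the actuators.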
Abstract
Assistive robots have attracted significant attention due to their potential to enhance the quality of life for vulnerable individuals such as the elderly. The convergence of computer vision, large language models, and robotics has introduced the `visuolinguomotor' mode for assistive robots, in which visual and linguistic inputs are combined to enable proactive and interactive assistance. This raises the question: \textit{In circumstances where visuals become unreliable or unavailable, can we rely solely on language to control robots, i.e., is the `linguomotor' mode viable for assistive robots?} This work takes initial steps toward answering this question by: 1) evaluating the responses of assistive robots to language prompts of varying granularity; and 2) exploring the necessity and feasibility of controlling the robot on-the-fly. We designed and conducted experiments on a Sawyer cobot to support our arguments. A TurtleBot case study demonstrates how the solution adapts to scenarios where assistive robots need to maneuver in order to assist. Code will be released on GitHub soon to benefit the community.
