LLM Granularity for On-the-Fly Robot Control

Peng Wang, Mattia Robbiani, Zhihao Guo

TL;DR

The paper asks whether language alone can control assistive robots when visuals are unreliable, by evaluating language prompt granularity and on-the-fly control. It presents an end-to-end linguomotor pipeline tested on a Sawyer cobot and a TurtleBot, using an LLM to interpret prompts and issue ROS actions without fine-tuning. Key findings show that quantitative prompts yield higher control accuracy and safer, more reliable behavior than qualitative prompts, although the LLM still makes some errors and occasionally issues invalid actions. The work establishes a foundational step toward linguomotor assistive robotics and highlights the trade-offs between prompt granularity, responsiveness, and safety in dynamic assistance tasks.

Abstract

Assistive robots have attracted significant attention due to their potential to enhance the quality of life for vulnerable individuals such as the elderly. The convergence of computer vision, large language models, and robotics has introduced the 'visuolinguomotor' mode for assistive robots, where visuals and linguistics are incorporated into assistive robots to enable proactive and interactive assistance. This raises the question: in circumstances where visuals become unreliable or unavailable, can we rely solely on language to control robots, i.e., is the 'linguomotor' mode viable for assistive robots? This work takes the initial steps to answer this question by: 1) evaluating the responses of assistive robots to language prompts of varying granularities; and 2) exploring the necessity and feasibility of controlling the robot on-the-fly. We have designed and conducted experiments on a Sawyer cobot to support our arguments. A TurtleBot case is designed to demonstrate how the solution adapts to scenarios where assistive robots need to maneuver to assist. Code will be released on GitHub soon to benefit the community.

Paper Structure (7 sections, 3 figures, 2 tables)

Figures (3)

  • Figure 1: The pipeline to test how language granularity affects robot control performance. A human acts as both the observer and the decision-maker, using language prompts of different granularity to control the robot and changing the granularity based on observations.
  • Figure 2: The poses of the robots after executing the six prompts. From top left to bottom right: prompts ① to ⑥ in the first column of the table labeled tab:sayerdata, respectively.
  • Figure 3: The poses of the robots after executing language prompts. From left to right: the initial state; after "move the arm to position_x = 0.46, position_y = 0.15, and position_z = 0.5 while keeping the current orientation"; after "move the arm up"; and after "rotate the arm".
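
The difference between the quantitative and qualitative prompts in the Figure 3 caption can be illustrated with a minimal sketch. It is not the authors' pipeline, which uses an LLM to interpret prompts and issue ROS actions; here a plain regular expression stands in for the interpretation step. The function names and the workspace limits are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical workspace limits in metres; these values are NOT from the paper.
WORKSPACE = {"x": (-1.0, 1.0), "y": (-1.0, 1.0), "z": (0.0, 1.2)}

# Matches fragments such as "position_x = 0.46" or "position_z=0.5".
_COORD = re.compile(r"position_([xyz])\s*=\s*(-?\d+(?:\.\d+)?)")


def parse_quantitative_prompt(prompt):
    """Return {'x': ..., 'y': ..., 'z': ...} if the prompt names all three
    coordinates explicitly, otherwise None (i.e. the prompt is qualitative)."""
    target = {axis: float(value) for axis, value in _COORD.findall(prompt)}
    if set(target) != {"x", "y", "z"}:
        return None
    return target


def is_within_workspace(target):
    """Check a parsed target against the (assumed) workspace limits."""
    return all(
        WORKSPACE[axis][0] <= value <= WORKSPACE[axis][1]
        for axis, value in target.items()
    )


quantitative = (
    "move the arm to position_x = 0.46, position_y = 0.15, and "
    "position_z=0.5 while keeping the current orientation"
)
qualitative = "move the arm up"

target = parse_quantitative_prompt(quantitative)
print(target, is_within_workspace(target))
print(parse_quantitative_prompt(qualitative))
```

A quantitative prompt carries a fully specified target that can be checked before execution, whereas a qualitative prompt such as "move the arm up" leaves the magnitude to the interpreter, which is one plausible reason for the accuracy gap reported in the TL;DR.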