LAMS: LLM-Driven Automatic Mode Switching for Assistive Teleoperation

Yiran Tao, Jehan Yang, Dan Ding, Zackory Erickson

TL;DR

LAMS addresses the challenge of frequent mode switching in assistive teleoperation by using an LLM to map low-DoF joystick inputs to high-DoF robot actions based on language-grounded task context, without task demonstrations. It improves incrementally by incorporating user-generated mode-switching examples into its prompts, and is validated through an ablation study and a user study with 10 participants on complex, long-horizon tasks, showing fewer manual mode switches and strong user preference. The approach feeds the LLM a three-part input (prefix, rules, pose grounding) and uses a probability-based decoding strategy to select robust mappings; the rules are synthesized from user examples by a separate LLM. Results indicate that LAMS generalizes across tasks better than heuristic or static methods, reduces cognitive load, and learns from user interaction over time, which suggests practical value for assistive robotics and teleoperation. However, challenges remain in differentiating certain rotational actions and in grounding 3D orientation in natural language, motivating future work on cross-task rule transfer and more nuanced natural-language descriptions.
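The three-part LLM input mentioned above (prefix, rules, pose grounding) can be pictured with a minimal sketch. Everything here is an illustrative assumption rather than the authors' implementation: the class name `PromptBuilder`, its methods, the section labels in the assembled string, and the example rule text are all hypothetical, and no actual LLM call is made.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PromptBuilder:
    """Hypothetical helper that assembles a prefix, rules, and a pose description."""

    prefix: str
    rules: List[str] = field(default_factory=list)

    def add_rule(self, rule: str) -> None:
        # Rules would come from user-generated mode-switching examples;
        # here they are plain strings, and exact duplicates are skipped.
        if rule not in self.rules:
            self.rules.append(rule)

    def build(self, pose_description: str) -> str:
        # Concatenate the three parts in a fixed order: prefix, rules, pose.
        if self.rules:
            rule_block = "\n".join(f"- {r}" for r in self.rules)
        else:
            rule_block = "(no rules yet)"
        return (
            f"{self.prefix}\n\n"
            f"Rules:\n{rule_block}\n\n"
            f"Current state:\n{pose_description}"
        )


builder = PromptBuilder(prefix="Map each joystick direction to a robot action.")
first_prompt = builder.build("The gripper is above the cup.")

# A later trial: a (made-up) rule derived from a manual switch is added.
builder.add_rule("When the gripper is above the cup, map 'down' to 'move down'.")
later_prompt = builder.build("The gripper is above the cup.")
```

The point of the sketch is only that the prefix stays fixed while the rule part grows with user interaction and the pose part changes at every step.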

Abstract

Teleoperating high degrees-of-freedom (DoF) robotic manipulators via low-DoF controllers like joysticks often requires frequent switching between control modes, where each mode maps controller movements to specific robot actions. Manually performing this frequent switching can make teleoperation cumbersome and inefficient. On the other hand, existing automatic mode-switching solutions, such as heuristic-based or learning-based methods, are often task-specific and lack generalizability. In this paper, we introduce LLM-Driven Automatic Mode Switching (LAMS), a novel approach that leverages Large Language Models (LLMs) to automatically switch control modes based on task context. Unlike existing methods, LAMS requires no prior task demonstrations and incrementally improves by integrating user-generated mode-switching examples. We validate LAMS through an ablation study and a user study with 10 participants on complex, long-horizon tasks, demonstrating that LAMS effectively reduces manual mode switches, is preferred over alternative methods, and improves performance over time. The project website with supplementary materials is at https://lams-assistance.github.io/.

Paper Structure (25 sections, 3 equations, 12 figures)

Figures (12)

  • Figure 1: We introduce LLM-Driven Automatic Mode Switching (LAMS), which uses Large Language Models (LLMs) to automatically predict the most effective mapping between joystick and robot movement directions. LAMS requires no prior task demonstrations and incrementally improves as the user repeatedly interacts with the system. Top: In the initial trials, while able to provide useful mapping predictions, LAMS encounters some errors due to limited task knowledge, requiring users to occasionally perform manual mode switches. Bottom: By the third trial, with LLM prompts enhanced by integrating prior user manual switches, LAMS performs automatic mode switches accurately with minimal user intervention.
  • Figure 2: Our proposed LLM-Driven Automatic Mode Switching (LAMS) framework. LAMS grounds the current robot end effector and task object poses into a natural language description $l_{pose}^{t}$. This description, along with a prompt prefix $l_{pre}$ and a rule prompt $l_{rule}^{t}$, forms a natural language instruction $l_{t}$, which is fed into an LLM to generate the mode $\mathcal{M}_{t}$, i.e., the mapping of the joystick’s four movement directions to specific robot action directions. $\mathcal{M}_{t}$, along with user action $a_{u,t}$, produces robot action $a_{r,t}$. LAMS begins without task-specific demonstrations, and improves incrementally through user interaction by incorporating user-generated examples into the rule prompt $l_{rule}^{t}$. The framework consists of three main components: LLM Input Generation, LLM Output Processing, and Incremental Improvement, which are respectively detailed in Section \ref{sec:input}, \ref{sec:output} and \ref{sec:incre}.
  • Figure 3: Usage of the Xbox controller as the user interface in our experiments.
  • Figure 4: Average number of manual mode switches across 5 experiments from our ablation study on the water pouring task. Error bars show standard deviations.
  • Figure 5: Number of manual mode switches averaged over all participants. Error bars show standard deviations. Significance brackets indicate that there are statistically significant differences between LAMS and all other methods on trial 3 in both tasks.
  • ...and 7 more figures
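
The Figure 2 caption describes a mode $\mathcal{M}_{t}$ as a mapping from the joystick's four movement directions to robot action directions, which combines with the user action $a_{u,t}$ to produce the robot action $a_{r,t}$. The sketch below illustrates that idea only. The action names, the 7-element action layout, and the functions `validate_mode` and `apply_mode` are hypothetical choices made for illustration and are not taken from the paper.

```python
from typing import Dict, Tuple

# Hypothetical action layout: (x, y, z, roll, pitch, yaw, gripper).
ActionVector = Tuple[float, ...]

# Hypothetical vocabulary of robot action directions as unit vectors.
ROBOT_ACTIONS: Dict[str, ActionVector] = {
    "move forward": (1, 0, 0, 0, 0, 0, 0),
    "move backward": (-1, 0, 0, 0, 0, 0, 0),
    "move left": (0, 1, 0, 0, 0, 0, 0),
    "move right": (0, -1, 0, 0, 0, 0, 0),
    "move up": (0, 0, 1, 0, 0, 0, 0),
    "move down": (0, 0, -1, 0, 0, 0, 0),
}

JOYSTICK_DIRECTIONS = ("up", "down", "left", "right")


def validate_mode(mode: Dict[str, str]) -> None:
    """Check that a mode covers exactly the four joystick directions."""
    if set(mode) != set(JOYSTICK_DIRECTIONS):
        raise ValueError("mode must map exactly: up, down, left, right")
    unknown = [name for name in mode.values() if name not in ROBOT_ACTIONS]
    if unknown:
        raise ValueError(f"unknown robot actions: {unknown}")


def apply_mode(
    mode: Dict[str, str], user_direction: str, magnitude: float = 1.0
) -> ActionVector:
    """Turn a joystick push (direction and magnitude) into a robot action."""
    validate_mode(mode)
    unit = ROBOT_ACTIONS[mode[user_direction]]
    return tuple(magnitude * component for component in unit)


# Example mode: in-plane translation on both joystick axes.
mode_t = {
    "up": "move forward",
    "down": "move backward",
    "left": "move left",
    "right": "move right",
}
robot_action = apply_mode(mode_t, "up", magnitude=0.5)
```

Under this reading, an automatic mode switch amounts to replacing `mode_t` with a new dictionary, while a manual switch is the user overriding it.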