Don't Yell at Your Robot: Physical Correction as the Collaborative Interface for Language Model Powered Robots

Chuye Zhang, Yifei Simon Shao, Harshil Parekh, Junyao Shi, Pratik Chaudhari, Vijay Kumar, Nadia Figueroa

TL;DR

This work addresses the challenge of reliable long-horizon tasking for LLM-powered robots by replacing exclusive reliance on natural language with physical corrections as a real-time interface. It introduces a DS-based action framework in which semantic actions issued by an LLM are mapped to 6-DoF dynamical-system (DS) commands, while a particle filter maintains a belief over DS parameters that can be updated through human touch and reflected back into LLM prompts. The system combines confidence-based variable impedance control, a particle-filter estimator, and an interface manager that translates between semantic and DS actions, enabling real-time corrections and learning from those corrections. In hybrid real+virtual experiments, the approach demonstrates that physical corrections align robot behavior with human intent, enable memory of corrections in the LLM, and support smoother multi-step task execution, highlighting a practical pathway toward proactive, physically guided human–robot collaboration.

Abstract

We present a novel approach for enhancing human-robot collaboration using physical interactions for real-time error correction of large language model (LLM) powered robots. Unlike other methods that rely on verbal or text commands, the robot leverages an LLM to proactively execute 6 DoF linear Dynamical System (DS) commands using a description of the scene in natural language. During motion, a human can provide physical corrections, which are used to re-estimate the desired intention, also parameterized by a linear DS. This corrected DS can be converted to natural language and used as part of the prompt to improve future LLM interactions. We provide proof-of-concept results in a hybrid real+sim experiment, showcasing physical interaction as a new possibility for LLM-powered human-robot interfaces.

Paper Structure

This paper contains 14 sections, 3 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: The robot tries to proactively help by co-carrying a pot with the human. However, based on the current observation of the environment, the LLM outputs the incorrect action of moving the pot towards the cutting board. With physical human correction, the robot adjusts the Dynamical System (DS) action parameters and aids the human in carrying the pot towards the stove. Feedback on this correction is sent to the LLM for learning.
  • Figure 2: System overview of our pipeline. The LLM is provided with the current semantic scene description from the perception module and the previous interaction history. It outputs a semantic action that is converted to a DS action by the interface manager. This DS action then drives the manipulator by updating the particles. If the human physically corrects the robot, the DS actions are re-estimated based on uniform DS action priors and converted to semantic corrections for the LLM to improve subsequent interactions.
  • Figure 3: The multi-step task of cooking beans is executed by the LLM-powered robot with physical human correction. Plots (a), (b), (c), and (d) illustrate the order of task execution, with the LLM response, robot action, and human activity for key steps. Plots (e), (f), and (g) show the evolution of the DS action parameter belief when the human physically corrects the robot: the human exerts force on the robot; as the tracking error increases, the confidence drops, the robot reduces its gains, and the particle filter increases the resampling rate $r$ by placing particles on perceived objects. Once the human stops interacting with the robot, the tracking error decreases and the confidence and control gains increase. Since most particles are near "the stove", the LLM is notified of the correction.
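
The correction loop described in the Figure 3 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation. The position-only (3-D rather than 6-DoF) state, the exponential error-to-confidence mapping, the stiffness range, the cosine-alignment likelihood, the object coordinates, and all function names are assumptions made for illustration. The only elements taken from the captions are the qualitative behavior: confidence falls as tracking error grows, gains fall with confidence, and a fraction $r$ of particles is re-placed on perceived objects.

```python
import math
import random


def ds_velocity(x, attractor, gain=1.0):
    """Linear DS toward an attractor: x_dot = -gain * (x - attractor)."""
    return [-gain * (xi - ai) for xi, ai in zip(x, attractor)]


def confidence(tracking_error, scale=0.05):
    """Assumed mapping: confidence decays exponentially with tracking error (m)."""
    return math.exp(-tracking_error / scale)


def stiffness(conf, k_min=50.0, k_max=600.0):
    """Assumed variable-impedance rule: gains interpolate with confidence."""
    return k_min + conf * (k_max - k_min)


def cosine(u, v):
    """Cosine similarity, returning 0.0 if either vector has zero length."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def update_particles(particles, objects, x, observed_vel, conf, rng, sharpness=5.0):
    """One particle-filter step over candidate attractors (object names)."""
    r = 1.0 - conf  # assumed: resampling rate rises as confidence drops
    names = sorted(objects)
    # Re-place a fraction r of particles uniformly on perceived objects.
    proposed = [rng.choice(names) if rng.random() < r else p for p in particles]
    # Assumed likelihood: how well each attractor's DS explains the observed motion.
    weights = [
        math.exp(sharpness * cosine(ds_velocity(x, objects[p]), observed_vel))
        for p in proposed
    ]
    return rng.choices(proposed, weights=weights, k=len(proposed))


# Hypothetical scene: the LLM chose "cutting board", the human pushes toward "stove".
objects = {"cutting board": [0.5, 0.4, 0.0], "stove": [0.5, -0.4, 0.0]}
rng = random.Random(0)
particles = ["cutting board"] * 200
x = [0.0, 0.0, 0.0]
human_vel = [0.5, -0.4, 0.0]
conf = confidence(0.15)  # 15 cm tracking error while the human is pushing

for _ in range(10):
    particles = update_particles(particles, objects, x, human_vel, conf, rng)

belief_stove = particles.count("stove") / len(particles)
print(f"stiffness during correction: {stiffness(conf):.1f}")
print(f"belief that the goal is the stove: {belief_stove:.2f}")
```

In this sketch the belief concentrates on "the stove" while the stiffness stays low. A real system would then report the majority attractor back to the LLM as a semantic correction once the human releases the robot and the confidence recovers.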