Accelerating Reinforcement Learning of Robotic Manipulations via Feedback from Large Language Models

Kun Chu; Xufeng Zhao; Cornelius Weber; Mengdi Li; Stefan Wermter

TL;DR

Reinforcement Learning in robotic manipulation often suffers from poor sample efficiency and the difficulty of reward specification. Lafite-RL leverages Large Language Models to provide real-time evaluative feedback during RL: it adds an LLM-derived reward $r_{llm}$ to the environment reward, $r_t = r_{env} + r_{llm}$, and uses two prompts (Scene Observer and Motion Evaluator) to guide learning. On RLBench tasks with a Franka Panda, the approach yields higher success rates and faster learning than the baseline without interactive feedback, demonstrating the potential of LLM-guided, low-effort supervision for robotic manipulation. Overall, Lafite-RL shows that non-expert users can design prompts that enable LLMs to accelerate RL without direct low-level control, offering a scalable path toward interactive, data-efficient robotic learning.

Abstract

Reinforcement Learning (RL) plays an important role in the robotic manipulation domain since it allows self-learning from trial-and-error interactions with the environment. Still, sample efficiency and reward specification seriously limit its potential. One possible solution involves learning from expert guidance. However, obtaining a human expert is impractical due to the high cost of supervising an RL agent, and developing an automatic supervisor is a challenging endeavor. Large Language Models (LLMs) demonstrate remarkable abilities to provide human-like feedback on user inputs in natural language. Nevertheless, they are not designed to directly control low-level robotic motions, as their pretraining is based on vast internet data rather than specific robotics data. In this paper, we introduce the Lafite-RL (Language agent feedback interactive Reinforcement Learning) framework, which enables RL agents to learn robotic tasks efficiently by taking advantage of LLMs' timely feedback. Our experiments conducted on RLBench tasks illustrate that, with simple prompt design in natural language, the Lafite-RL agent exhibits improved learning capabilities when guided by an LLM. It outperforms the baseline in terms of both learning efficiency and success rate, underscoring the efficacy of the rewards provided by an LLM.
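The reward combination $r_t = r_{env} + r_{llm}$ and the two-prompt flow (Scene Observer, then Motion Evaluator, then a language parser) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names, the `good`/`neutral`/`bad` labels and their numeric values, the `scale` factor, and the `llm` callable, which stands in for whatever language model interface is actually used.

```python
import re
from typing import Callable

# Hypothetical label-to-value mapping. The source only states that a language
# parser turns the LLM response into evaluative feedback; these labels and
# values are assumptions.
FEEDBACK_VALUES = {"good": 1.0, "neutral": 0.0, "bad": -1.0}


def parse_feedback(llm_response: str) -> float:
    """Return the value of the first known label found in the response."""
    for token in re.findall(r"[a-z]+", llm_response.lower()):
        if token in FEEDBACK_VALUES:
            return FEEDBACK_VALUES[token]
    return 0.0  # unparseable responses contribute no feedback


def llm_reward(
    llm: Callable[[str], str],
    observer_prompt: str,
    evaluator_prompt: str,
    raw_state: str,
    scale: float = 0.1,  # assumed weighting of the LLM feedback
) -> float:
    """Two-stage query: describe the scene, then evaluate the last action."""
    scene = llm(f"{observer_prompt}\n{raw_state}")
    verdict = llm(f"{evaluator_prompt}\n{scene}")
    return scale * parse_feedback(verdict)


def shaped_reward(r_env: float, r_llm: float) -> float:
    """Combine the rewards as r_t = r_env + r_llm."""
    return r_env + r_llm
```

In a training loop, `shaped_reward` would replace the raw environment reward stored for each transition; whether the LLM is queried at every step or only occasionally is a design choice that this sketch leaves open.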

Paper Structure (8 sections, 2 equations, 4 figures, 2 tables)

Introduction
Related Works
Reinforcement Learning from Human Guidance
Large Language Models in Robotics
Language Agent Feedback Interactive Reinforcement Learning
Experiments
Discussion
Conclusion

Figures (4)

Figure 1: Comparison of human feedback for RL and Lafite-RL. Conventional human feedback for RL requires frequent human interaction during the learning process. In contrast, Lafite-RL requires a human to interact with the LLM only once, prompting it to observe the RL process and provide real-time feedback.
Figure 2: Depiction of the proposed Lafite-RL framework. Before learning a task, a user provides designed prompts (see Table \ref{tab:prompts}), including descriptions of the current task background and the robot's desired behaviors, and specifications of the LLM's missions with several rules. Then, Lafite-RL enables an LLM to "observe" and understand the scene information, which includes the robot's past action, and to evaluate that action under the current task requirements. The language parser transforms the LLM response into evaluative feedback for constructing interactive rewards.
Figure 3: Illustration of two RLBench tasks for evaluation: (a) Push buttons, and (b) Take umbrella out of umbrella stand.
Figure 4: Learning curves of Lafite-RL (red curves) and the baseline without interactive feedback (gray curves) during training on two RLBench tasks.
