Self-Refined Large Language Model as Automated Reward Function Designer for Deep Reinforcement Learning in Robotics

Jiayang Song, Zhehua Zhou, Jiawei Liu, Chunrong Fang, Zhan Shu, Lei Ma

TL;DR

This work tackles reward function design for deep reinforcement learning (DRL) in robotics by introducing a zero-shot Large Language Model (LLM) framework with a self-refinement mechanism. The LLM first formulates an initial reward function from natural language prompts. An automated evaluation step then trains a DRL agent with that reward and measures task performance, and the results guide the LLM's subsequent self-refinement. Across nine continuous-control tasks on three robotic platforms, the self-refined rewards rival or surpass manually crafted rewards, which indicates broad applicability and potential to reduce human design effort. The approach suggests promising future integrations with AutoRL and task-specific LLM tuning to further automate and optimize DRL-based robotic control.

Abstract

Although Deep Reinforcement Learning (DRL) has achieved notable success in numerous robotic applications, designing a high-performing reward function remains a challenging task that often requires substantial manual input. Recently, Large Language Models (LLMs) have been extensively adopted to address tasks demanding in-depth common-sense knowledge, such as reasoning and planning. Because reward function design is also inherently linked to such knowledge, LLMs offer promising potential in this context. Motivated by this, we propose a novel LLM framework with a self-refinement mechanism for automated reward function design. The framework begins with the LLM formulating an initial reward function based on natural language inputs. The performance of the reward function is then assessed, and the results are fed back to the LLM to guide its self-refinement process. We examine the performance of our proposed framework through a variety of continuous robotic control tasks across three diverse robotic systems. The results indicate that our LLM-designed reward functions are able to rival or even surpass manually designed reward functions, highlighting the efficacy and applicability of our approach.
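
The three steps described in the abstract (initial design, evaluation, self-refinement) can be sketched as a simple loop. This is a minimal illustration, not the authors' implementation: the function names (`self_refine`, `design`, `evaluate`, `refine`), the `Evaluation` record, the use of a success rate as the performance measure, and the stopping threshold are all assumptions made for this sketch. In a real setting, `design` and `refine` would call an LLM and `evaluate` would train a DRL agent in simulation; here they are replaced by toy stand-ins so the loop runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Evaluation:
    """Hypothetical feedback record returned by the evaluation step."""
    success_rate: float  # fraction of evaluation episodes that met the task goal
    summary: str         # textual feedback that would be shown to the LLM


def self_refine(
    design: Callable[[str], str],
    evaluate: Callable[[str], Evaluation],
    refine: Callable[[str, Evaluation], str],
    task_description: str,
    max_iterations: int = 5,
    target_success: float = 0.95,
) -> Tuple[str, List[Evaluation]]:
    """Return the best evaluated reward and the evaluation history."""
    reward = design(task_description)        # step 1: initial design
    best_reward, best_eval = reward, None
    history: List[Evaluation] = []
    for i in range(max_iterations):
        result = evaluate(reward)            # step 2: evaluation
        history.append(result)
        if best_eval is None or result.success_rate > best_eval.success_rate:
            best_reward, best_eval = reward, result
        # Stop when the (assumed) target is met or the budget is spent.
        if result.success_rate >= target_success or i == max_iterations - 1:
            break
        reward = refine(reward, result)      # step 3: self-refinement
    return best_reward, history


# Toy stand-ins (assumptions): a "reward function" is just a string "w=<int>".
def toy_design(task: str) -> str:
    return "w=1"


def toy_evaluate(reward: str) -> Evaluation:
    weight = int(reward.split("=")[1])
    rate = min(1.0, weight / 10)
    return Evaluation(rate, f"success rate {rate:.1f} with weight {weight}")


def toy_refine(reward: str, feedback: Evaluation) -> str:
    weight = int(reward.split("=")[1])
    return f"w={weight * 2}"                 # pretend the LLM doubles the weight


best, history = self_refine(toy_design, toy_evaluate, toy_refine,
                            "quadruped robot forward running")
print(best, [e.success_rate for e in history])
```

Returning the best evaluated candidate, rather than the last one produced, is a design choice of this sketch: it ensures that the reward handed back has actually been assessed.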

Paper Structure

This paper contains 38 sections, 12 equations, 4 figures, and 1 table.

Figures (4)

  • Figure 1: Our proposed self-refine LLM framework for reward function design. It consists of three steps: initial design, evaluation, and self-refinement loop. A quadruped robot forward running task is used as an example here. A complete list of the prompts used in this work can be found in the appendix.
  • Figure 2: Continuous robotic control tasks with three diverse robotic systems: robotic manipulator (Franka Emika Panda), quadruped robot (Anymal) and quadcopter (Crazyflie). Simulations are conducted in NVIDIA Isaac Sim.
  • Figure 3: Reward functions in different self-refinement iterations for the quadruped robot forward running task.
  • Figure 4: System behaviors corresponding to reward functions in different self-refinement iterations, as well as the manually designed reward function. The time interval between each displayed point is set to 1 s.