
DrEureka: Language Model Guided Sim-To-Real Transfer

Yecheng Jason Ma, William Liang, Hung-Ju Wang, Sam Wang, Yuke Zhu, Linxi Fan, Osbert Bastani, Dinesh Jayaraman

TL;DR

This work presents DrEureka, an LLM-guided pipeline that automates reward design and domain randomization (DR) for sim-to-real transfer in robotics. By decomposing the problem into reward synthesis, a reward-aware physics prior, and LLM-driven DR generation, DrEureka achieves competitive real-world transfer on quadruped locomotion and dexterous manipulation without manual tuning. It also solves a novel task, a quadruped walking on a yoga ball, and outperforms human-designed configurations and prior DR baselines. The results suggest that coupling foundation models with physics simulators can substantially accelerate real-world robot learning while reducing human labor.
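The three-stage decomposition can be pictured as a small driver that threads each stage's output into the next. The sketch below is illustrative only: `SimToRealPipeline` and its stage callables are hypothetical names, the LLM calls and simulator training are left as injected functions, and the order of training runs is a simplifying assumption based on the Figure 1 description rather than the authors' code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical alias: a DR config maps a physics-parameter name to a (low, high) range.
Ranges = Dict[str, Tuple[float, float]]


@dataclass
class SimToRealPipeline:
    # Each stage is injected as a callable so the LLM and simulator can be swapped out.
    synthesize_reward: Callable[[str], str]         # task description -> reward code (LLM)
    train_policy: Callable[[str, Ranges], object]   # (reward, DR ranges) -> trained policy
    physics_prior: Callable[[object], Ranges]       # policy -> parameter ranges it tolerates
    propose_dr: Callable[[Ranges], Ranges]          # physics prior -> DR ranges (LLM)

    def run(self, task: str) -> Tuple[str, Ranges, object]:
        reward = self.synthesize_reward(task)                # stage 1: reward synthesis
        initial_policy = self.train_policy(reward, {})       # train once with no randomization
        prior = self.physics_prior(initial_policy)           # stage 2: reward-aware physics prior
        dr_ranges = self.propose_dr(prior)                   # stage 3: LLM-driven DR generation
        final_policy = self.train_policy(reward, dr_ranges)  # retrain for real-world deployment
        return reward, dr_ranges, final_policy
```

Injecting the stages as plain callables keeps the driver testable with toy stand-ins before any simulator or LLM is wired in.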

Abstract

Transferring policies learned in simulation to the real world is a promising strategy for acquiring robot skills at scale. However, sim-to-real approaches typically rely on manual design and tuning of the task reward function as well as the simulation physics parameters, rendering the process slow and labor-intensive. In this paper, we investigate using Large Language Models (LLMs) to automate and accelerate sim-to-real design. Our LLM-guided sim-to-real approach, DrEureka, requires only the physics simulation for the target task and automatically constructs suitable reward functions and domain randomization distributions to support real-world transfer. We first demonstrate that our approach can discover sim-to-real configurations that are competitive with existing human-designed ones on quadruped locomotion and dexterous manipulation tasks. Then, we showcase that our approach is capable of solving novel robot tasks, such as quadruped balancing and walking atop a yoga ball, without iterative manual design.
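A domain randomization distribution can be as simple as a (low, high) range per physics parameter, resampled for every training episode. The following minimal sketch assumes uniform sampling; `sample_physics` is a hypothetical helper, and the parameter names and ranges are placeholders, not values proposed by DrEureka.

```python
import random
from typing import Dict, Tuple


def sample_physics(ranges: Dict[str, Tuple[float, float]], rng: random.Random) -> Dict[str, float]:
    """Draw one set of physics parameters, uniformly within each (low, high) range."""
    params = {}
    for name, (low, high) in ranges.items():
        if low > high:
            raise ValueError(f"invalid range for {name}: ({low}, {high})")
        params[name] = rng.uniform(low, high)
    return params


# Hypothetical LLM-proposed ranges; names and values are illustrative only.
dr_ranges = {
    "friction": (0.3, 1.2),
    "added_mass_kg": (-1.0, 3.0),
    "motor_strength_scale": (0.9, 1.1),
}

rng = random.Random(0)  # seeded so runs are reproducible
for episode in range(3):
    physics = sample_physics(dr_ranges, rng)
    # A real training loop would push `physics` into the simulator before each reset.
    print(episode, {k: round(v, 3) for k, v in physics.items()})
```

Wider ranges generally make the policy more robust but harder to train, which is why grounding the ranges in what the policy can tolerate (the reward-aware physics prior) matters.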

Paper Structure (15 sections, 7 equations, 13 figures, 16 tables, 2 algorithms)

Figures (13)

  • Figure 1: DrEureka takes the task and safety instruction, along with environment source code, and runs Eureka to generate a regularized reward function and policy. Then, it tests the policy under different simulation conditions to build a reward-aware physics prior, which is provided to the LLM to generate a set of domain randomization (DR) parameters. Finally, using the synthesized reward and DR parameters, it trains policies for real-world deployment.
  • Figure 2: Our quadruped locomotion, dexterous cube rotation, and walking globe tasks. Walking globe is a novel task to show DrEureka's capability for guiding the sim-to-real transfer of a challenging new task without pre-existing sim-to-real configurations.
  • Figure 3: DrEureka prompt for generating domain randomization parameters. The blue paragraph describes the instruction, and the green paragraph is the reward-aware parameter prior computed by the paper's physics-prior algorithm.
  • Figure 4: The default real-world environment as well as additional environments to test DrEureka's robustness for quadrupedal locomotion.
  • Figure 5: Real-world robustness evaluation. DrEureka performs consistently across different terrains and maintains advantages over Human-Designed.
  • ...and 8 more figures
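Figure 1 describes building a reward-aware physics prior by testing the initial policy under different simulation conditions. One plausible reading, sketched here under stated assumptions, sweeps a single physics parameter and keeps the widest contiguous interval around its default value where the policy's success score stays at or above a threshold. `feasible_range`, the `evaluate` callback, and the contiguity rule are all hypothetical; the paper's own algorithm may define feasibility differently.

```python
from typing import Callable, Sequence, Tuple


def feasible_range(
    evaluate: Callable[[float], float],
    candidates: Sequence[float],
    default: float,
    threshold: float,
) -> Tuple[float, float]:
    """Widest contiguous interval of candidate values around `default` scoring >= threshold.

    `evaluate(value)` is assumed to run the initial policy in simulation with one physics
    parameter set to `value` and return a success score (e.g. fraction of successful episodes).
    """
    values = sorted(set(candidates) | {default})  # make sure the default is on the grid
    ok = [evaluate(v) >= threshold for v in values]
    i = values.index(default)
    if not ok[i]:
        raise ValueError("policy fails even at the default parameter value")
    lo = hi = i
    # Grow the interval outward from the default until a failing value is hit.
    while lo > 0 and ok[lo - 1]:
        lo -= 1
    while hi < len(values) - 1 and ok[hi + 1]:
        hi += 1
    return values[lo], values[hi]
```

For example, with a toy `evaluate` that returns 1.0 only for values between 0.25 and 2.0, a candidate grid of 0.0, 0.25, ..., 3.0, a default of 1.0, and a threshold of 0.5, the function returns (0.25, 2.0). Bounds like these are the kind of per-parameter prior a DR prompt such as the one in Figure 3 could receive.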