DrEureka: Language Model Guided Sim-To-Real Transfer

Yecheng Jason Ma; William Liang; Hung-Ju Wang; Sam Wang; Yuke Zhu; Linxi Fan; Osbert Bastani; Dinesh Jayaraman

TL;DR

This work presents DrEureka, an LLM-guided pipeline that automates reward design and domain randomization (DR) for sim-to-real transfer in robotics. It decomposes the problem into three steps: reward synthesis, construction of a reward-aware physics prior, and LLM-driven DR generation. With this pipeline, DrEureka achieves competitive real-world transfer on quadruped locomotion and dexterous manipulation without manual tuning. It also handles a novel task, a quadruped walking on a yoga ball, and compares favorably with human-designed configurations and prior DR baselines. The results suggest that coupling foundation models with physics simulators can substantially accelerate real-world robot learning while reducing human labor.

Abstract

Transferring policies learned in simulation to the real world is a promising strategy for acquiring robot skills at scale. However, sim-to-real approaches typically rely on manual design and tuning of the task reward function as well as the simulation physics parameters, rendering the process slow and human-labor intensive. In this paper, we investigate using Large Language Models (LLMs) to automate and accelerate sim-to-real design. Our LLM-guided sim-to-real approach, DrEureka, requires only the physics simulation for the target task and automatically constructs suitable reward functions and domain randomization distributions to support real-world transfer. We first demonstrate that our approach can discover sim-to-real configurations that are competitive with existing human-designed ones on quadruped locomotion and dexterous manipulation tasks. Then, we showcase that our approach is capable of solving novel robot tasks, such as quadruped balancing and walking atop a yoga ball, without iterative manual design.

Paper Structure (15 sections, 7 equations, 13 figures, 16 tables, 2 algorithms)

Introduction
Related Work
Problem Setting
Method
Background: Eureka Reward Design
Safety Instruction
Reward-Aware Physics Prior
LLM for Domain Randomization
Experimental Setup
Results and Analysis
Comparison to Pre-Existing Sim-to-Real Configurations
Does DrEureka generate effective DR configurations?
The Walking Globe Trick
Conclusion
Limitations

Figures (13)

Figure 1: DrEureka takes the task and safety instruction, along with environment source code, and runs Eureka to generate a regularized reward function and policy. Then, it tests the policy under different simulation conditions to build a reward-aware physics prior, which is provided to the LLM to generate a set of domain randomization (DR) parameters. Finally, using the synthesized reward and DR parameters, it trains policies for real-world deployment.
Figure 2: Our quadruped locomotion, dexterous cube rotation, and walking globe tasks. Walking globe is a novel task to show DrEureka's capability for guiding the sim-to-real transfer of a challenging new task without pre-existing sim-to-real configurations.
Figure 3: DrEureka prompt for generating domain randomization parameters. The blue paragraph describes the instruction, and the green paragraph is the reward-aware parameter prior computed by the physics-prior algorithm.
Figure 4: The default real-world environment as well as additional environments to test DrEureka's robustness for quadrupedal locomotion.
Figure 5: Real-world robustness evaluation. DrEureka performs consistently across different terrains and maintains advantages over Human-Designed.
...and 8 more figures
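
The Figure 1 caption describes testing a trained policy under different simulation conditions to build a reward-aware physics prior, which is then passed to the LLM. The Python sketch below illustrates one way that step could look; it is not the authors' algorithm. The names `reward_aware_prior`, `format_prior_for_prompt`, `toy_evaluate`, the candidate values, and the success threshold are all hypothetical. `toy_evaluate` is a stand-in for rolling out the policy in a physics simulator.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical candidate values to sweep for each physics parameter.
CANDIDATES: Dict[str, List[float]] = {
    "friction": [0.01, 0.1, 1.0, 10.0],
    "gravity": [-20.0, -9.8, -5.0],
}


def reward_aware_prior(
    evaluate: Callable[[str, float], float],
    candidates: Dict[str, List[float]],
    success_threshold: float,
) -> Dict[str, Tuple[float, float]]:
    """Return, per parameter, the (min, max) of candidate values at which
    the policy's score still meets the threshold.

    Simplification: gaps inside the range are ignored, and a parameter
    with no feasible value is omitted from the result.
    """
    prior: Dict[str, Tuple[float, float]] = {}
    for name, values in candidates.items():
        feasible = [v for v in values if evaluate(name, v) >= success_threshold]
        if feasible:
            prior[name] = (min(feasible), max(feasible))
    return prior


def format_prior_for_prompt(prior: Dict[str, Tuple[float, float]]) -> str:
    """Render the prior as plain text that could be placed in an LLM prompt."""
    return "\n".join(
        f"{name}: [{lo:g}, {hi:g}]" for name, (lo, hi) in sorted(prior.items())
    )


def toy_evaluate(name: str, value: float) -> float:
    """Toy stand-in for a simulator rollout (an assumption, not real physics):
    the score drops linearly as the parameter moves away from a nominal value."""
    nominal = {"friction": 1.0, "gravity": -9.8}[name]
    return 1.0 - abs(value - nominal) / abs(nominal)


if __name__ == "__main__":
    prior = reward_aware_prior(toy_evaluate, CANDIDATES, success_threshold=0.05)
    print(format_prior_for_prompt(prior))
```

In this sketch the LLM would receive the printed ranges as context and choose DR distributions within them; how the real system searches parameter values and scores success is defined in the paper, not here.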
