Eurekaverse: Environment Curriculum Generation via Large Language Models

William Liang, Sam Wang, Hung-Ju Wang, Osbert Bastani, Dinesh Jayaraman, Yecheng Jason Ma

TL;DR

This paper introduces Eurekaverse, an unsupervised environment design algorithm that uses LLMs to sample progressively more challenging, diverse, and learnable environments for skill training, and validates its effectiveness in the domain of quadrupedal parkour learning.

Abstract

Recent work has demonstrated that a promising strategy for teaching robots a wide range of complex skills is to train them on a curriculum of progressively more challenging environments. However, developing an effective curriculum of environment distributions currently requires significant expertise, which must be repeated for every new domain. Our key insight is that environments are often naturally represented as code. Thus, we probe whether effective environment curriculum design can be achieved and automated via code generation by large language models (LLMs). In this paper, we introduce Eurekaverse, an unsupervised environment design algorithm that uses LLMs to sample progressively more challenging, diverse, and learnable environments for skill training. We validate Eurekaverse's effectiveness in the domain of quadrupedal parkour learning, in which a quadruped robot must traverse a variety of obstacle courses. The automatic curriculum designed by Eurekaverse enables gradual learning of complex parkour skills in simulation and successfully transfers to the real world, outperforming manual training courses designed by humans.
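The key insight above, that environments can be represented as code, can be illustrated with a minimal sketch of the kind of function an LLM might be asked to write. Everything in it is an assumption for illustration: the function name `set_terrain`, the grid heightfield representation, the box-and-gap layout, and the `difficulty` parameter are hypothetical and are not taken from the paper's actual prompt or interface. The returned goal list loosely mirrors the goal markers described for Figure 3.

```python
import random


def set_terrain(length: int, width: int, difficulty: float, seed: int = 0):
    """Hypothetical example of an LLM-written environment function.

    Builds a grid heightfield (heights in meters) containing a row of box
    obstacles separated by flat gaps, and returns one goal per box.
    `difficulty` in [0, 1] scales both box height and gap length.
    """
    difficulty = min(max(difficulty, 0.0), 1.0)
    rng = random.Random(seed)
    height_field = [[0.0] * width for _ in range(length)]
    goals = []
    center = width // 2
    gap = 1 + int(3 * difficulty)  # flat cells between consecutive boxes
    x = 2  # leave a flat start zone for the robot
    while x + 2 <= length:
        # Box height grows with difficulty; a small random jitter adds diversity.
        h = (0.1 + 0.4 * difficulty) * rng.uniform(0.9, 1.0)
        for row in range(x, x + 2):  # each box is 2 cells long
            height_field[row] = [h] * width
        goals.append((x + 1, center))  # goal sits on top of the box
        x += 2 + gap
    return height_field, goals
```

In this framing, producing a new or harder environment amounts to editing or re-parameterizing a short program, rather than searching a hand-designed parameter space.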

Paper Structure

This paper contains 22 sections, 2 equations, 10 figures, and 1 algorithm.

Figures (10)

  • Figure 1: Eurekaverse teaches a quadrupedal robot to navigate diverse obstacle courses, including jumps, climbs, and ramps.
  • Figure 2: Eurekaverse automatically learns complex skills by performing agent-environment co-evolution, which iterates between evolutionary environment generation and population-based policy training and evaluation.
  • Figure 3: Our prompt and in-context example (blue), an example LLM response (purple), and its visualization. In the rendering, large red dots indicate goals, and the blue dot is the current goal; small dots indicate the heading command (direction to the goal).
  • Figure 4: Comparing sim benchmark performance across training steps for Eurekaverse and baselines (left), and final benchmark performance for Eurekaverse and ablations (right). The training curves for the ablations are in the Appendix. Experiments are run over 3 seeds.
  • Figure 5: Comparison of Eurekaverse's iterations against Human-Designed, visualized per obstacle type and over each difficulty (easiest to hardest) on sim benchmark. Higher area under the curve is better.
  • ...and 5 more figures
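The agent-environment co-evolution loop described in the Figure 2 caption, in which evolutionary environment generation alternates with population-based policy training and evaluation, can be sketched as a toy model. This is an illustration under stated assumptions, not the paper's implementation: `llm_propose` is a hypothetical stand-in for prompting an LLM (it only perturbs a scalar difficulty), `train` replaces RL with a one-line skill update that helps only on partially solvable terrain, and feeding back the best policy's environment as the next parent is an assumed selection rule.

```python
import random
from dataclasses import dataclass


@dataclass
class Env:
    difficulty: float  # toy stand-in for an LLM-written terrain program


def llm_propose(parent: Env, n: int, rng: random.Random) -> list[Env]:
    """Hypothetical LLM stand-in: mutate the parent into n variants,
    biased toward harder terrain and clamped to [0, 1]."""
    return [Env(min(1.0, max(0.0, parent.difficulty + rng.uniform(-0.05, 0.2))))
            for _ in range(n)]


def success_rate(skill: float, difficulty: float) -> float:
    """Toy model: success drops linearly once terrain exceeds the policy's skill."""
    return max(0.0, min(1.0, 1.0 - 4.0 * (difficulty - skill)))


def train(skill: float, env: Env) -> float:
    """Toy RL stand-in: skill improves only on learnable (partially solved) terrain."""
    rate = success_rate(skill, env.difficulty)
    if 0.0 < rate < 1.0:
        return skill + 0.5 * (env.difficulty - skill)
    return skill


def benchmark(skill: float, levels=(0.2, 0.4, 0.6, 0.8, 1.0)) -> float:
    """Mean success over a fixed set of benchmark difficulties."""
    return sum(success_rate(skill, d) for d in levels) / len(levels)


def co_evolve(iterations: int = 10, pop_size: int = 4, seed: int = 0):
    rng = random.Random(seed)
    parent_env, skill = Env(0.0), 0.0
    history = []
    for _ in range(iterations):
        # 1. Evolutionary environment generation: one candidate per population member.
        candidates = llm_propose(parent_env, pop_size, rng)
        # 2. Population-based training: each member continues from the current best policy.
        trained = [(train(skill, env), env) for env in candidates]
        # 3. Evaluation: keep the member with the best benchmark score and
        #    reuse its environment as the parent for the next generation.
        skill, parent_env = max(trained, key=lambda pair: benchmark(pair[0]))
        history.append(benchmark(skill))
    return skill, history
```

Calling `co_evolve()` returns the final toy skill and the benchmark score after each iteration. Because training never lowers skill in this toy model, the score is non-decreasing, and environments that are too easy or too hard contribute nothing, a miniature version of the "progressively more challenging" and "learnable" criteria.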