CurricuLLM: Automatic Task Curricula Design for Learning Complex Robot Skills using Large Language Models

Kanghyun Ryu; Qiayuan Liao; Zhongyu Li; Payam Delgosha; Koushil Sreenath; Negar Mehr

TL;DR

CurricuLLM addresses automatic curriculum design for learning complex robot skills. It uses large language models to generate a sequence of subtasks described in natural language, translate them into executable task code that specifies rewards and goal distributions, and evaluate trained policies through trajectory analysis to select the policy that performs best on each subtask. The method comprises three modules (Curriculum Design, Task Code Generation, and Policy Evaluation). It is validated on manipulation, navigation, locomotion, and a high-dimensional humanoid task, including real-world hardware transfer with the Berkeley Humanoid. Key contributions include (1) a task-level curriculum designer that uses LLMs for planning and coding, (2) a demonstration of its efficacy across diverse robotic domains, and (3) validation that policies learned via CurricuLLM transfer to real hardware. The results show CurricuLLM achieving competitive or superior performance relative to baselines such as SAC, HER, and LLM-zeroshot, with especially notable gains on complex tasks like AntMaze, along with successful real-world deployment. Together these results highlight the practical impact of automated, language-guided curriculum design in robotics.

Abstract

Curriculum learning is a training mechanism in reinforcement learning (RL) that facilitates learning complex policies by progressively increasing task difficulty during training. However, designing effective curricula for a specific task often requires extensive domain knowledge and human intervention, which limits its applicability across domains. Our core idea is that large language models (LLMs), with their extensive training on diverse language data and their ability to encapsulate world knowledge, have significant potential for efficiently breaking down tasks and decomposing skills across various robotics environments. Additionally, the demonstrated success of LLMs in translating natural language into executable code for RL agents strengthens their role in generating task curricula. In this work, we propose CurricuLLM, which leverages the high-level planning and programming capabilities of LLMs for curriculum design, thereby enabling efficient learning of complex target tasks. CurricuLLM consists of three steps: (Step 1) generating a sequence of subtasks, described in natural language, that aid learning of the target task; (Step 2) translating the natural language description of each subtask into executable task code, including the reward code and the goal distribution code; and (Step 3) evaluating the trained policies based on trajectory rollouts and the subtask description. We evaluate CurricuLLM in various robotics simulation environments spanning manipulation, navigation, and locomotion, and show that CurricuLLM can aid the learning of complex robot control tasks. In addition, we validate a humanoid locomotion policy learned through CurricuLLM in the real world. The project website is https://iconlab.negarmehr.com/CurricuLLM/
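The three steps can be read as a single loop over subtasks. The following is a minimal, framework-agnostic sketch under stated assumptions. Every LLM call and the RL fine-tuning routine are passed in as plain callables. The names `curricullm`, `TaskCode`, `design_curriculum`, `generate_task_codes`, `train`, `rollout_summary`, and `select_best` are hypothetical, not the authors' code. The parameter `k` plays the role of the $K$ task code candidates sampled per subtask.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class TaskCode:
    """One LLM-sampled task code candidate (hypothetical container)."""
    reward_src: str     # source text of the reward function
    goal_dist_src: str  # source text of the goal-distribution sampler


def curricullm(
    target_task: str,
    env_description: str,
    design_curriculum: Callable[[str, str], List[str]],
    generate_task_codes: Callable[[str, List[Tuple[str, TaskCode]], int], List[TaskCode]],
    train: Callable[[object, TaskCode], object],
    rollout_summary: Callable[[object], str],
    select_best: Callable[[str, List[str]], int],
    init_policy: object,
    k: int = 3,
) -> object:
    # Step 1: an LLM turns the environment and target task into ordered subtasks.
    subtasks = design_curriculum(env_description, target_task)
    policy = init_policy
    history: List[Tuple[str, TaskCode]] = []  # (subtask, chosen task code) so far
    for subtask in subtasks:
        # Step 2: sample k candidate task codes, conditioned on past subtasks and rewards.
        candidates = generate_task_codes(subtask, history, k)
        # Fine-tune the policy carried over from the previous subtask on each candidate.
        trained = [train(policy, code) for code in candidates]
        # Step 3: an LLM reads rollout summaries and picks the best-aligned policy.
        best = select_best(subtask, [rollout_summary(p) for p in trained])
        policy = trained[best]
        history.append((subtask, candidates[best]))
    return policy


if __name__ == "__main__":
    # Toy stand-ins: a "policy" is just the list of reward codes it was trained on.
    final = curricullm(
        target_task="track velocity commands",
        env_description="humanoid robot",
        design_curriculum=lambda env, target: ["stand still", "walk forward", target],
        generate_task_codes=lambda sub, hist, k: [
            TaskCode(f"{sub}-reward-{i}", f"{sub}-goals") for i in range(k)
        ],
        train=lambda pol, code: pol + [code.reward_src],
        rollout_summary=lambda pol: pol[-1],
        select_best=lambda sub, summaries: len(summaries) - 1,
        init_policy=[],
    )
    print(final)
    # ['stand still-reward-2', 'walk forward-reward-2', 'track velocity commands-reward-2']
```

Passing the callables in keeps the control flow separate from any particular LLM client or RL library. In practice `train` would fine-tune a copy of the previous subtask's policy, so a rejected candidate never overwrites the policy that is carried forward.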

Paper Structure (18 sections, 8 figures, 1 table, 1 algorithm)

Introduction
Related Works
Curriculum Learning
Large Language Model for Robotics
Problem Formulation
Method
Generating Sequence of Language Description
Task Code Generation
Large Language Model for Policy Evaluation
Experiments
Gymnasium Environments
Berkeley Humanoid
Simulation Results
Hardware Validation
Conclusions
...and 3 more sections

Figures (8)

Figure 1: CurricuLLM takes natural language descriptions of the environment, the robot, and the target task that we wish the robot to learn, and then generates a sequence of subtasks. In each subtask, it samples different task codes and evaluates the resulting trained policies to find the policy that is best aligned with the current subtask. These iterations are repeated throughout the curriculum subtasks to sequentially train a policy that achieves the complex target task.
Figure 2: The curriculum generation LLM receives the natural language curriculum prompt as well as the environment description, and generates a sequence of subtasks. Our prompt includes instructions for the curriculum designer, rules for how to describe the subtasks, and other tips on describing the curriculum. The environment description consists of the robot and its state variable description, the target task, and the initial state description.
Figure 3: Our task code generation and evaluation framework in each subtask. The task code generation LLM takes the environment and target task description, current and past task information, and the reward function used for the previous task. Then, $K$ task code candidates for the current subtask are sampled and used to fine-tune the policy from the previous subtask. Finally, the evaluation LLM receives statistics of trajectory rollouts from the trained policies and finds the policy that best aligns with the current subtask description.
Figure 4: Snapshot of Environments: From left to right, Fetch-Slide, Fetch-Push, AntMaze-UMaze, and Berkeley Humanoid.
Figure 5: Success rate of tasks in Gymnasium-Robotics environments.
...and 3 more figures
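Figure 3 states that the evaluation LLM receives statistics of trajectory rollouts, but the captions do not say which statistics are reported or how they are formatted. The following is a minimal sketch that assumes each rollout is recorded as a list of per-time-step dictionaries of named state variables, and that the mean and standard deviation of each variable are what gets reported. The helpers `summarize_rollouts` and `build_eval_prompt` and the variable names in the usage example are hypothetical.

```python
import statistics
from typing import Dict, List

# Hypothetical rollout format: one dict of named state variables per time step.
Trajectory = List[Dict[str, float]]


def summarize_rollouts(rollouts: List[Trajectory]) -> str:
    """Pool every time step of every rollout and report mean/std per state variable."""
    values: Dict[str, List[float]] = {}
    for traj in rollouts:
        for state in traj:
            for name, v in state.items():
                values.setdefault(name, []).append(v)
    lines = []
    for name in sorted(values):
        xs = values[name]
        lines.append(
            f"{name}: mean={statistics.fmean(xs):.3f}, std={statistics.pstdev(xs):.3f}"
        )
    return "\n".join(lines)


def build_eval_prompt(subtask: str, summaries: List[str]) -> str:
    """Assemble one text query asking an evaluation LLM to choose among K policies."""
    parts = [
        f"Current subtask: {subtask}",
        "Which policy best achieves this subtask? Answer with its index.",
    ]
    for i, summary in enumerate(summaries):
        parts.append(f"Policy {i}:\n{summary}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    rollout = [{"base_height": 0.9, "forward_velocity": 0.4},
               {"base_height": 1.0, "forward_velocity": 0.6}]
    print(summarize_rollouts([rollout]))
    # base_height: mean=0.950, std=0.050
    # forward_velocity: mean=0.500, std=0.100
```

Pooling all time steps keeps the summary short but discards temporal structure. Reporting values at the final time step as well would be a straightforward extension if the subtask concerns reaching a goal.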

Theorems & Definitions (1)

Definition 1: Task-level Sequence Curriculum (Narvekar et al., 2020).
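Only the title of Definition 1 appears here. The following sketch assumes it follows the sequence-curriculum notion of Narvekar et al. (2020), in which a curriculum is a directed acyclic graph over tasks. A sequence curriculum is then one where every vertex has in-degree and out-degree at most one, with a single source and a single sink. The sketch checks that property and returns the implied training order. The function name `sequence_order` and the task labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence, Tuple


def sequence_order(
    tasks: Sequence[Hashable], edges: Sequence[Tuple[Hashable, Hashable]]
) -> Optional[List[Hashable]]:
    """Return the training order if (tasks, edges) forms a sequence curriculum, else None."""
    out_deg = Counter(u for u, _ in edges)
    in_deg = Counter(v for _, v in edges)
    # Each task may feed into at most one task and receive from at most one task.
    if any(out_deg[t] > 1 or in_deg[t] > 1 for t in tasks):
        return None
    # Exactly one task has no predecessor: the first task in the curriculum.
    sources = [t for t in tasks if in_deg[t] == 0]
    if len(sources) != 1:
        return None
    nxt = dict(edges)
    order = [sources[0]]
    while order[-1] in nxt:
        order.append(nxt[order[-1]])
        if len(order) > len(tasks):  # guard against looping through a cycle
            return None
    # The chain must visit every task; a detached cycle would leave some out.
    return order if len(order) == len(tasks) else None


if __name__ == "__main__":
    print(sequence_order(["stand", "walk", "run"], [("stand", "walk"), ("walk", "run")]))
    # ['stand', 'walk', 'run']
    print(sequence_order(["stand", "walk", "run"], [("stand", "walk"), ("stand", "run")]))
    # None
```

Under this reading, a policy trained on each task initializes training on the next task in the returned order. That matches the sequential fine-tuning across subtasks described in Figure 3.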
