AURA: Autonomous Upskilling with Retrieval-Augmented Agents

Alvin Zhu, Yusuke Tanaka, Andrew Goldberg, Dennis Hong

TL;DR

AURA tackles the brittleness and manual burden of curriculum reinforcement learning for agile robots by introducing a schema-validated YAML framework that translates natural language prompts into executable multi-stage curricula. It augments this with retrieval-augmented planning over a vector database of past curricula and a team of specialized LLM agents to produce reusable, executable workflows, all validated before GPU time is spent. Empirical results show AURA outperforms LLM-guided baselines in generation success and in locomotion/manipulation tasks, and demonstrate robust zero-shot deployment on a custom humanoid in outdoor settings. The work highlights the practical impact of combining schema validation, RAG, and modular LLM collaboration to automate curriculum design and scalable policy learning for real-world robotics.

Abstract

Designing reinforcement learning curricula for agile robots traditionally requires extensive manual tuning of reward functions, environment randomizations, and training configurations. We introduce AURA (Autonomous Upskilling with Retrieval-Augmented Agents), a schema-validated curriculum reinforcement learning (RL) framework that leverages Large Language Models (LLMs) as autonomous designers of multi-stage curricula. AURA transforms user prompts into YAML workflows that encode full reward functions, domain randomization strategies, and training configurations. All files are statically validated before any GPU time is used, ensuring efficient and reliable execution. A retrieval-augmented feedback loop allows specialized LLM agents to design, execute, and refine curriculum stages based on prior training results stored in a vector database, enabling continual improvement over time. Quantitative experiments show that AURA consistently outperforms LLM-guided baselines in generation success rate, humanoid locomotion, and manipulation tasks. Ablation studies highlight the importance of schema validation and retrieval for curriculum quality. AURA successfully trains end-to-end policies directly from user prompts and deploys them zero-shot on a custom humanoid robot in multiple environments, capabilities that did not exist previously with manually designed controllers. By abstracting the complexity of curriculum design, AURA enables scalable and adaptive policy learning pipelines that would be complex to construct by hand. Project page: https://aura-research.org/
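To make the idea of static validation before any GPU time is spent more concrete, the following is a minimal sketch in Python. It is not AURA's actual schema or code: the field names (`stages`, `reward_terms`, `domain_randomization`, `train_steps`) and the function `validate_curriculum` are hypothetical, and a plain dict stands in for a parsed YAML workflow.

```python
# Hypothetical stage schema: these field names are illustrative assumptions,
# not AURA's actual YAML schema.
STAGE_SCHEMA = {
    "name": str,
    "reward_terms": dict,          # reward term name -> numeric weight
    "domain_randomization": dict,  # parameter name -> [low, high]
    "train_steps": int,
}


def is_number(value):
    """True for ints and floats, but not for booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_curriculum(curriculum):
    """Return a list of error strings; an empty list means the file is valid."""
    stages = curriculum.get("stages") if isinstance(curriculum, dict) else None
    if not isinstance(stages, list) or not stages:
        return ["'stages' must be a non-empty list"]

    errors = []
    for i, stage in enumerate(stages):
        if not isinstance(stage, dict):
            errors.append(f"stage {i}: must be a mapping")
            continue
        # Structural check: every required field is present with the right type.
        for field, expected in STAGE_SCHEMA.items():
            if field not in stage:
                errors.append(f"stage {i}: missing '{field}'")
            elif not isinstance(stage[field], expected):
                errors.append(f"stage {i}: '{field}' must be {expected.__name__}")
        # Value check: reward weights must be numeric.
        rewards = stage.get("reward_terms")
        if isinstance(rewards, dict):
            for term, weight in rewards.items():
                if not is_number(weight):
                    errors.append(f"stage {i}: reward '{term}' needs a numeric weight")
        # Value check: randomization ranges must be ordered [low, high] pairs.
        ranges = stage.get("domain_randomization")
        if isinstance(ranges, dict):
            for param, bounds in ranges.items():
                ok = (
                    isinstance(bounds, (list, tuple))
                    and len(bounds) == 2
                    and all(is_number(b) for b in bounds)
                    and bounds[0] <= bounds[1]
                )
                if not ok:
                    errors.append(f"stage {i}: range for '{param}' must be [low, high]")
    return errors


# The dict below stands in for a parsed YAML workflow with two stages.
curriculum = {
    "stages": [
        {
            "name": "stand",
            "reward_terms": {"upright": 1.0},
            "domain_randomization": {"friction": [0.5, 1.2]},
            "train_steps": 1000,
        },
        {
            "name": "walk",
            "reward_terms": {"velocity_tracking": "high"},   # not numeric
            "domain_randomization": {"friction": [1.2, 0.5]},  # low > high
            # "train_steps" is missing
        },
    ]
}

problems = validate_curriculum(curriculum)
for problem in problems:
    print(problem)
```

In this sketch, training would be launched only when `validate_curriculum` returns an empty list, so a malformed stage is rejected on the CPU rather than surfacing as a failure partway through a training run.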

Paper Structure

This paper contains 56 sections, 7 equations, 3 figures, and 2 tables.

Figures (3)

  • Figure 1: AURA-trained policies deployed successfully on custom humanoid hardware and in simulation for locomotion and manipulation tasks.
  • Figure 2: An overview of the AURA curriculum generation and policy training framework.
  • Figure 3: Survival and linear velocity tracking scores across iterations, used to evaluate locomotion policy quality on a custom humanoid. The plots show the policy quality improvements of AURA over five iterations compared to MuJoCo Playground's expert-designed Berkeley Humanoid reward and CurricuLLM's reported results in Isaac Lab. AURA Blind generates rewards from scratch (the vector database, VDB, is initialized as empty), and AURA Tune modifies and improves an existing reward designed for another embodiment (the VDB is initialized with MuJoCo Playground's Berkeley Humanoid expert human rewards, domain randomizations, and training configuration).
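
The difference between the two VDB initializations in Figure 3 can be illustrated with a toy retrieval sketch. This is an assumption-laden illustration, not AURA's implementation: the class `CurriculumStore`, the two-dimensional embeddings, and the record names are all hypothetical, and cosine similarity is used here only as a common choice for vector retrieval.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class CurriculumStore:
    """Toy stand-in for a vector database of past curricula (hypothetical)."""

    def __init__(self, entries=None):
        # Each entry is an (embedding, record) pair.
        self.entries = list(entries or [])

    def add(self, embedding, record):
        self.entries.append((embedding, record))

    def retrieve(self, query, k=1):
        """Return the k records whose embeddings are most similar to the query."""
        ranked = sorted(
            self.entries, key=lambda entry: cosine(query, entry[0]), reverse=True
        )
        return [record for _, record in ranked[:k]]


# "Blind"-style setting: the store starts empty, so nothing can be retrieved
# and the first curriculum has to be written from scratch.
blind_store = CurriculumStore()

# "Tune"-style setting: the store is seeded with an existing expert
# configuration (toy embeddings and names, chosen for illustration only).
tune_store = CurriculumStore(
    [
        ([1.0, 0.0], "berkeley_humanoid_walk"),
        ([0.0, 1.0], "arm_reach"),
    ]
)

query = [0.9, 0.1]  # toy embedding of a new locomotion prompt
blind_hits = blind_store.retrieve(query)
tune_hits = tune_store.retrieve(query)
print(blind_hits, tune_hits)

# After a training iteration, its result can be stored so that later
# iterations in either setting have something to retrieve.
blind_store.add(query, "iteration_1_curriculum")
```

Under these assumptions, both settings converge on the same mechanism after the first iteration: each completed run adds a record, and later iterations retrieve the most similar prior curricula as context.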