AURA: Autonomous Upskilling with Retrieval-Augmented Agents
Alvin Zhu, Yusuke Tanaka, Andrew Goldberg, Dennis Hong
TL;DR
AURA addresses the brittleness and manual effort of curriculum reinforcement learning for agile robots with a schema-validated YAML framework that translates natural-language prompts into executable multi-stage curricula. It adds retrieval-augmented planning over a vector database of past curricula and a team of specialized LLM agents, producing reusable workflows that are validated before any GPU time is spent. Experiments show that AURA outperforms LLM-guided baselines in generation success rate and on humanoid locomotion and manipulation tasks, and that its policies deploy robustly zero-shot on a custom humanoid in outdoor settings. The work shows the practical value of combining schema validation, retrieval-augmented generation (RAG), and modular LLM collaboration to automate curriculum design and scale policy learning for real-world robotics.
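To make "validated before any GPU time is spent" concrete, the following is a minimal sketch of static checking on a curriculum that has already been parsed from YAML into Python dicts. The stage fields (`name`, `reward_terms`, `domain_randomization`, `num_iterations`) and the helpers `validate_stage` and `validate_curriculum` are hypothetical illustrations, not AURA's actual schema or code.

```python
from typing import Any

# Hypothetical stage schema: required keys and their expected types.
# These field names are illustrative assumptions, not AURA's real schema.
STAGE_SCHEMA: dict[str, type] = {
    "name": str,
    "reward_terms": dict,          # reward term name -> numeric weight
    "domain_randomization": dict,  # parameter name -> [low, high]
    "num_iterations": int,
}


def validate_stage(stage: dict[str, Any]) -> list[str]:
    """Return error messages for one stage; an empty list means it passed."""
    errors = []
    for key, expected in STAGE_SCHEMA.items():
        if key not in stage:
            errors.append(f"missing key: {key}")
        elif not isinstance(stage[key], expected):
            errors.append(f"{key}: expected {expected.__name__}")

    reward_terms = stage.get("reward_terms")
    if isinstance(reward_terms, dict):
        for term, weight in reward_terms.items():
            if not isinstance(weight, (int, float)):
                errors.append(f"reward weight for {term} is not numeric")

    randomization = stage.get("domain_randomization")
    if isinstance(randomization, dict):
        for param, bounds in randomization.items():
            ok = (
                isinstance(bounds, list)
                and len(bounds) == 2
                and all(isinstance(b, (int, float)) for b in bounds)
                and bounds[0] <= bounds[1]
            )
            if not ok:
                errors.append(f"randomization range for {param} must be [low, high]")

    iters = stage.get("num_iterations")
    if isinstance(iters, int) and iters <= 0:
        errors.append("num_iterations must be positive")
    return errors


def validate_curriculum(stages: list[dict[str, Any]]) -> list[str]:
    """Check every stage up front, before any training job would be launched."""
    errors = []
    for i, stage in enumerate(stages):
        errors.extend(f"stage {i}: {e}" for e in validate_stage(stage))
    return errors
```

Because every check runs on plain data, a malformed curriculum is rejected in milliseconds on a CPU rather than failing partway through a training run.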
Abstract
Designing reinforcement learning curricula for agile robots traditionally requires extensive manual tuning of reward functions, environment randomizations, and training configurations. We introduce AURA (Autonomous Upskilling with Retrieval-Augmented Agents), a schema-validated curriculum reinforcement learning (RL) framework that uses Large Language Models (LLMs) as autonomous designers of multi-stage curricula. AURA transforms user prompts into YAML workflows that encode complete reward functions, domain randomization strategies, and training configurations. All files are statically validated before any GPU time is used, so malformed configurations are rejected before they can waste training compute. A retrieval-augmented feedback loop allows specialized LLM agents to design, execute, and refine curriculum stages based on prior training results stored in a vector database, enabling continual improvement over time. Quantitative experiments show that AURA consistently outperforms LLM-guided baselines in generation success rate and on humanoid locomotion and manipulation tasks. Ablation studies highlight the importance of schema validation and retrieval for curriculum quality. AURA trains end-to-end policies directly from user prompts and deploys them zero-shot on a custom humanoid robot in multiple environments; these capabilities did not previously exist with manually designed controllers. By abstracting away the complexity of curriculum design, AURA enables scalable and adaptive policy learning pipelines that would be complex to construct by hand. Project page: https://aura-research.org/
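The retrieval step in the feedback loop can be illustrated with a small in-memory store that ranks past curricula by cosine similarity to a new prompt. This is a sketch under stated assumptions: `embed` is a toy bag-of-words stand-in for a learned text-embedding model, and `CurriculumStore` is a hypothetical substitute for the vector database; neither reflects AURA's actual components.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding; a stand-in for a learned embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class CurriculumStore:
    """Hypothetical in-memory stand-in for a vector database of past curricula."""
    entries: list = field(default_factory=list)

    def add(self, description: str, curriculum: dict, result: dict) -> None:
        # Store the embedding alongside the curriculum and its training result.
        self.entries.append((embed(description), description, curriculum, result))

    def retrieve(self, prompt: str, k: int = 2) -> list:
        # Rank stored curricula by similarity to the prompt and return the top k.
        q = embed(prompt)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [(desc, cur, res) for _, desc, cur, res in ranked[:k]]
```

Usage: after `store.add("humanoid walking on flat ground", {...}, {...})`, a query such as `store.retrieve("humanoid walking on rough terrain", k=1)` returns that walking curriculum ahead of unrelated entries, and the retrieved curriculum and its recorded results can then be passed to the LLM agents as context.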
