Agentic Skill Discovery

Xufeng Zhao; Cornelius Weber; Stefan Wermter

TL;DR

Agentic Skill Discovery (ASD) is a framework in which an LLM autonomously proposes novel robotic tasks, learns the corresponding language-conditioned policies via reinforcement learning guided by LLM-generated reward and success functions, and uses a vision-language evaluator to validate the outcomes. By combining fast and slow success determination with Retrieval-Augmented Generation (RAG), ASD discovers skills reliably from an initial zero-skill state and extends the robot's capabilities through on-demand task decomposition and skill stacking. The approach mitigates the non-meaningful behaviors that RL exploration often produces and reduces human supervision, demonstrating autonomous acquisition of meaningful robotic skills in a table manipulation scenario. The work suggests a path toward agentic AI systems that iteratively acquire, validate, and compose new skills with minimal human input, and it outlines challenges for real-world deployment and safety.

Abstract

Language-conditioned robotic skills make it possible to apply the high-level reasoning of Large Language Models (LLMs) to low-level robotic control. A remaining challenge is to acquire a diverse set of fundamental skills. Existing approaches either manually decompose a complex task into atomic robotic actions in a top-down fashion, or bootstrap as many combinations as possible in a bottom-up fashion to cover a wider range of task possibilities. These decompositions or combinations, however, require an initial skill library. For example, a "grasping" capability can never emerge from a skill library containing only diverse "pushing" skills. Existing skill discovery techniques with reinforcement learning acquire skills by exhaustive exploration but often yield non-meaningful behaviors. In this study, we introduce a novel framework for skill discovery that is entirely driven by LLMs. The framework begins with an LLM generating task proposals based on the provided scene description and the robot's configurations, aiming to incrementally acquire new skills upon task completion. For each proposed task, a series of reinforcement learning processes is initiated, utilizing reward and success determination functions sampled by the LLM to develop the corresponding policy. The reliability and trustworthiness of learned behaviors are further ensured by an independent vision-language model. We show that, starting with zero skills, the skill library emerges and expands to include increasingly meaningful and reliable skills, enabling the robot to efficiently propose and complete more advanced tasks. Project page: https://agentic-skill-discovery.github.io
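The loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: every name below (`Candidate`, `SkillLibrary`, `discover_skills`, the callback parameters) is hypothetical, the LLM, the RL training, and the vision-language check are replaced by plain callables, and the acceptance threshold of 0.8 is an assumed value chosen only for the demo.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Candidate:
    """One sampled (reward, success) function pair for a task (hypothetical type).

    A real system would store the generated function code and trained policy;
    here only a label and the measured success rate are kept.
    """
    label: str
    success_rate: float = 0.0  # fast signal: measured with the success function


@dataclass
class SkillLibrary:
    skills: Dict[str, Candidate] = field(default_factory=dict)
    failed: List[str] = field(default_factory=list)


def discover_skills(
    propose_task: Callable[[SkillLibrary], Optional[str]],  # stands in for the LLM proposer
    sample_candidates: Callable[[str], List[Candidate]],    # stands in for LLM function sampling
    train_policy: Callable[[str, Candidate], float],        # stands in for RL; returns success rate
    vlm_approves: Callable[[str, Candidate], bool],         # stands in for the vision-language check
    rounds: int,
    min_success_rate: float = 0.8,                          # assumed threshold, not from the paper
) -> SkillLibrary:
    library = SkillLibrary()
    for _ in range(rounds):
        task = propose_task(library)
        if task is None:
            break
        candidates = sample_candidates(task)
        for cand in candidates:
            cand.success_rate = train_policy(task, cand)
        accepted = None
        # Try candidates from best to worst by the fast signal; only those above
        # the threshold are passed to the slower, independent evaluator.
        for cand in sorted(candidates, key=lambda c: c.success_rate, reverse=True):
            if cand.success_rate < min_success_rate:
                break
            if vlm_approves(task, cand):
                accepted = cand
                break
        if accepted is not None:
            library.skills[task] = accepted
        else:
            library.failed.append(task)
    return library


def run_demo() -> SkillLibrary:
    """Runs the loop with fixed, made-up outcomes in place of real training."""
    tasks = ["reach cube", "push cube", "pick up cube"]
    rates = {
        "reach cube": [0.9, 0.4],
        "push cube": [0.95, 0.85],
        "pick up cube": [0.3, 0.1],
    }
    # "push cube#0" scores highest but is rejected by the evaluator in this demo.
    approved = {"reach cube#0", "push cube#1"}

    def propose_task(library: SkillLibrary) -> Optional[str]:
        for task in tasks:
            if task not in library.skills and task not in library.failed:
                return task
        return None

    def sample_candidates(task: str) -> List[Candidate]:
        return [Candidate(label=f"{task}#{i}") for i in range(len(rates[task]))]

    def train_policy(task: str, cand: Candidate) -> float:
        return rates[task][int(cand.label.rsplit("#", 1)[1])]

    def vlm_approves(task: str, cand: Candidate) -> bool:
        return cand.label in approved

    return discover_skills(propose_task, sample_candidates, train_policy, vlm_approves, rounds=5)
```

In the demo, the highest-scoring "push cube" candidate is rejected by the stand-in evaluator and the second one is accepted, which illustrates why an independent check is useful when the success function itself is generated: a high measured success rate alone does not guarantee the intended behavior.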

Paper Structure (22 sections, 17 figures, 2 tables)

Introduction
Related Work
Agentic Skill Discovery
Iterative Task Proposal and Skill Collection
Evolutionary Skill Learning with Fast and Slow Success Determination
On-demand Skill Learning with Top-Down Quest Decomposition
Experiment
Q1. What kind of tasks can be proposed by LLMs?
Q2. Can ASD acquire reliable skills automatically?
Q3. How RL and RAG influence skill learning?
Q4. Can challenging tasks be completed by stacking skills?
Conclusion
Limitations and Future Work
Generated Functions Probe
Success Functions
...and 7 more sections

Figures (17)

Figure 1: Agentic Skill Discovery gradually acquires contextual skills for table manipulation.
Figure 2: Contextual skill acquisition loop of ASD. Given the environment setup and the robot's current abilities, an LLM continually proposes tasks for the robot to complete, and the successful completion will be collected as acquired skills, each with several neural network variants (options).
Figure 3: Evolutionary search of reward functions for defined tasks with deterministic success functions such that the success rate can be reliably computed and used as a measure of fitness.
Figure 4: The evolutionary skill learning procedure of ASD involves an LLM composing a bunch of both reward and success functions (left, both are also conditioned on the environmental and robotic information as during task proposal, omitted here for simplicity), corresponding reinforcement learning to train policies (middle), and evolutionary search with both learning statistics, e.g., success rate, and GPT-4V assessment (right).
Figure 5: Top: By chaining together learned skills, ASD can further learn a new skill $\pi^i$ on demand. Bottom: ASD solves quests, namely challenging tasks, with top-down decomposition and skill learning, where the skill library expands for each subtask's completion.
...and 12 more figures
