SkillWrapper: Generative Predicate Invention for Skill Abstraction

Ziyi Yang, Benned Hedegaard, Ahmed Jaafar, Yichen Wei, Skye Thompson, Shreyas S. Raman, Haotian Fu, Stefanie Tellex, George Konidaris, David Paulius, Naman Shah

TL;DR

This work tackles long-horizon robotic planning by learning symbolic, object-centric predicates directly from RGB observations through generative predicate invention. It introduces SkillWrapper, a three-stage framework that actively collects data, invents predicates under a formal theory that guarantees soundness and completeness, and learns grounded operators usable by classical planners. The approach is validated in simulation and on two real-robot platforms, where it generalizes to unseen tasks and solves long-horizon problems with minimal hand-engineering. The results point to a principled path toward scalable, interpretable skill reasoning in embodied agents, using foundation models to guide data collection and evaluate predicates.

Abstract

Generalizing from individual skill executions to solving long-horizon tasks remains a core challenge in building autonomous agents. A promising direction is learning high-level, symbolic abstractions of the low-level skills of the agents, enabling reasoning and planning independent of the low-level state space. Among possible high-level representations, object-centric skill abstraction with symbolic predicates has been proven to be efficient because of its compatibility with domain-independent planners. Recent advances in foundation models have made it possible to generate symbolic predicates that operate on raw sensory inputs, a process we call generative predicate invention, to facilitate downstream abstraction learning. However, it remains unclear which formal properties the learned representations must satisfy, and how they can be learned to guarantee these properties. In this paper, we address both questions by presenting a formal theory of generative predicate invention for skill abstraction, resulting in symbolic operators that can be used for provably sound and complete planning. Within this framework, we propose SkillWrapper, a method that leverages foundation models to actively collect robot data and learn human-interpretable, plannable representations of black-box skills, using only RGB image observations. Our extensive empirical evaluation in simulation and on real robots shows that SkillWrapper learns abstract representations that enable solving unseen, long-horizon tasks in the real world with black-box skills.

Paper Structure

This paper contains 82 sections, 6 theorems, 32 equations, 20 figures, 16 tables, 7 algorithms.

Key Result

Theorem 1

Every operator $a \in {\mathcal{A}}_n$ in the model ${\mathcal{M}}_n$ learned by SkillWrapper is supported by at least one observed transition $\langle s,\omega,s'\rangle \in B_n$. That is, $\underline{s} \models \textsc{Pre}_a$ and $\underline{s}' \models \textsc{Eff}_a$.
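The support condition in Theorem 1 can be illustrated with a minimal sketch. It assumes abstract states are sets of true ground atoms, that $\omega$ names the executed skill, and that preconditions and effects are conjunctions of positive and negative literals. The `Operator` class, the `is_supported` helper, and the example atoms are hypothetical names chosen for illustration, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Tuple

# Assumption: an abstract state is the set of ground atoms that hold in it.
State = FrozenSet[str]
Transition = Tuple[State, str, State]  # (abstract s, skill omega, abstract s')


@dataclass(frozen=True)
class Operator:
    """Hypothetical operator: positive/negative preconditions and effects."""
    skill: str
    pre_pos: FrozenSet[str] = frozenset()
    pre_neg: FrozenSet[str] = frozenset()
    eff_add: FrozenSet[str] = frozenset()
    eff_del: FrozenSet[str] = frozenset()


def satisfies(state: State, pos: FrozenSet[str], neg: FrozenSet[str]) -> bool:
    """True if every positive literal holds and no negative literal holds."""
    return pos <= state and not (neg & state)


def is_supported(op: Operator, transitions: Iterable[Transition]) -> bool:
    """Check that some observed transition of the same skill has
    s |= Pre_a and s' |= Eff_a, mirroring the statement of Theorem 1."""
    return any(
        skill == op.skill
        and satisfies(s, op.pre_pos, op.pre_neg)
        and satisfies(s_next, op.eff_add, op.eff_del)
        for s, skill, s_next in transitions
    )


# Illustrative data only: one observed transition of a "Pick" skill.
buffer = [
    (frozenset({"HandEmpty", "OnTable(cup)"}), "Pick", frozenset({"Holding(cup)"})),
]
pick = Operator(
    skill="Pick",
    pre_pos=frozenset({"HandEmpty", "OnTable(cup)"}),
    eff_add=frozenset({"Holding(cup)"}),
    eff_del=frozenset({"HandEmpty", "OnTable(cup)"}),
)
unsupported = Operator(skill="Pick", pre_pos=frozenset({"Holding(cup)"}))
```

Under these assumptions, `is_supported(pick, buffer)` holds, while `unsupported` has no witnessing transition because its precondition is never satisfied in an observed initial state.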

Figures (20)

  • Figure 1: Overview of SkillWrapper. For an agent equipped with black-box skills, SkillWrapper learns skill representations that are compatible with off-the-shelf planners. These representations are composed of predicates invented by the foundation model. Given a novel planning problem whose initial and goal states are specified as RGB images, a foundation model produces the corresponding abstract states by applying the invented predicates to the low-level states. SkillWrapper is agnostic to the agent, and we illustrate both real-world (robots) and simulated agents in this figure.
  • Figure 2: Example of Predicate Invention. The initial states of two transitions both satisfy the preconditions of an operator learned from the same skill, yet transition 1 succeeds and transition 2 fails. In this case, the first condition (precondition) is triggered, and the foundation model is prompted with both transitions to invent a new predicate.
  • Figure 3: Robotouille environment.
  • Figure 4: Initial and Goal States for Real Robot Experiments.
  • Figure 5: Sequence of Bimanual Robot Skill Execution with Predicate Value Changes.
  • ...and 15 more figures
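
The precondition trigger described in the Figure 2 caption can be sketched as a search for conflicting transition pairs: two executions of the same skill whose abstract initial states both satisfy an operator's precondition, where one execution succeeds and the other fails. The sketch assumes abstract states are sets of true atoms and preconditions are sets of positive atoms. `Transition` and `precondition_conflicts` are hypothetical names, and the step of prompting a foundation model with each pair is not shown.

```python
from typing import FrozenSet, List, NamedTuple, Tuple


class Transition(NamedTuple):
    """Hypothetical record of one skill execution."""
    state: FrozenSet[str]  # abstract initial state (set of true atoms)
    skill: str             # name of the executed skill
    success: bool          # whether the execution succeeded


def precondition_conflicts(
    precondition: FrozenSet[str],
    skill: str,
    transitions: List[Transition],
) -> List[Tuple[Transition, Transition]]:
    """Return (successful, failed) pairs of the given skill whose initial
    states both satisfy the precondition. Each pair signals that the current
    predicates cannot tell the two states apart, so a new one is needed."""
    matching = [
        t for t in transitions
        if t.skill == skill and precondition <= t.state
    ]
    succeeded = [t for t in matching if t.success]
    failed = [t for t in matching if not t.success]
    return [(s, f) for s in succeeded for f in failed]


# Illustrative data only: both initial states look identical to the operator.
t1 = Transition(frozenset({"HandEmpty"}), "Pick", True)
t2 = Transition(frozenset({"HandEmpty"}), "Pick", False)
t3 = Transition(frozenset({"HandEmpty"}), "Place", False)
pairs = precondition_conflicts(frozenset({"HandEmpty"}), "Pick", [t1, t2, t3])
```

Here `pairs` contains the single pair `(t1, t2)`; in the setting the caption describes, such a pair is what would be shown to the foundation model when asking it for a distinguishing predicate.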

Theorems & Definitions (17)

  • Theorem 1: Soundness of SkillWrapper
  • Lemma 1
  • Theorem 2: Probabilistic completeness of SkillWrapper
  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4: Exact Model
  • Definition 5: Soundness
  • Definition 6: Completeness
  • Definition 7: Suitability
  • ...and 7 more