Bilevel Learning for Bilevel Planning

Bowen Li, Tom Silver, Sebastian Scherer, Alexander Gray

TL;DR

The paper tackles the challenge of generalizing robot planning from demonstrations by learning high-level abstractions for bilevel planning. It introduces IVNTR, a neuro-symbolic framework that learns neural predicates optimized for planning by alternating neural classifier learning with symbolic learning of predicate effects, producing a compact dynamic predicate set $\Psi_{\mathrm{dyn}}$. A lift-and-train objective using effect vectors, ground-effect supervision, and a neural-loss-guided tree expansion enables scalable discovery of predicates without hand-crafted definitions. Predicate selection and standard operator/sampler learning then yield a full bilevel planner that generalizes to unseen objects and task horizons. The approach is demonstrated across six simulated domains and real-robot Spot tasks, reaching an average success rate of $77\%$ on unseen tasks (where most existing approaches stay below $35\%$) and competitive real-world performance relative to oracle planners. This work showcases a scalable path toward high-level generalization by learning planning-centric abstractions from rich, high-dimensional state spaces.

Abstract

A robot that learns from demonstrations should not just imitate what it sees -- it should understand the high-level concepts that are being demonstrated and generalize them to new tasks. Bilevel planning is a hierarchical model-based approach where predicates (relational state abstractions) can be leveraged to achieve compositional generalization. However, previous bilevel planning approaches depend on predicates that are either hand-engineered or restricted to very simple forms, limiting their scalability to sophisticated, high-dimensional state spaces. To address this limitation, we present IVNTR, the first bilevel planning approach capable of learning neural predicates directly from demonstrations. Our key innovation is a neuro-symbolic bilevel learning framework that mirrors the structure of bilevel planning. In IVNTR, symbolic learning of the predicate "effects" and neural learning of the predicate "functions" alternate, with each providing guidance for the other. We evaluate IVNTR in six diverse robot planning domains, demonstrating its effectiveness in abstracting various continuous and high-dimensional states. While most existing approaches struggle to generalize (with <35% success rate), our IVNTR achieves an average of 77% success rate on unseen tasks. Additionally, we showcase IVNTR on a mobile manipulator, where it learns to perform real-world mobile manipulation tasks and generalizes to unseen test scenarios that feature new objects, new states, and longer task horizons. Our findings underscore the promise of learning and planning with abstractions as a path towards high-level generalization.

Paper Structure

This paper contains 30 sections, 10 equations, 10 figures, 7 tables.

Figures (10)

  • Figure 1: The Climb-Transport domain is presented as a running example. One typical training task and one test task are shown at the top. The types, actions, and provided predicates are shown at the bottom.
  • Figure 2: (a) Overview of IVNTR during training. Given transition pairs in the continuous space, IVNTR invents neural predicates with different arguments in parallel, resulting in a candidate set. A subset that minimizes the planning objective $J(\cdot)$ is selected from the candidates and serves as the final $\Psi_{\mathrm{dyn}}$. With the complete predicate set, samplers and operators can be learned. (b) Bilevel planning with operators and samplers at test time. Compositional ground predicates serve as inputs to the AI planner and provide guidance to the samplers.
  • Figure 3: Detailed neural learning process for predicate $\mathtt{P2_1(?r,?t)}$. From the demonstration dataset, we display two transition pairs (one for each action, four states in total) on the left. The neural network takes object-centric features as input and predicts ground predicate values (eight in total). At the bottom, we display an example lifted effect vector for the actions Grasp and Gaze, $\Delta^\psi_t=[+1,+1]$. The ground effect vector then yields supervision on the predicted values. Because this candidate effect vector is unreasonable, its supervision labels the intermediate state as both True and False, resulting in a high validation loss.
  • Figure 4: Detailed symbolic learning process for predicate group $\texttt{P2(?r,?t)}$. Given the neural validation loss from the previous iteration, symbolic learning starts by merging the loss into the global value vector (see Eq. (7)). After the node values in the effect tree are updated, we conduct parent node selection and expansion. Among the child nodes, we evaluate the child with the current highest value via the neural classifier training process described in Figure 3.
  • Figure 5: Visualization of the five domains (excluding Climb-Transport) we have studied in this work. These domains feature various state representations (including the high-dimensional point clouds in the Engrave domain) where our IVNTR can be generally applied.
  • ...and 5 more figures

Theorems & Definitions (5)

  • Definition 1: Operator
  • Definition 2: Sampler
  • Definition 3: Lifted Effect Vector
  • Definition 4: Ground Effect Vector
  • Definition 5: Symbolic Learning Objective
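
The effect-vector supervision described in the Figure 3 caption can be illustrated with a small sketch. It is not the authors' implementation. It assumes the usual add/delete reading of an effect entry ($+1$: the predicate goes from False to True, $-1$: True to False, $0$: no hard label), and it treats the Grasp post-state and the Gaze pre-state as the same state `s1` for illustration. The names `derive_labels` and `conflicting_states` and the state identifiers are hypothetical.

```python
from collections import defaultdict

def derive_labels(transitions, effect):
    """Turn a candidate effect vector into truth-value labels for one ground predicate.

    transitions: list of (pre_state, action, post_state) identifiers.
    effect: dict mapping action name to +1, -1, or 0 (assumed add/delete/no-change).
    """
    labels = defaultdict(set)
    for pre, action, post in transitions:
        delta = effect[action]
        if delta == +1:      # predicate is added: False before, True after
            labels[pre].add(False)
            labels[post].add(True)
        elif delta == -1:    # predicate is deleted: True before, False after
            labels[pre].add(True)
            labels[post].add(False)
        # delta == 0 gives no hard label in this sketch
    return dict(labels)

def conflicting_states(labels):
    """States that receive both True and False, so no classifier can fit them."""
    return sorted(s for s, values in labels.items() if len(values) > 1)

# Illustrative trajectory: Grasp followed by Gaze, sharing the state s1.
transitions = [("s0", "Grasp", "s1"), ("s1", "Gaze", "s2")]

# Candidate [+1, +1] from the Figure 3 caption: s1 must be True and False.
print(conflicting_states(derive_labels(transitions, {"Grasp": +1, "Gaze": +1})))  # ['s1']

# A different candidate (chosen only for contrast) yields consistent labels.
print(conflicting_states(derive_labels(transitions, {"Grasp": +1, "Gaze": 0})))   # []
```

In this toy setting a conflict is detected symbolically. In the setting the caption describes, the same inconsistency shows up as a high validation loss of the neural classifier, which is the signal the symbolic learning step in Figure 4 consumes.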