Curiosity-Driven Imagination: Discovering Plan Operators and Learning Associated Policies for Open-World Adaptation

Pierrick Lorang, Hong Lu, Matthias Scheutz

TL;DR

Open-world robotics requires fast adaptation to unforeseen dynamics. This work proposes a bi-level neuro-symbolic framework that combines symbolic planning with a curiosity-driven neural model. Lifted symbolic operators learned from interactions are used to build LTL-based reward machines, while the imaginary planning domain guides exploration and planning. An adaptive executor and a mechanism for refining existing operators or creating new ones enable rapid accommodation of novelties. In RoboSuite pick-and-place with sequential novelties, Bi-Model converges faster and reaches higher asymptotic success than state-of-the-art hybrid methods, demonstrating improved sample efficiency and robustness.

Abstract

Adapting quickly to dynamic, uncertain environments, often called "open worlds", remains a major challenge in robotics. Traditional Task and Motion Planning (TAMP) approaches struggle to cope with unforeseen changes, are data-inefficient when adapting, and do not leverage world models during learning. We address this issue with a hybrid planning and learning system that integrates two models: a low-level neural-network-based model that learns stochastic transitions and drives exploration via an Intrinsic Curiosity Module (ICM), and a high-level symbolic planning model that captures abstract transitions using operators, enabling the agent to plan in an "imaginary" space and generate reward machines. Our evaluation in a robotic manipulation domain with sequential novelty injections demonstrates that our approach converges faster and outperforms state-of-the-art hybrid methods.
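
The two reward sources named in the abstract, a reward machine derived from a plan and an intrinsic curiosity signal, can be illustrated with a small sketch. It is not the authors' implementation. The class `SequentialRewardMachine`, the functions `curiosity_bonus` and `shaped_reward`, and the weights `eta` and `beta` are hypothetical names chosen for illustration. The reward machine is simplified to an ordered list of subgoals, and the curiosity term is taken to be the scaled squared error of a forward model's prediction, as in the standard ICM formulation.

```python
from dataclasses import dataclass


@dataclass
class SequentialRewardMachine:
    """Pays a bonus each time the next subgoal of a plan becomes true, in order.

    Assumption: a plan is reduced to an ordered list of symbolic atoms.
    """
    subgoals: list      # ordered atoms, e.g. taken from a hypothetical plan
    bonus: float = 1.0
    index: int = 0      # position of the next subgoal still to be reached

    def step(self, true_atoms: set) -> float:
        # Advance only when the *next* expected subgoal holds in the state.
        if self.index < len(self.subgoals) and self.subgoals[self.index] in true_atoms:
            self.index += 1
            return self.bonus
        return 0.0


def curiosity_bonus(predicted, actual, eta=0.1):
    """ICM-style bonus: (eta / 2) * squared forward-model prediction error."""
    return 0.5 * eta * sum((p - a) ** 2 for p, a in zip(predicted, actual))


def shaped_reward(machine, true_atoms, predicted, actual, beta=1.0):
    """Reward-machine term plus a curiosity term weighted by beta (assumed)."""
    return machine.step(true_atoms) + beta * curiosity_bonus(predicted, actual)
```

In this sketch, a transition earns the plan bonus only when it satisfies the next subgoal in order, while poorly predicted transitions earn a larger curiosity bonus regardless of plan progress.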

Paper Structure

This paper contains 8 sections, 6 equations, 4 figures, 2 algorithms.

Figures (4)

  • Figure 1: Curiosity-Driven Imagination: The agent learns a bi-level model of the environment—continuous (neural network) and symbolic (planning domain). The continuous component drives intrinsic curiosity, guiding the agent to unfamiliar states for symbolic abstraction, while the symbolic component constructs a reward machine based on hypothetical plans.
  • Figure 2: Operator Discovery: The agent identifies hypothetical lifted operators from symbolic transitions, which are employed for planning and generating an LTL formula to enhance the reward signal. The reward feedback comprises two components: a reward machine that activates when state transitions meet the LTL formula, and an intrinsic curiosity reward that encourages the agent to explore and learn new transitions.
  • Figure 3: Left: The original Pick&Place task, where the agent must place a can in the drop-off bin. Right: The Locked Door novelty, where a door blocks access to the drop-off bin. The agent must first unlock the door via a proximity sensor (blue ball) before pushing it open. The red ball marks the light-switch location.
  • Figure 4: Experimental Results. The upward arrow next to SR indicates that higher is better, whereas the downward arrow next to $T_{adapt}$ indicates that lower is better. Left: Main results compared to the baselines. Right: Ablation study results with and without ICM and PRM. See text for more details.
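
The operator discovery step described in the Figure 2 caption, in which lifted operators are identified from symbolic transitions, can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's algorithm: atoms are represented as tuples such as `("at", "can", "table")`, a single transition is lifted in isolation, and the function name `lift_transition` and the variable naming scheme `?x0, ?x1, ...` are hypothetical.

```python
def lift_transition(before, after):
    """Turn one ground symbolic transition into a lifted operator sketch.

    `before` and `after` are sets of ground atoms, each a tuple whose first
    element is the predicate name and whose remaining elements are objects.
    """
    added = after - before
    deleted = before - after

    # Only objects that appear in a changed atom become operator parameters.
    changed_objects = sorted({obj for atom in added | deleted for obj in atom[1:]})
    variables = {obj: f"?x{i}" for i, obj in enumerate(changed_objects)}

    def lift(atom):
        # Replace each object constant with its variable; keep the predicate.
        return (atom[0],) + tuple(variables[o] for o in atom[1:])

    # Preconditions: atoms true beforehand that mention only parameter objects.
    relevant = {a for a in before if all(o in variables for o in a[1:])}

    return {
        "parameters": [variables[o] for o in changed_objects],
        "preconditions": {lift(a) for a in relevant},
        "add_effects": {lift(a) for a in added},
        "delete_effects": {lift(a) for a in deleted},
    }
```

For example, a transition from `{("at", "can", "table"), ("handempty",)}` to `{("holding", "can")}` yields an operator with parameters `?x0` and `?x1`, the add effect `("holding", "?x0")`, and delete effects covering the lifted `at` atom and `handempty`. Atoms about unrelated objects are left out of the preconditions.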