SLAP: Shortcut Learning for Abstract Planning

Y. Isabel Liu, Bowen Li, Benjamin Eysenbach, Tom Silver

TL;DR

SLAP (Shortcut Learning for Abstract Planning) augments a given Task and Motion Planning (TAMP) abstraction with RL-learned shortcuts. It builds an abstract planning graph from predefined options and trains low-level policies that connect abstract states directly. Across four robotic domains, SLAP produces substantially shorter plans than pure planning and higher success rates than pure RL, while generalizing to new objects and goals. The approach leverages offline data, a two-level planning graph, and object-centric representations to blend planning efficiency with learned improvisations like 'slap' and 'wiggle', providing a practical plug-and-play enhancement for long-horizon manipulation. The work demonstrates a spectrum between planning and learning, and points to future integrations with richer abstractions and more capable planners to further improve scalability and robustness in real-world robotics.

Abstract

Long-horizon decision-making with sparse rewards and continuous states and actions remains a fundamental challenge in AI and robotics. Task and motion planning (TAMP) is a model-based framework that addresses this challenge by planning hierarchically with abstract actions (options). These options are manually defined, limiting the agent to behaviors that we as human engineers know how to program (pick, place, move). In this work, we propose Shortcut Learning for Abstract Planning (SLAP), a method that leverages existing TAMP options to automatically discover new ones. Our key idea is to use model-free reinforcement learning (RL) to learn shortcuts in the abstract planning graph induced by the existing options in TAMP. Without any additional assumptions or inputs, shortcut learning leads to shorter solutions than pure planning and higher task success rates than flat and hierarchical RL. Qualitatively, SLAP discovers dynamic physical improvisations (e.g., slap, wiggle, wipe) that differ significantly from the manually defined ones. In experiments across four simulated robotic environments, we show that SLAP solves and generalizes to a wide range of tasks, reducing overall plan lengths by over 50% and consistently outperforming planning and RL baselines.
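To make the idea of "shortcuts in the abstract planning graph" concrete, here is a minimal toy sketch. It is not the authors' implementation. It assumes the abstract planning graph can be modeled as a weighted directed graph whose nodes are abstract states and whose edge costs are low-level step counts. The state names (`s0` to `s3`), the costs, and the helpers `shortest_plan` and `add_shortcuts` are all hypothetical. The learned shortcut is represented only by its cost; no RL is run here.

```python
import heapq


def shortest_plan(graph, start, goal):
    """Dijkstra over a dict-of-dicts graph. Returns (cost, path)."""
    frontier = [(0, start, [start])]
    settled = {}
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in settled and settled[node] <= cost:
            continue  # already reached this node at least as cheaply
        settled[node] = cost
        for nxt, step_cost in graph.get(node, {}).items():
            heapq.heappush(frontier, (cost + step_cost, nxt, path + [nxt]))
    return float("inf"), []


def add_shortcuts(graph, shortcuts):
    """Return a copy of the graph with extra (src, dst) -> cost edges added."""
    augmented = {node: dict(edges) for node, edges in graph.items()}
    for (src, dst), cost in shortcuts.items():
        augmented.setdefault(src, {})[dst] = cost
    return augmented


# Hypothetical graph induced by hand-written options (e.g. pick, place, move).
# Each edge cost stands in for the number of low-level steps an option takes.
base_graph = {
    "s0": {"s1": 10},
    "s1": {"s2": 10},
    "s2": {"s3": 10},
    "s3": {},
}

# Assumption: a learned policy reliably connects s0 to s2 in 8 low-level steps.
learned_shortcuts = {("s0", "s2"): 8}
augmented_graph = add_shortcuts(base_graph, learned_shortcuts)

base_cost, base_path = shortest_plan(base_graph, "s0", "s3")
slap_cost, slap_path = shortest_plan(augmented_graph, "s0", "s3")
print(base_cost, base_path)
print(slap_cost, slap_path)
```

In this toy setup the planner is unchanged; only the graph gains edges. That mirrors the plug-and-play framing above: a shortcut is used only when it lowers the cost of the best path.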

Paper Structure

This paper contains 41 sections, 9 figures, 5 tables, 2 algorithms.

Figures (9)

  • Figure 1: Shortcut Learning for Abstract Planning (SLAP) uses reinforcement learning (RL) to find low-level shortcuts in abstract plans. SLAP finds shorter trajectories than pure planning and achieves higher success rates than pure RL.
  • Figure 2: Abstract planning graph.
  • Figure 3: SLAP Pipeline. (a) We build abstract planning graphs on training tasks and generate possible shortcuts. (b) Each shortcut induces an MDP. (c) We run RL in parallel shortcut MDPs to create shortcut policies. (d) The learned policies are used to find shortcuts in abstract planning graphs for new evaluation tasks. (e) SLAP generalizes over tasks (initial states and goals) and objects.
  • Figure 4: Training Dynamics. As the number of training steps increases, more shortcuts are added and the length of the output SLAP plan decreases. In Cluttered Drawer, we visualize shortcuts learned at different stages of training and show which abstract states in the graph these shortcuts connect.
  • Figure 5: Generalization Results. In the Obstacle Tower environment, SLAP is trained on tasks with a stack of 3 obstacles and no distractors. At test time, we are able to generalize to tasks with different numbers of obstacles and distractors by substituting relevant objects.
  • ...and 4 more figures