Broadly-Exploring, Local-Policy Trees for Long-Horizon Task Planning

Brian Ichter; Pierre Sermanet; Corey Lynch

TL;DR

BELT is a unified approach to long-horizon planning that combines an RRT-inspired global search with a local, task-conditioned policy and a temporally extended, task-conditioned model. It learns a latent task space from play data (Play-LMP) and uses a temporal distance classifier to bias expansions, enabling efficient exploration of sequential subtasks. Experiments in a realistic MuJoCo playground show that BELT plans robust long-horizon trajectories and outperforms baselines such as CEM and single-goal policies, with higher success and feasibility when the task-conditioned model is used. The work demonstrates the potential for scalable, real-world long-horizon manipulation and outlines avenues for replanning and dynamic environments.

Abstract

Long-horizon planning in realistic environments requires the ability to reason over sequential tasks in high-dimensional state spaces with complex dynamics. Classical motion planning algorithms, such as rapidly-exploring random trees (RRTs), are capable of efficiently exploring large state spaces and computing long-horizon, sequential plans. However, these algorithms are generally challenged by complex, stochastic, and high-dimensional state spaces, as well as by narrow passages, which naturally emerge in tasks that interact with the environment. Machine learning offers a promising solution because it can learn general policies that handle complex interactions and high-dimensional observations. However, these policies are generally limited in horizon length. Our approach, Broadly-Exploring, Local-policy Trees (BELT), merges the two to leverage the strengths of both through a task-conditioned, model-based tree search. BELT uses an RRT-inspired tree search to efficiently explore the state space. Locally, the exploration is guided by a task-conditioned, learned policy capable of performing general short-horizon tasks. This task space can be quite general and abstract; its only requirements are that it can be sampled and that it covers the space of useful tasks well. The search is aided by a task-conditioned model that temporally extends dynamics propagation to allow long-horizon search and sequential reasoning over tasks. BELT is demonstrated experimentally to plan long-horizon, sequential trajectories with a goal-conditioned policy and to generate plans that are robust.
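
The search loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the 2D point world, the function names (`toy_task_model`, `sample_task`, `plan`), the step size, and the tolerance are all assumptions made for illustration. A hand-written function stands in for the learned task-conditioned model, uniform sampling stands in for the learned latent task space, and the temporal distance classifier and policy execution are omitted.

```python
import math
import random


def toy_task_model(state, task, step=0.2):
    """Hypothetical stand-in for a learned task-conditioned model.

    Predicts the state reached after executing `task` from `state`.
    Here a task is just a 2D displacement clipped to the unit square.
    """
    x = min(1.0, max(0.0, state[0] + step * task[0]))
    y = min(1.0, max(0.0, state[1] + step * task[1]))
    return (x, y)


def sample_task(rng):
    """Sample a task from a toy task space (assumed: uniform in [-1, 1]^2)."""
    return (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))


def plan(start, goal, tol=0.1, max_iters=3000, seed=0):
    """RRT-style search over tasks; returns a list of tasks or None."""
    rng = random.Random(seed)
    nodes, parents, tasks = [start], [None], [None]
    for _ in range(max_iters):
        # Broad exploration: pick the tree node nearest to a random state.
        target = (rng.random(), rng.random())
        i = min(range(len(nodes)), key=lambda k: math.dist(nodes[k], target))
        # Local expansion: sample a task and propagate it through the model.
        z = sample_task(rng)
        new_state = toy_task_model(nodes[i], z)
        nodes.append(new_state)
        parents.append(i)
        tasks.append(z)
        if math.dist(new_state, goal) <= tol:
            # Walk back to the root to recover the task sequence.
            seq, j = [], len(nodes) - 1
            while parents[j] is not None:
                seq.append(tasks[j])
                j = parents[j]
            return seq[::-1]
    return None


found = plan((0.1, 0.1), (0.9, 0.9))
print("plan length:", None if found is None else len(found))
```

Each tree edge in this sketch is one model call for a whole task rather than one low-level action, which is the sense in which a task-conditioned model temporally extends dynamics propagation. In a setting closer to the paper's, the returned task sequence would be handed to the task-conditioned policy for execution.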
