iPlanner: Imperative Path Planning

Fan Yang, Chen Wang, Cesar Cadena, Marco Hutter

TL;DR

The paper tackles latency and error propagation in modular path-planning systems and the generalization gaps of end-to-end approaches. It introduces Imperative Learning (IL) with a differentiable cost map based on a Euclidean Signed Distance Field (ESDF) and a Bi-Level Optimization (BLO) training scheme to learn a perception–planning policy from a single depth frame without demonstrations. The method couples a perception/planning network to a trajectory optimizer via a differentiable cost, so that a task-level loss, including a "fear" loss component, supplies the training signal without labeled trajectories and updates the network end to end by gradient descent. Empirical results show around 4x faster planning than a classic non-learning pipeline, robust performance under localization noise, and substantial generalization gains (26–87% SPL improvements) in unseen environments, with successful real-world deployment on a legged robot achieving low-latency planning.
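
The SPL figure quoted above is not defined on this page. As a rough illustration, the sketch below assumes the commonly used definition of SPL (Success weighted by Path Length): each successful episode contributes the ratio of the shortest path length to the length actually travelled, failures contribute zero, and the result is averaged over episodes. The function name `spl` and the three episodes are hypothetical, and the paper's exact evaluation protocol may differ.

```python
from typing import Sequence


def spl(successes: Sequence[bool],
        shortest: Sequence[float],
        taken: Sequence[float]) -> float:
    """Success weighted by Path Length, averaged over episodes.

    Assumes all path lengths are strictly positive.
    """
    if not successes or not (len(successes) == len(shortest) == len(taken)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for ok, l_opt, l_act in zip(successes, shortest, taken):
        if ok:
            # A path can never score above 1, even if l_act < l_opt by noise.
            total += l_opt / max(l_act, l_opt)
    return total / len(successes)


# Hypothetical episodes: an optimal run, a run twice as long as needed,
# and a failure.
score = spl([True, True, False], [10.0, 8.0, 5.0], [10.0, 16.0, 7.0])
print(score)
```

Under this definition a planner is rewarded both for reaching the goal and for doing so without long detours, which is why it is a natural metric for comparing generalization across unseen environments.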

Abstract

The problem of path planning has been studied for years. Classic planning pipelines, including perception, mapping, and path searching, can result in latency and compounding errors between modules. While recent studies have demonstrated the effectiveness of end-to-end learning methods in achieving high planning efficiency, these methods often struggle to match the generalization abilities of classic approaches in handling different environments. Moreover, end-to-end training of policies often requires a large number of labeled data or training iterations to reach convergence. In this paper, we present a novel Imperative Learning (IL) approach. This approach leverages a differentiable cost map to provide implicit supervision during policy training, eliminating the need for demonstrations or labeled trajectories. Furthermore, the policy training adopts a Bi-Level Optimization (BLO) process, which combines network update and metric-based trajectory optimization, to generate a smooth and collision-free path toward the goal based on a single depth measurement. The proposed method allows task-level costs of predicted trajectories to be backpropagated through all components to update the network through direct gradient descent. In our experiments, the method demonstrates around 4x faster planning than the classic approach and robustness against localization noise. Additionally, the IL approach enables the planner to generalize to various unseen environments, resulting in an overall 26-87% improvement in SPL performance compared to baseline learning methods.
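
To make the idea of a differentiable cost map more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: it assumes the cost map is a 2D grid queried by bilinear interpolation, and it moves a single waypoint directly along the negative gradient, whereas the paper backpropagates the cost into network parameters. The function `cost_and_grad` and the toy grid are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[float]]


def cost_and_grad(grid: Grid, x: float, y: float) -> Tuple[float, float, float]:
    """Bilinearly interpolate grid[row=y][col=x].

    Returns (cost, d cost / d x, d cost / d y). The grid must be at least 2x2.
    """
    rows, cols = len(grid), len(grid[0])
    # Clamp the query so the 2x2 neighbourhood stays inside the grid.
    x = min(max(x, 0.0), cols - 1.0)
    y = min(max(y, 0.0), rows - 1.0)
    x0 = min(int(x), cols - 2)
    y0 = min(int(y), rows - 2)
    tx, ty = x - x0, y - y0
    c00, c01 = grid[y0][x0], grid[y0][x0 + 1]
    c10, c11 = grid[y0 + 1][x0], grid[y0 + 1][x0 + 1]
    top = c00 + tx * (c01 - c00)   # interpolate along x on the upper row
    bot = c10 + tx * (c11 - c10)   # interpolate along x on the lower row
    cost = top + ty * (bot - top)  # interpolate along y between the rows
    dx = (1.0 - ty) * (c01 - c00) + ty * (c11 - c10)
    dy = bot - top
    return cost, dx, dy


# Toy map (an assumption): cost is highest next to an obstacle along column 0
# and falls off linearly with distance from it.
grid = [[4.0 - c for c in range(5)] for _ in range(5)]

x, y = 1.5, 2.0                       # a waypoint close to the obstacle
cost_before, gx, gy = cost_and_grad(grid, x, y)
x, y = x - 0.5 * gx, y - 0.5 * gy     # one gradient-descent step
cost_after, _, _ = cost_and_grad(grid, x, y)
print(cost_before, cost_after)
```

Because the interpolated cost has a well-defined gradient with respect to the waypoint position, the map itself can indicate which direction reduces collision risk, which is the sense in which it can stand in for demonstrations or labeled trajectories.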

Paper Structure (15 sections, 11 equations, 11 figures, 2 tables)

Figures (11)

  • Figure 1: Experiment of planning through static and dynamic obstacles with a legged robot, with A, B, and C representing three planning events. The goal is set outside the door from the start. The green curve shows the robot's trajectory as it (A) avoids a static obstacle, (B) avoids a moving human, and (C) climbs stairs to reach the goal. The blue curve represents human movement. The bottom images illustrate the depth measurements and the predicted trajectories from our method for these three events.
  • Figure 2: An overview of training the planning policy using IL. The pipeline consists of two parts, forming a BLO process with upper-level network update and lower-level trajectory optimization. During inference, the perception and planning network first encodes the depth measurement and goal position to predict a key-point path toward the goal with an associated collision probability. During training, the trajectory cost and task-level "fear" loss are propagated back to provide direct gradients for updating both the perception and planning networks simultaneously.
  • Figure 3: An illustration of a training environment and its ESDF cost map generated from the Matterport3D dataset. (a) depicts the point cloud reconstructed from collected depth images within a Matterport3D room. (b) shows the smoothed ESDF cost map produced from the point cloud with Gaussian filtering.
  • Figure 4: The mathematical BLO pipeline for imperative training of the planning policy. The network function $f_\theta$ predicts a path that serves as the input to the trajectory optimization (TO) process. The TO minimizes the cost $\mathcal{C}$ to obtain the optimized trajectory $\boldsymbol{\tau}^*$. The trajectory cost $\mathcal{C}$, combined with additional loss terms, forms the total training loss $\mathcal{F}$, which is backpropagated to update the network parameters $\theta$.
  • Figure 5: Illustration of depth observation from simulation and the real world. (a) A depth image generated from the Gazebo simulation in the Matterport3D environment. (b) A depth observation obtained from Intel RealSense D435 during real-world experiments.
  • ...and 6 more figures
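
The bi-level structure described in the Figure 4 caption can be written compactly as below. This is a sketch reconstructed from the caption alone: $\mathbf{x}$ (the depth measurement and goal fed to the network) and $\mathcal{L}_{\text{extra}}$ (the additional loss terms, such as the "fear" loss mentioned for Figure 2) are placeholder symbols introduced here, and combining the terms as a plain sum is an assumption. The paper's own equations may use different notation and weighting.

$$
\min_{\theta}\; \mathcal{F}(\theta) \;=\; \mathcal{C}\big(\boldsymbol{\tau}^*;\, f_\theta(\mathbf{x})\big) + \mathcal{L}_{\text{extra}}(\theta)
\quad \text{s.t.} \quad
\boldsymbol{\tau}^* \;=\; \arg\min_{\boldsymbol{\tau}}\; \mathcal{C}\big(\boldsymbol{\tau};\, f_\theta(\mathbf{x})\big)
$$

The lower-level problem is the TO step that turns the predicted path into $\boldsymbol{\tau}^*$. The upper-level problem is the network update, which needs $\mathcal{F}$ to be differentiable with respect to $\theta$ through both $f_\theta$ and the TO step.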