Model Predictive Trees: Sample-Efficient Receding Horizon Planning with Reusable Tree Search

John Lathrop, Benjamin Rivière, Jedidiah Alindogan, Soon-Jo Chung

TL;DR

The restrictions on tree reuse are characterized by analyzing the induced tracking error under time-varying dynamics, revealing a tradeoff between the search depth and the timescale of the changing dynamics.

Abstract

We present Model Predictive Trees (MPT), a receding horizon tree search algorithm that improves its performance by reusing information efficiently. Whereas existing solvers reuse only the highest-quality trajectory from the previous iteration as a "hotstart", our method reuses the entire optimal subtree, enabling the search to be simultaneously guided away from the low-quality areas and towards the high-quality areas. We characterize the restrictions on tree reuse by analyzing the induced tracking error under time-varying dynamics, revealing a tradeoff between the search depth and the timescale of the changing dynamics. In numerical studies, our algorithm outperforms state-of-the-art sampling-based cross-entropy methods with hotstarting. We demonstrate our planner on an autonomous vehicle testbed performing a nonprehensile manipulation task: pushing a target object through an obstacle field. Code associated with this work will be made available at https://github.com/jplathrop/mpt.
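The subtree-reuse idea described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a plain UCT-style search stands in for the paper's tree growth rule, and the 1-D dynamics, reward, action set, and the names `Node`, `step`, `simulate`, `plan`, and `reroot` (including its tolerance-based reset rule) are all hypothetical.

```python
import math
import random
from dataclasses import dataclass, field

ACTIONS = (-1.0, 0.0, 1.0)  # hypothetical discrete action set


@dataclass
class Node:
    state: float
    visits: int = 0           # simulations that passed through this node
    total_value: float = 0.0  # sum of returns for the action leading to this node
    children: dict = field(default_factory=dict)  # action -> Node

    def mean_value(self) -> float:
        return self.total_value / self.visits if self.visits else 0.0


def step(state, action):
    """Toy 1-D model (an assumption): shift by the action, reward is -|position|."""
    next_state = state + 0.1 * action
    return next_state, -abs(next_state)


def simulate(node, depth, rng, c=1.0):
    """Run one UCT-style simulation from `node` and return its sampled return."""
    node.visits += 1
    if depth == 0:
        return 0.0
    untried = [a for a in ACTIONS if a not in node.children]
    if untried:
        # Expand an action that has not been tried from this node yet.
        action = rng.choice(untried)
        node.children[action] = Node(step(node.state, action)[0])
    else:
        # All actions tried: pick the one with the largest upper confidence bound.
        def ucb(a):
            child = node.children[a]
            bonus = c * math.sqrt(math.log(node.visits) / child.visits)
            return child.mean_value() + bonus
        action = max(ACTIONS, key=ucb)
    child = node.children[action]
    reward = step(node.state, action)[1]
    value = reward + simulate(child, depth - 1, rng, c)
    child.total_value += value
    return value


def plan(root, num_sims, depth, rng):
    """Grow the tree from `root`; return the best first-level action and its child."""
    for _ in range(num_sims):
        simulate(root, depth, rng)
    best = max(root.children, key=lambda a: root.children[a].mean_value())
    return best, root.children[best]


def reroot(child, measured_state, tol=1e-6):
    """Reuse the chosen subtree if its predicted state matches the measurement.

    Returns the next root and the number of simulations carried over.
    The tolerance test is a hypothetical stand-in for a tree-reset rule.
    """
    if abs(child.state - measured_state) <= tol:
        return child, child.visits
    return Node(measured_state), 0  # prediction too far off: start a fresh tree


if __name__ == "__main__":
    rng = random.Random(0)
    root = Node(state=1.0)
    for k in range(5):
        action, child = plan(root, num_sims=200, depth=5, rng=rng)
        measured, _ = step(root.state, action)  # here the plant equals the model
        root, reused = reroot(child, measured)
        print(k, action, round(root.state, 2), reused)
```

Re-rooting keeps every statistic below the selected child, so the next planning iteration starts with both the promising and the unpromising branches already explored, while the sibling subtrees are discarded. In this sketch the plant and the model coincide, so a reset is never triggered; with model mismatch, `reroot` would fall back to an empty tree.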

Paper Structure

This paper contains 18 sections, 5 theorems, 23 equations, 5 figures, 1 algorithm.

Key Result

Theorem 1

A necessary and sufficient condition for (eq:undisturbed_system) to be contracting [tsukamoto2021contraction] is the existence of a uniformly positive definite matrix $M(\mathbf{q}, k) = \Theta(\mathbf{q}, k)^\top \Theta(\mathbf{q}, k) \in \mathbb{R}^{n \times n}$, called a contraction metric, satisfying a contraction inequality (truncated in the source) for a constant $0 \leq \alpha < 1$, called the contraction rate.
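As a rough illustration of what a contraction rate means, consider a scalar example. The inequality used here is one common discrete-time form and is an assumption; it may differ from the paper's exact statement. Take the hypothetical system $q_{k+1} = a\, q_k$ with $|a| < 1$ and the constant metric $M = 1$ (so $\Theta = 1$):

```latex
% Assumed form of the condition: (df/dq)^T M (df/dq) <= alpha^2 M, with df/dq = a and M = 1
a \cdot M \cdot a = a^2 \leq \alpha^2 M \quad \text{for } \alpha = |a| < 1,
\qquad
|q_k - q'_k| = |a|^k \, |q_0 - q'_0| = \alpha^k \, |q_0 - q'_0| .
```

Any two trajectories $q_k$ and $q'_k$ of this system therefore approach each other geometrically at rate $\alpha$, which is the behavior a contraction metric certifies for more general dynamics.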

Figures (5)

  • Figure 2: Top: Five seconds of real-time generated trees in our hardware experiment, in which the autonomous vehicle testbed pushes a target to a goal region behind an obstacle. On average, 2100 simulated trajectories are grown every 0.2 s. The trees are colored by the time they were grown. Bottom: The proposed tree growth algorithm visualized over four time steps. At each iteration (I - IV), new nodes and branches are added to the tree, and the best first-level child is selected as the next root. In subsequent iterations, older parts of the tree are discarded and new nodes are added.
  • Figure 3: Left: The states and collision geometry of the simulation model used in the experiments. Right: The autonomous vehicle platform and barrel equipped with sensors for state estimation and compute for running our algorithm.
  • Figure 4: For a grid of $(x,y)$ initial car positions, we color each point according to the accumulated value of running each algorithm (CEM, CEM-Reuse, UCT, MPT). Purple indicates higher value. These simulations were generated with $L=200$, a planning horizon of $10$, and a simulation depth of $100$. The value shown is averaged across ten runs at each initial condition. Our proposed method, MPT, provides the best average cumulative reward of all methods, with a significant improvement over the baseline UCT method.
  • Figure 5: The value of the trajectory produced by each planning method versus the number of simulations. For each $L$, $100$ trials are run, and the average value is plotted with one standard deviation error bar. Our proposed algorithm (MPT) significantly outperforms the baselines and has a less noisy estimate.
  • Figure 6: Our algorithm plans for and executes a solution onboard an autonomous vehicle testbed. I: The trajectory of the vehicle (solid line) and the pushed object (dashed line) are shown, with the planned trajectory in red and the actual trajectory in blue. The tree resets are circled in red. II: An overlay of the trajectory of the vehicle and barrel over the course of the experiment in the Caltech Center for Autonomous Systems and Technologies. III: The states as simulated by the tree (blue) and as measured by the motion capture (orange). Each tree reset instance is shown as a pair of red lines. IV: The number of simulations saved by reusing the tree is shown. On average, one third of the new simulations are carried over to the next planning iteration. Near the end of the experiment, when the optimal behavior is easy to find, the number of reused simulations rises dramatically. The decision trees grown here are highly concentrated on the optimal actions at each depth.

Theorems & Definitions (10)

  • Theorem 1
  • Theorem 2
  • proof
  • Theorem 3
  • proof
  • Remark 1
  • Lemma 1
  • proof
  • Theorem 4
  • proof