The Trajectory Bundle Method: Unifying Sequential-Convex Programming and Sampling-Based Trajectory Optimization

Kevin Tracy, John Z. Zhang, Jon Arrizabalaga, Stefan Schaal, Yuval Tassa, Tom Erez, Zachary Manchester

TL;DR

The paper tackles trajectory optimization when derivatives are unavailable or expensive by introducing the Trajectory Bundle Method (TBM), a derivative-free sequential convex programming (SCP) framework that forms convex surrogates by sampling the cost, dynamics, and constraints and linearly interpolating between the samples. TBM unifies SCP and MPPI by showing that MPPI is a special case with entropy-regularized updates, and it extends to general multiple shooting, long-horizon planning, and black-box dynamics evaluated through parallel simulations. Concretely, TBM constructs sample-based bundles around the current iterates, then solves a convex problem with affine interpolants and simplex-constrained interpolation weights $\alpha \in \Delta^{m-1}$ to update the trajectory. MPPI is recovered in the single-shooting, entropy-regularized limit with $\alpha_i = \frac{e^{-J_i/\lambda}}{\sum_j e^{-J_j/\lambda}}$. The approach is validated on diverse robotics problems, including collision avoidance, quadrotor tracking, neural-dynamics cartpole swingup, and a 1:43-scale race car minimum-time task, demonstrating strong constraint satisfaction, performance competitive with or superior to gradient-based methods, and clear advantages in parallelizable simulation settings.
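To make the MPPI special case concrete, the following is a minimal sketch of the softmax weights $\alpha_i$ from the formula above, together with the convex combination of sampled control sequences they induce. The function names `mppi_weights` and `blend_controls`, the list-based data layout, and the shift by the minimum cost (a standard numerical-stability trick that leaves the ratio unchanged) are assumptions made for illustration and are not taken from the paper.

```python
import math


def mppi_weights(costs, lam):
    """Return alpha_i = exp(-J_i / lam) / sum_j exp(-J_j / lam).

    costs: list of rollout costs J_i (hypothetical input layout).
    lam:   positive temperature lambda.
    """
    if lam <= 0:
        raise ValueError("temperature lam must be positive")
    # Subtracting the minimum cost rescales numerator and denominator
    # by the same factor, so the weights are unchanged but exp() cannot overflow.
    j_min = min(costs)
    unnormalized = [math.exp(-(j - j_min) / lam) for j in costs]
    total = sum(unnormalized)
    return [w / total for w in unnormalized]


def blend_controls(samples, weights):
    """Convex combination of sampled control sequences.

    samples: list of control sequences, each a list of floats of equal length.
    weights: simplex weights, one per sample.
    """
    horizon = len(samples[0])
    return [
        sum(w * u[t] for w, u in zip(weights, samples))
        for t in range(horizon)
    ]


if __name__ == "__main__":
    costs = [1.0, 2.0, 3.0]
    alpha = mppi_weights(costs, lam=1.0)
    controls = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
    print(alpha, blend_controls(controls, alpha))
```

The weights are nonnegative and sum to one, so they lie on the simplex $\Delta^{m-1}$, and lower-cost samples receive larger weight. A small $\lambda$ concentrates the weight on the best sample, while a large $\lambda$ approaches a uniform average.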

Abstract

We present a unified framework for solving trajectory optimization problems in a derivative-free manner through the use of sequential convex programming. Traditionally, nonconvex optimization problems are solved by forming and solving a sequence of convex optimization problems, where the cost and constraint functions are approximated locally through Taylor series expansions. This presents a challenge for functions where differentiation is expensive or unavailable. In this work, we present a derivative-free approach to form these convex approximations by computing samples of the dynamics, cost, and constraint functions and letting the solver interpolate between them. Our framework includes sample-based trajectory optimization techniques like model-predictive path integral (MPPI) control as a special case and generalizes them to enable features like multiple shooting and general equality and inequality constraints that are traditionally associated with derivative-based sequential convex programming methods. The resulting framework is simple, flexible, and capable of solving a wide variety of practical motion planning and control problems.

Paper Structure

This paper contains 24 sections, 13 equations, 7 figures, and 2 tables.

Figures (7)

  • Figure 1: The Trajectory Bundle Method (TBM) is a derivative-free framework capable of solving both single-shooting sampling-based MPC problems and general multiple-shooting trajectory optimization problems. TBM finds the optimal state and control sequences (bolded black trajectory) by computing samples (colored points) at each knot point ($t$s) around the current solution and using convex optimization to linearly interpolate between the samples (dark gray regions).
  • Figure 2: A comparison of the accuracy of a first-order Taylor series taken about $(x,y)=(1,1)$ with linear interpolation of the four corner points on the function $f(x,y) = \sin(x)e^{y}$. While the magnitudes of the errors are comparable between these two approximations, the patterns of these errors are notably different.
  • Figure 3: A double integrator with acceleration control is tasked with navigating around three obstacles to a goal position. The trajectory bundle method can directly reason about these nonlinear, non-convex constraints without derivatives, with strong constraint satisfaction and optimality achieved in fewer than 40 iterations.
  • Figure 4: A quadrotor with rotor-velocity control tracks a figure eight reference over a $5$-second horizon discretized with $100$ time steps. The Trajectory Bundle Method solves for an aggressive and smooth trajectory (left) while MPPI fails (right).
  • Figure 5: The cartpole swingup task with a neural-network dynamics model. TBM (ours) finds smooth state (top left) and control (top right) trajectories and converges to tight ($10^{-6}$, red dashed line) constraint satisfaction (bottom left). The baseline IPOPT solver fails to converge (bottom right).
  • ...and 2 more figures
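
The comparison described in the Figure 2 caption can be reproduced numerically. The sketch below evaluates a first-order Taylor expansion of $f(x,y) = \sin(x)e^{y}$ about $(1,1)$ and an interpolant built from the four corner points of a square centered at $(1,1)$. Two things here are assumptions rather than details from the paper: the half-width `h = 0.5` of the square, and the use of bilinear weights, which are one valid choice of simplex weights that reproduce the query point (the figure may use a different region or weight choice).

```python
import math


def f(x, y):
    """Test function from the Figure 2 caption."""
    return math.sin(x) * math.exp(y)


def taylor1(x, y, x0=1.0, y0=1.0):
    """First-order Taylor expansion of f about (x0, y0)."""
    f0 = f(x0, y0)
    df_dx = math.cos(x0) * math.exp(y0)
    df_dy = math.sin(x0) * math.exp(y0)
    return f0 + df_dx * (x - x0) + df_dy * (y - y0)


def corner_interp(x, y, x0=1.0, y0=1.0, h=0.5):
    """Interpolate f from the four corners of a square (half-width h, assumed).

    Returns the interpolated value and the weights alpha, which are
    nonnegative, sum to one, and reproduce (x, y) as a convex
    combination of the corners.
    """
    s = (x - (x0 - h)) / (2.0 * h)
    t = (y - (y0 - h)) / (2.0 * h)
    if not (0.0 <= s <= 1.0 and 0.0 <= t <= 1.0):
        raise ValueError("query point lies outside the sampled square")
    corners = [(x0 - h, y0 - h), (x0 + h, y0 - h),
               (x0 - h, y0 + h), (x0 + h, y0 + h)]
    alpha = [(1 - s) * (1 - t), s * (1 - t), (1 - s) * t, s * t]
    value = sum(a * f(cx, cy) for a, (cx, cy) in zip(alpha, corners))
    return value, alpha


if __name__ == "__main__":
    for x, y in [(1.0, 1.0), (1.25, 0.75), (1.5, 1.5)]:
        interp, _ = corner_interp(x, y)
        print((x, y),
              "taylor error:", abs(taylor1(x, y) - f(x, y)),
              "interp error:", abs(interp - f(x, y)))
```

By construction, the Taylor expansion is exact at the expansion point and degrades away from it, whereas the corner interpolant is exact at the sampled corners and generally not at the center. This is one way to see why the two approximations can have different error patterns.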