The Trajectory Bundle Method: Unifying Sequential-Convex Programming and Sampling-Based Trajectory Optimization
Kevin Tracy, John Z. Zhang, Jon Arrizabalaga, Stefan Schaal, Yuval Tassa, Tom Erez, Zachary Manchester
TL;DR
The paper addresses trajectory optimization when derivatives are unavailable or expensive. It introduces the Trajectory Bundle Method (TBM), a derivative-free sequential convex programming (SCP) framework that builds convex surrogates of the cost, dynamics, and constraints by sampling them and interpolating linearly between the samples. TBM unifies SCP and MPPI: MPPI is recovered as a special case with entropy-regularized updates, and the framework extends to general multiple shooting, long-horizon planning, and black-box dynamics evaluated through parallel simulations. Concretely, TBM samples a bundle of points around the current iterate, then solves a convex problem over affine interpolants with interpolation weights on the simplex, $\alpha \in \Delta^{m-1}$, to update the trajectory. MPPI is recovered in the single-shooting, entropy-regularized limit with $\alpha_i = \frac{e^{-J_i/\lambda}}{\sum_j e^{-J_j/\lambda}}$, where $J_i$ is the cost of the $i$-th sample and $\lambda$ is the temperature. The approach is validated on robotics problems including collision avoidance, quadrotor tracking, cartpole swingup with neural-network dynamics, and a minimum-time task for a 1:43-scale race car. The results show strong constraint satisfaction, performance competitive with or superior to gradient-based methods, and clear advantages when simulations can be run in parallel.
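The softmax weighting in the MPPI special case can be illustrated with a minimal sketch in plain Python. The function names `mppi_weights` and `weighted_update`, the example costs, and the use of a weighted average of sampled control sequences are assumptions made for illustration; they are not taken from the paper's implementation.

```python
import math

def mppi_weights(costs, lam):
    """Return alpha_i = exp(-J_i / lam) / sum_j exp(-J_j / lam)."""
    if lam <= 0:
        raise ValueError("temperature lam must be positive")
    # Subtracting the minimum cost avoids overflow and leaves the weights unchanged.
    j_min = min(costs)
    unnormalized = [math.exp(-(j - j_min) / lam) for j in costs]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def weighted_update(samples, weights):
    """Convex combination of sampled control sequences (assumed update rule)."""
    horizon = len(samples[0])
    return [sum(w * s[t] for w, s in zip(weights, samples)) for t in range(horizon)]

# Hypothetical costs of three sampled rollouts.
alpha = mppi_weights([1.0, 2.0, 3.0], lam=1.0)

# Two hypothetical control sequences of length 2, averaged with equal weights.
u_new = weighted_update([[0.0, 0.0], [1.0, 2.0]], [0.5, 0.5])
```

The weights are nonnegative and sum to one, so they lie on the simplex $\Delta^{m-1}$, and lower-cost samples receive larger weights. A smaller $\lambda$ concentrates the weight on the best sample.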
Abstract
We present a unified framework for solving trajectory optimization problems in a derivative-free manner through the use of sequential convex programming. Traditionally, nonconvex optimization problems are solved by forming and solving a sequence of convex optimization problems, where the cost and constraint functions are approximated locally through Taylor series expansions. This presents a challenge for functions where differentiation is expensive or unavailable. In this work, we present a derivative-free approach to form these convex approximations by computing samples of the dynamics, cost, and constraint functions and letting the solver interpolate between them. Our framework includes sample-based trajectory optimization techniques like model-predictive path integral (MPPI) control as a special case and generalizes them to enable features like multiple shooting and general equality and inequality constraints that are traditionally associated with derivative-based sequential convex programming methods. The resulting framework is simple, flexible, and capable of solving a wide variety of practical motion planning and control problems.
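As a rough illustration of interpolating between samples, the sketch below evaluates a black-box function at a few sample points and forms a convex combination of both the points and the function values with weights $\alpha$ on the simplex. The function `toy_dynamics`, the sample points, and the weights are hypothetical, and the paper's actual convex subproblem (which optimizes over $\alpha$) is not reproduced here.

```python
import math

def combine(alpha, vectors):
    """Return sum_i alpha_i * vectors[i] for alpha on the probability simplex."""
    if any(a < 0 for a in alpha) or abs(sum(alpha) - 1.0) > 1e-9:
        raise ValueError("alpha must be nonnegative and sum to one")
    dim = len(vectors[0])
    return [sum(a * v[k] for a, v in zip(alpha, vectors)) for k in range(dim)]

def toy_dynamics(x):
    """Hypothetical black-box map; only evaluations are used, no derivatives."""
    return [x[0] + 0.1 * x[1], x[1] - 0.1 * math.sin(x[0])]

# Sample points around a nominal state, and the function evaluated at each one.
samples = [[0.0, 0.0], [0.2, 0.0], [0.0, 0.2]]
values = [toy_dynamics(x) for x in samples]

# Fixed weights here; in a sample-based SCP step a solver would choose them.
alpha = [0.5, 0.25, 0.25]
x_interp = combine(alpha, samples)   # interpolated input
f_interp = combine(alpha, values)    # interpolated output, affine in alpha
f_true = toy_dynamics(x_interp)      # exact value for comparison
```

Because both the interpolated input and the interpolated output are affine in $\alpha$, constraints written in terms of them remain convex in $\alpha$. The interpolation is exact for the affine first component of `toy_dynamics` and only approximate for the nonlinear second component, which is why the samples are kept close to the current iterate.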
