GATO: GPU-Accelerated and Batched Trajectory Optimization for Scalable Edge Model Predictive Control

Alexander Du, Emre Adabag, Gabriel Bravo, Brian Plancher

TL;DR

GATO is a GPU-accelerated, batched trajectory optimization (TO) solver co-designed across algorithms, software, and hardware to deliver real-time throughput for batches of tens to low hundreds of TO problems in MPC. By formulating and solving a batched Schur-complement system with a GPU-friendly preconditioned conjugate gradient (PCG) method and a symmetric stair preconditioner, GATO achieves speedups of 18–21× over CPU baselines and 1.4–16× over previous GPU solvers as batch size grows. The approach enables real-time, disturbance-aware planning and control on robotic hardware, demonstrated in simulation and in KUKA iiwa experiments, and is released as open source for reproducibility and adoption. The results show favorable scaling with the total number of knot points $N\cdot M$ and offer practical guidance on the batch sizes that best balance accuracy against latency.

Abstract

While Model Predictive Control (MPC) delivers strong performance across robotics applications, solving the underlying (batches of) nonlinear trajectory optimization (TO) problems online remains computationally demanding. Existing GPU-accelerated approaches typically (i) parallelize a single solve to meet real-time deadlines, (ii) scale to very large batches at slower-than-real-time rates, or (iii) achieve speed by restricting model generality (e.g., point-mass dynamics or a single linearization). This leaves a large gap in solver performance for many state-of-the-art MPC applications that require real-time batches of tens to low-hundreds of solves. As such, we present GATO, an open source, GPU-accelerated, batched TO solver co-designed across algorithm, software, and computational hardware to deliver real-time throughput for these moderate batch size regimes. Our approach leverages a combination of block-, warp-, and thread-level parallelism within and across solves for ultra-high performance. We demonstrate the effectiveness of our approach through a combination of: simulated benchmarks showing speedups of 18-21x over CPU baselines and 1.4-16x over GPU baselines as batch size increases; case studies highlighting improved disturbance rejection and convergence behavior; and finally a validation on hardware using an industrial manipulator. We open source GATO to support reproducibility and adoption.
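To make the batched linear solve mentioned in the TL;DR more concrete, the following is a minimal sketch of a preconditioned conjugate gradient (PCG) loop applied to a batch of independent systems. It is not GATO's implementation: the function names `pcg` and `batched_pcg` are hypothetical, a plain Jacobi (diagonal) preconditioner stands in for the symmetric stair preconditioner, and the batch is processed by a sequential Python loop where a GPU solver would run the solves in parallel. Each system is assumed to be symmetric positive definite, as PCG requires.

```python
def pcg(A, b, tol=1e-16, max_iter=200):
    """Solve A x = b for a symmetric positive definite A (list of rows).

    Uses a Jacobi preconditioner (inverse of the diagonal of A) as a
    simple stand-in; this is an assumption, not the stair preconditioner.
    """
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]
    x = [0.0] * n
    r = list(b)                                  # residual b - A x for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]   # preconditioned residual
    p = list(z)                                  # first search direction
    rz = sum(ri * zi for ri, zi in zip(r, z))
    for _ in range(max_iter):
        if rz <= tol:                            # converged (also avoids 0/0)
            break
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rz / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


def batched_pcg(As, bs):
    """Solve every system in the batch independently (sequential here)."""
    return [pcg(A, b) for A, b in zip(As, bs)]


if __name__ == "__main__":
    # Two small illustrative systems; the values are made up for the demo.
    As = [
        [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]],
        [[2.0, 0.0], [0.0, 5.0]],
    ]
    bs = [[1.0, 2.0, 3.0], [2.0, 10.0]]
    for A, b, x in zip(As, bs, batched_pcg(As, bs)):
        residual = max(
            abs(sum(A[i][j] * x[j] for j in range(len(b))) - b[i])
            for i in range(len(b))
        )
        print("max residual:", residual)
```

Because the solves share no data, the outer loop in `batched_pcg` is the natural place to parallelize across the batch, while the matrix-vector product and dot products inside `pcg` are the natural places to parallelize within a solve.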

Paper Structure

This paper contains 16 sections, 8 equations, 8 figures, 2 tables, 1 algorithm.

Figures (8)

  • Figure 1: The GATO solver parallelizes across batches of trajectory optimization solves on the GPU through algorithm-software-hardware co-design. This approach enables real-time performance for batch sizes of tens to low-hundreds of solves with tens to low-hundreds of knot points per solve.
  • Figure 2: Overall design of our batched solver, which (a) forms problems in parallel across solves and timesteps, (b) leverages warp-level parallelism within each block-based solve, and (c) again leverages large-scale parallelism across the whole GPU for the line search and merit function calculations.
  • Figure 3: (Left) Solve times for 6-DoF manipulator motions while varying the batch size ($M$) and underlying solver. $N=64$ for all solves. GATO shows far improved scalability as compared to state-of-the-art CPU and GPU solutions. (Right) A heat map of solve times while varying both batch size ($M$) and time horizon ($N$). GATO is able to reach kHz control rates for real-time iterations of large batches (512) of short horizon ($N=8$) trajectories, as well as smaller batches (32) of longer horizon trajectories ($N=128$), showing the flexibility of the design.
  • Figure 4: Average (normalized) merit function value across SQP iterations over 100 runs each with 81 different random values for the cost function parameters $Q$ and $R$ in \ref{eq:trajoptSchur_1}. For all solves $N=64$, $h=0.05$, and $\rho$ ranges from $10^{-8}$ to $10^{1}$.
  • Figure 5: Figure-8 tracking task, with an external disturbance applied at the end effector. (Left) Bar chart shows tracking error; scatter plot shows average total joint velocities. Increasing GATO's batch size improves disturbance rejection, lowering tracking error and joint velocities until the added latency of a larger batch outweighs the optimality gains. (Right) End-effector trajectories realized during this experiment when 50 N of external force is applied at the end effector, again showing that modest batch sizes lead to the best performance.
  • ...and 3 more figures
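
The two kHz-rate operating points quoted in the Figure 3 caption can be compared by their total knot-point count $N\cdot M$. The short sketch below works through that arithmetic; the helper names `total_knot_points` and `solve_budget_ms` are hypothetical, and the 1 ms budget simply restates what a 1 kHz control rate implies per solve, not a measured GATO timing.

```python
def total_knot_points(batch_size, horizon):
    """Total knot points across the batch: M solves times N knots each."""
    return batch_size * horizon


def solve_budget_ms(control_rate_hz):
    """Time available per control step, in milliseconds, at a given rate."""
    return 1000.0 / control_rate_hz


if __name__ == "__main__":
    # (M, N) pairs taken from the Figure 3 caption.
    configs = {
        "large batch, short horizon": (512, 8),
        "small batch, long horizon": (32, 128),
    }
    for name, (M, N) in configs.items():
        print(name, "->", total_knot_points(M, N), "knot points")
    print("budget at 1 kHz:", solve_budget_ms(1000.0), "ms")
```

Both configurations come to 4096 knot points, which is consistent with solve time being described as scaling with $N\cdot M$ rather than with $M$ or $N$ alone.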