GATO: GPU-Accelerated and Batched Trajectory Optimization for Scalable Edge Model Predictive Control
Alexander Du, Emre Adabag, Gabriel Bravo, Brian Plancher
TL;DR
GATO is a GPU-accelerated, batched trajectory optimization (TO) solver co-designed across algorithms, software, and hardware to deliver real-time throughput for batches of tens to low hundreds of TO problems in MPC. By formulating a batched Schur-complement system and solving it with a GPU-friendly preconditioned conjugate gradient (PCG) method and a symmetric stair preconditioner, GATO achieves speedups of 18–21× over CPU baselines and 1.4–16× over previous GPU solvers as batch size grows. The approach enables real-time, disturbance-aware planning and control on robotic hardware, demonstrated in simulation and in experiments on a KUKA iiwa manipulator, and is released as open source for reproducibility and adoption. The results show favorable scaling with the total number of knot points $N\cdot M$ and offer practical insight into batch-size sweet spots for trading off accuracy against latency.
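To make the batched PCG idea concrete, here is a minimal sketch in NumPy that solves many symmetric positive-definite systems in lockstep. It is an illustration under stated assumptions, not GATO's implementation: it runs on the CPU, uses dense matrices as stand-ins for the Schur-complement systems, and substitutes a simple Jacobi (diagonal) preconditioner for the symmetric stair preconditioner. The function name `batched_pcg` and its parameters are hypothetical.

```python
import numpy as np

def batched_pcg(A, b, tol=1e-8, max_iters=200):
    """Solve A[m] @ x[m] = b[m] for a batch of SPD systems.

    A: (M, n, n) batch of symmetric positive-definite matrices.
    b: (M, n) batch of right-hand sides.
    Assumption: a Jacobi preconditioner stands in for the paper's
    symmetric stair preconditioner.
    """
    # Jacobi preconditioner: elementwise inverse of each matrix diagonal.
    Minv = 1.0 / np.einsum("mii->mi", A)
    x = np.zeros_like(b)
    r = b.copy()                      # residual for the initial guess x = 0
    z = Minv * r                      # preconditioned residual
    p = z.copy()                      # search direction
    rz = np.einsum("mi,mi->m", r, z)  # per-system inner product r^T z
    b_norm = np.maximum(np.linalg.norm(b, axis=1), 1e-30)

    for _ in range(max_iters):
        # Systems that have converged are frozen via a zero step size.
        active = np.linalg.norm(r, axis=1) > tol * b_norm
        if not active.any():
            break
        Ap = np.einsum("mij,mj->mi", A, p)  # batched matrix-vector product
        pAp = np.einsum("mi,mi->m", p, Ap)
        alpha = np.where(active, rz / np.where(active, pAp, 1.0), 0.0)
        x += alpha[:, None] * p
        r -= alpha[:, None] * Ap
        z = Minv * r
        rz_new = np.einsum("mi,mi->m", r, z)
        beta = np.where(active, rz_new / np.where(active, rz, 1.0), 0.0)
        p = z + beta[:, None] * p
        rz = rz_new
    return x
```

Each iteration needs only batched matrix-vector products, inner products, and vector updates, which is the property that makes PCG attractive for parallel hardware. The per-system `active` mask lets solves that converge early stop updating while the rest of the batch continues.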
Abstract
While Model Predictive Control (MPC) delivers strong performance across robotics applications, solving the underlying (batches of) nonlinear trajectory optimization (TO) problems online remains computationally demanding. Existing GPU-accelerated approaches typically (i) parallelize a single solve to meet real-time deadlines, (ii) scale to very large batches at slower-than-real-time rates, or (iii) achieve speed by restricting model generality (e.g., point-mass dynamics or a single linearization). This leaves a large gap in solver performance for many state-of-the-art MPC applications that require real-time batches of tens to low hundreds of solves. To close this gap, we present GATO, an open-source, GPU-accelerated, batched TO solver co-designed across algorithm, software, and computational hardware to deliver real-time throughput in this moderate batch-size regime. Our approach combines block-, warp-, and thread-level parallelism within and across solves for ultra-high performance. We demonstrate its effectiveness through simulated benchmarks showing speedups of 18–21× over CPU baselines and 1.4–16× over GPU baselines as batch size increases; case studies highlighting improved disturbance rejection and convergence behavior; and a hardware validation on an industrial manipulator. We open source GATO to support reproducibility and adoption.
