Quattro: Transformer-Accelerated Iterative Linear Quadratic Regulator Framework for Fast Trajectory Optimization
Yue Wang, Haoyu Wang, Zhaoxing Li
TL;DR
Quattro introduces a Transformer-accelerated iterative Linear Quadratic Regulator (iLQR) to mitigate the sequential bottleneck in real-time trajectory optimization. It predicts intermediate feedback and feedforward gains with a decoder-only Transformer and deploys a specialized FPGA accelerator. Together, these enable parallel computation and substantially reduce latency within a Model Predictive Control (MPC) framework. The approach achieves up to 27$\times$ per-iteration speedups (and up to 17.8$\times$ end-to-end MPC acceleration) on cart-pole and quadrotor tasks, with notable power savings on edge hardware. The work demonstrates the viability of Transformer-based acceleration for real-time optimal control. It presents a lightweight configuration that balances accuracy, speed, and hardware practicality, with clear avenues toward broader robotic applications and learning-based robustness.
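The feedback and feedforward gains mentioned above are the quantities that the standard iLQR control law applies around a nominal trajectory, $u_t = \bar{u}_t + \alpha k_t + K_t (x_t - \bar{x}_t)$. The following is a minimal Python/NumPy sketch of that textbook forward pass. It does not model Quattro's Transformer prediction. The function name `forward_pass`, the step size `alpha`, and the scalar integrator in the usage lines are illustrative assumptions.

```python
import numpy as np


def forward_pass(f, x_bar, u_bar, k, K, alpha=1.0):
    """Roll out the iLQR control law around a nominal trajectory (sketch).

    f: discrete dynamics, x_{t+1} = f(x_t, u_t).
    x_bar, u_bar: nominal states (T+1 entries) and controls (T entries).
    k[t]: feedforward gain, shape (m,); K[t]: feedback gain, shape (m, n).
    alpha: line-search step size that scales the feedforward term.
    """
    x = [np.asarray(x_bar[0], dtype=float)]
    u = []
    for t in range(len(u_bar)):
        # Feedforward shifts the nominal control; feedback corrects the
        # deviation of the new state from the nominal state.
        u_t = u_bar[t] + alpha * k[t] + K[t] @ (x[t] - x_bar[t])
        u.append(u_t)
        # The next state depends on this step's control, so the rollout
        # itself is sequential in t.
        x.append(f(x[t], u_t))
    return x, u


def integrator(x, u):
    """Hypothetical scalar integrator x_{t+1} = x_t + u_t."""
    return x + u


T = 3
x_bar = [np.zeros(1)] * (T + 1)
u_bar = [np.zeros(1)] * T
k = [np.ones(1)] * T
K = [np.array([[-0.5]])] * T
x, u = forward_pass(integrator, x_bar, u_bar, k, K)
print([float(xi[0]) for xi in x])  # [0.0, 1.0, 1.5, 1.75]
```

With zero gains and a dynamically consistent nominal trajectory, the rollout reproduces the nominal states. A nonzero feedback gain damps the drift introduced by the feedforward term, as the usage lines show.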
Abstract
Real-time optimal control remains a fundamental challenge in robotics, especially for nonlinear systems with stringent performance requirements. The iterative Linear Quadratic Regulator (iLQR), a representative trajectory optimization algorithm, is limited by its inherently sequential computation. This restricts its efficiency and its applicability to real-time control of robotic systems. Existing parallel implementations aim to overcome this limitation, but they typically require additional iterations and high-performance hardware, which yields only modest practical improvements. In this paper, we introduce Quattro, a Transformer-accelerated iLQR framework that uses an algorithm-hardware co-design strategy to predict intermediate feedback and feedforward matrices. This enables effective parallel computation on resource-constrained devices without sacrificing accuracy. Experiments on cart-pole and quadrotor systems show algorithm-level speedups of up to 5.3$\times$ and 27$\times$ per iteration, respectively. When integrated into a Model Predictive Control (MPC) framework, Quattro achieves overall speedups of 2.8$\times$ for the cart-pole and 17.8$\times$ for the quadrotor compared with MPC using traditional iLQR. Deploying Transformer inference on an FPGA yields up to a further 20.8$\times$ speedup over prevalent embedded CPUs. It also consumes over 11$\times$ less power than a GPU, with low hardware resource overhead.
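The following minimal Python/NumPy sketch shows where the sequential bottleneck comes from. It implements a textbook iLQR backward pass, the standard Riccati-style recursion without regularization. It is not Quattro's implementation, and the function name `backward_pass` and its argument layout are hypothetical. Each step $t$ needs the value-function gradient and Hessian produced by step $t+1$. Quattro targets this dependency by predicting intermediate feedback and feedforward matrices to enable parallel computation.

```python
import numpy as np


def backward_pass(A, B, lx, lu, lxx, luu, lux, Vx_T, Vxx_T):
    """Sequential iLQR backward pass (textbook form, sketch).

    A[t] (n x n), B[t] (n x m): dynamics Jacobians at step t.
    lx, lu, lxx, luu, lux: stage-cost derivatives at step t.
    Vx_T, Vxx_T: terminal value-function gradient and Hessian.
    Returns feedforward gains k[t] (m,) and feedback gains K[t] (m x n).
    """
    T = len(A)
    Vx, Vxx = Vx_T, Vxx_T
    k = [None] * T
    K = [None] * T
    # Step t consumes (Vx, Vxx) from step t+1, so this loop is inherently
    # sequential in t.
    for t in reversed(range(T)):
        Qx = lx[t] + A[t].T @ Vx
        Qu = lu[t] + B[t].T @ Vx
        Qxx = lxx[t] + A[t].T @ Vxx @ A[t]
        Quu = luu[t] + B[t].T @ Vxx @ B[t]
        Qux = lux[t] + B[t].T @ Vxx @ A[t]
        # Solve linear systems instead of forming Quu^{-1} explicitly.
        k[t] = -np.linalg.solve(Quu, Qu)
        K[t] = -np.linalg.solve(Quu, Qux)
        # Value-function update handed to step t-1.
        Vx = Qx + K[t].T @ Quu @ k[t] + K[t].T @ Qu + Qux.T @ k[t]
        Vxx = Qxx + K[t].T @ Quu @ K[t] + K[t].T @ Qux + Qux.T @ K[t]
        Vxx = 0.5 * (Vxx + Vxx.T)  # keep symmetric against round-off
    return k, K


# Hypothetical usage: scalar integrator x_{t+1} = x_t + u_t with
# stage cost 0.5*x^2 + 0.5*u^2 and terminal cost 0.5*x^2.
T, n, m = 50, 1, 1
A = [np.eye(n)] * T
B = [np.eye(n, m)] * T
lx = [np.zeros(n)] * T
lu = [np.zeros(m)] * T
lxx = [np.eye(n)] * T
luu = [np.eye(m)] * T
lux = [np.zeros((m, n))] * T
k, K = backward_pass(A, B, lx, lu, lxx, luu, lux, np.zeros(n), np.eye(n))
print(K[0])  # approximately [[-0.618]]
```

For this scalar example, the early-horizon feedback gain approaches the steady-state LQR value $-(\sqrt{5}-1)/2 \approx -0.618$. The feedforward gains are zero because the cost gradients vanish at the nominal trajectory.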
