Cyqlone: A Parallel, High-Performance Linear Solver for Optimal Control
Pieter Pas, Panagiotis Patrinos
TL;DR
Cyqlone is a highly parallel, high-performance linear solver for the structured systems that arise in linear-quadratic optimal control problems (OCPs). It unifies a modified Riccati recursion, parallel Schur-complement techniques, and cyclic reduction. It partitions the horizon across P processors so that, given sufficient parallelism, run time scales with the logarithm of the horizon length rather than linearly, and it uses batch-wise vectorization to make small-matrix operations faster than BLAS and BLASFEO. The companion solver CyQPALM uses Cyqlone as the linear solver within a proximal augmented Lagrangian framework, delivering order-of-magnitude speedups over the state-of-the-art HPIPM solver, especially for long horizons and warm-started MPC-like tasks. Together, these methods enable real-time solution of long-horizon OCPs on modern multi-core hardware, and open-source C++ implementations of both are available.
Abstract
We present Cyqlone, a solver for linear systems with a stage-wise optimal control structure that fully exploits the various levels of parallelism available in modern hardware. Cyqlone unifies algorithms based on the sequential Riccati recursion, parallel Schur complement methods, and cyclic reduction methods, thereby minimizing the required number of floating-point operations, while allowing parallelization across a user-configurable number of processors. Given sufficient parallelism, the solver run time scales with the logarithm of the horizon length (in contrast to the linear scaling of sequential Riccati-based methods), enabling real-time solution of long-horizon problems. Beyond multithreading on multi-core processors, implementations of Cyqlone can also leverage vectorization using batched linear algebra routines. Such batched routines exploit data parallelism using single instruction, multiple data (SIMD) operations, and expose a higher degree of instruction-level parallelism than their non-batched counterparts. This enables them to significantly outperform BLAS and BLASFEO for the small matrices that arise in optimal control. Building on this high-performance linear solver, we develop CyQPALM, a parallel and optimal-control-specific variant of the QPALM quadratic programming solver. It combines the parallel and vectorized linear algebra operations from Cyqlone with a parallel line search and parallel factorization updates, resulting in order-of-magnitude speedups compared to the state-of-the-art HPIPM solver. Open-source C++ implementations of Cyqlone and CyQPALM are available at https://github.com/kul-optec/cyqlone
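To illustrate why cyclic reduction can give run time logarithmic in the horizon length, the following is a minimal sketch of the classical scalar variant applied to a tridiagonal system. It is not the block-structured algorithm implemented in Cyqlone: the function name `solve_tridiagonal_cr`, the scalar coefficients, and the absence of pivoting (the sketch assumes a diagonally dominant system with `a[0] == 0` and `c[-1] == 0`) are all assumptions made for illustration. Each level eliminates the even-indexed unknowns, halving the system, so there are about log2(n) levels, and the loop iterations within a level are independent of each other and could in principle be distributed across processors.

```python
def solve_tridiagonal_cr(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] by cyclic reduction.

    Illustrative sketch only: scalar entries, no pivoting.
    Assumes a[0] == 0, c[-1] == 0, and a diagonally dominant system.
    """
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]

    # Reduction: keep the odd-indexed equations and eliminate the
    # even-indexed neighbours x[i-1] and x[i+1] from each of them.
    # The iterations of this loop do not depend on each other.
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):
        has_next = i + 1 < n
        alpha = a[i] / b[i - 1]
        gamma = c[i] / b[i + 1] if has_next else 0.0
        a_next = a[i + 1] if has_next else 0.0
        c_next = c[i + 1] if has_next else 0.0
        d_next = d[i + 1] if has_next else 0.0
        ra.append(-alpha * a[i - 1])
        rb.append(b[i] - alpha * c[i - 1] - gamma * a_next)
        rc.append(-gamma * c_next)
        rd.append(d[i] - alpha * d[i - 1] - gamma * d_next)

    # Recurse on the half-size tridiagonal system in the odd unknowns.
    x = [0.0] * n
    x[1::2] = solve_tridiagonal_cr(ra, rb, rc, rd)

    # Back-substitution: recover each even-indexed unknown from its own
    # equation. These iterations are also mutually independent.
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x
```

A sequential elimination (the scalar analogue of a Riccati-style sweep) visits the n equations one after another, whereas here the dependency chain has a length proportional to the number of reduction levels. The sketch runs the levels serially, so it only shows the structure and does not itself achieve a speedup.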
