CusADi: A GPU Parallelization Framework for Symbolic Expressions and Optimal Control

Se Hwan Jeon, Seungwoo Hong, Ho Jae Lee, Charles Khazoom, Sangbae Kim

TL;DR

The paper tackles the bottleneck of integrating model-based optimization into large-scale reinforcement learning by enabling GPU-based parallelization of symbolic expressions. It introduces CusADi, an extension of the CasADi framework that auto-generates CUDA kernels to evaluate arbitrary symbolic expressions in parallel across thousands of environments, and formulates a closed-form, fixed-iteration approximation to the optimal control problem to enable scalable MPC. The authors provide code generation and a PyTorch interface, demonstrate speedups up to 1000x over serial CPU and substantial gains when data remains on-device, and validate the approach through MIT Humanoid MPC, centroidal momentum augmentation in RL, and parallelized quadcopter rollouts. The work offers a practical pathway to integrate model-based optimization into RL pipelines, enabling rapid parallel simulations, parameter sweeps, and policy training with large batch sizes.

Abstract

The parallelism afforded by GPUs presents significant advantages in training controllers through reinforcement learning (RL). However, integrating model-based optimization into this process remains challenging due to the complexity of formulating and solving optimization problems across thousands of instances. In this work, we present CusADi, an extension of the CasADi symbolic framework to support the parallelization of arbitrary closed-form expressions on GPUs with CUDA. We also formulate a closed-form approximation for solving general optimal control problems, enabling large-scale parallelization and evaluation of MPC controllers. Our results show a ten-fold speedup relative to a similar MPC implementation on the CPU, and we demonstrate the use of CusADi for various applications, including parallel simulation, parameter sweeps, and policy training.
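
To make the idea of a fixed-iteration, closed-form approximation more concrete, here is a minimal sketch in plain Python. It is not the paper's formulation, which this summary does not reproduce. As a stand-in, it runs projected gradient descent on a small box-constrained QP for a fixed number of iterations. The function name `solve_box_qp_fixed_iters`, its parameters, and the step-size rule are all assumptions made for illustration.

```python
def solve_box_qp_fixed_iters(H, g, lower, upper, n_iters=50, step=None):
    """Approximately minimize 0.5*u'Hu + g'u subject to lower <= u <= upper.

    Illustrative stand-in only: projected gradient descent with a fixed
    iteration count, assuming H is symmetric positive definite.
    """
    n = len(g)
    if step is None:
        # The max absolute row sum upper-bounds the largest eigenvalue of a
        # symmetric H, so its reciprocal is a conservative step size.
        step = 1.0 / max(sum(abs(h) for h in row) for row in H)
    # Start from zero, clipped into the box.
    u = [min(max(0.0, lower[i]), upper[i]) for i in range(n)]
    # Fixed count: no convergence test and no data-dependent early exit,
    # so the whole loop unrolls into one closed-form expression of (H, g).
    for _ in range(n_iters):
        grad = [sum(H[i][j] * u[j] for j in range(n)) + g[i] for i in range(n)]
        u = [min(max(u[i] - step * grad[i], lower[i]), upper[i]) for i in range(n)]
    return u
```

Because the iteration count is fixed in advance, every problem instance executes the same sequence of arithmetic operations. That property is what allows such a solver to be written as a single symbolic expression and evaluated for many instances at once. The trade-off is that the result is only approximate if the chosen count is too small for a given problem.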

Paper Structure (15 sections, 7 equations, 7 figures, 1 table)

This paper contains 15 sections, 7 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: Parallelizing MPC for the MIT Humanoid across thousands of environments in NVIDIA's IsaacGym [Makoviychuk2021_isaacgym]. The predicted positions of the base are shown as blue spheres.
  • Figure 2: Visualization of CusADi parallelization. Left: Symbolic expressions in CasADi are represented as expression graphs, a sequence of atomic operations ($i_1, i_2, i_3$) which evaluate the function. Right: Each atomic operation in the sequence can be vectorized to act on an arbitrary number of elements with CUDA; by repeating this for all operations in the original expression, CasADi symbolic expressions can be evaluated for thousands of instances in parallel on the GPU.
  • Figure 3: Left: Relative speed compared to serial CPU evaluation. The complexity of the function significantly affects the potential speedups from the GPU. Right: Relative speed compared to serial CPU evaluation with data transfer overhead. Copying data between host and device memory has a substantial effect on speed that is emphasized at larger batch sizes.
  • Figure 4: Pareto curve of closed-loop cost and constraint violation vs. evaluation time in closed-loop simulation for a single environment. The "ground truth" for the QP is computed with ProxQP [Bambade2023_ProxQP], and the grey area represents when the controller is no longer closed-loop stable in IsaacGym.
  • Figure 5: Left: An example of how tracking centroidal angular momentum can generate natural behavior from the legs and arms for a humanoid robot [Wensing2016_centroidal]. Right: Using CusADi, we rewarded tracking a desired centroidal angular momentum based on [Wensing2016_centroidal], instead of a desired base angular velocity. Emergent arm swing is noticeable. We also use CusADi to visualize the centers of mass (blue), composite inertia (pink), angular momentum (green) and linear momentum (red).
  • ...and 2 more figures
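
The scheme in the Figure 2 caption can be sketched in plain Python. The instruction format, the names `PROGRAM`, `OPS`, and `evaluate_batch`, and the example expression sin(x) * y + x are hypothetical and chosen only for illustration. They are not the CasADi or CusADi API, and the real framework emits CUDA kernels rather than Python loops.

```python
import math

# Hypothetical instruction format: (op, output_slot, input_slots).
# Slots 0 and 1 hold the inputs x and y; the program computes sin(x) * y + x.
PROGRAM = [
    ("sin", 2, (0,)),    # i_1: w2 = sin(w0)
    ("mul", 3, (2, 1)),  # i_2: w3 = w2 * w1
    ("add", 4, (3, 0)),  # i_3: w4 = w3 + w0
]

OPS = {
    "sin": lambda a: math.sin(a),
    "mul": lambda a, b: a * b,
    "add": lambda a, b: a + b,
}

def evaluate_batch(program, inputs, n_slots):
    """Evaluate `program` once per instance in `inputs` (a list of tuples)."""
    batch = len(inputs)
    # One work vector per slot, each holding one value per instance.
    work = [[0.0] * batch for _ in range(n_slots)]
    for env, values in enumerate(inputs):
        for slot, v in enumerate(values):
            work[slot][env] = v
    # Outer loop: the fixed instruction sequence, identical for all instances.
    # Inner loop: stands in for GPU threads, one per instance.
    for op, out, args in program:
        fn = OPS[op]
        for env in range(batch):
            work[out][env] = fn(*(work[a][env] for a in args))
    # The output slot of the last instruction holds the result.
    return work[program[-1][1]]

results = evaluate_batch(PROGRAM, [(0.0, 1.0), (1.0, 2.0), (2.0, -1.0)], n_slots=5)
```

The inner loop over `env` has no dependencies between iterations, which is why each atomic operation can be applied to all instances simultaneously on a GPU.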