Differentiable Model Predictive Control on the GPU

Emre Adabag, Marcus Greiff, John Subosits, Thomas Lew

TL;DR

This paper presents DiffMPC, a GPU-accelerated differentiable MPC solver. It combines sequential quadratic programming (SQP) for the forward optimal control problem (OCP) solve with a preconditioned conjugate gradient (PCG) method whose tridiagonal preconditioner exploits the problem's time-sequential structure. By reusing a precomputed KKT matrix and parallelizing over time and problem instances, DiffMPC achieves substantial speedups (often >4×) over existing CPU and GPU baselines on reinforcement learning and imitation learning tasks, and it scales to large-batch training. The backward pass uses the implicit function theorem to compute sensitivities with respect to problem parameters, which enables end-to-end differentiable policies and learning of cost and constraint parameters, including domain-randomized dynamics for driving at the limits. The method is demonstrated on driving scenarios with domain randomization, showing improved robustness and transfer to real-vehicle drifting tasks. The paper also details limitations and avenues for future work, such as more thorough handling of inequality constraints and CPU-side performance improvements.
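
The implicit-function-theorem backward pass mentioned above can be sketched in generic form. The notation is an assumption made for illustration, not taken from the paper: $F$ denotes the stacked optimality (KKT) residual, $z^\star$ stacks the primal and dual variables at the solution, $\theta$ are the problem parameters, and $\ell$ is the downstream loss. Assuming $\partial F / \partial z$ is invertible at the solution:

$$
F(z^\star(\theta), \theta) = 0
\quad\Longrightarrow\quad
\frac{\partial z^\star}{\partial \theta} = -\left(\frac{\partial F}{\partial z}\right)^{-1} \frac{\partial F}{\partial \theta},
$$

$$
\nabla_\theta \ell = \left(\frac{\partial z^\star}{\partial \theta}\right)^{\top} \nabla_z \ell
= -\left(\frac{\partial F}{\partial \theta}\right)^{\top} \left(\frac{\partial F}{\partial z}\right)^{-\top} \nabla_z \ell .
$$

Read this way, the gradient needs only one linear solve with the transposed Jacobian $\partial F / \partial z$, followed by a product with $\partial F / \partial \theta$. That is one plausible reason a KKT matrix already formed in the forward pass can be reused in the backward pass.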

Abstract

Differentiable model predictive control (MPC) offers a powerful framework for combining learning and control. However, its adoption has been limited by the inherently sequential nature of traditional optimization algorithms, which are challenging to parallelize on modern computing hardware like GPUs. In this work, we tackle this bottleneck by introducing a GPU-accelerated differentiable optimization tool for MPC. This solver leverages sequential quadratic programming and a custom preconditioned conjugate gradient (PCG) routine with tridiagonal preconditioning to exploit the problem's structure and enable efficient parallelization. We demonstrate substantial speedups over CPU- and GPU-based baselines, significantly improving upon state-of-the-art training times on benchmark reinforcement learning and imitation learning tasks. Finally, we showcase the method on the challenging task of reinforcement learning for driving at the limits of handling, where it enables robust drifting of a Toyota Supra through water puddles.
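
To make the PCG idea concrete, here is a minimal CPU sketch in plain Python. It is not the authors' solver: the paper's preconditioner targets the block-structured MPC system and runs in parallel on the GPU, while this example makes the simplifying assumption of a small scalar symmetric positive definite matrix and uses its tridiagonal part as the preconditioner, applied with the Thomas algorithm. All names (`thomas_solve`, `pcg`, `A`, `b`) are hypothetical.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower/upper hold the n-1 off-diagonal entries."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):  # forward elimination
        denom = diag[i] - lower[i - 1] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i - 1] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg(A, b, apply_precond, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradient for SPD A, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x
    z = apply_precond(r)             # preconditioned residual
    p = list(z)                      # search direction
    rz = dot(r, z)
    for k in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            return x, k
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = apply_precond(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Assumed test problem: a diagonally dominant (hence SPD) banded matrix.
n = 8
A = [[0.0] * n for _ in range(n)]
for i in range(n):
    A[i][i] = 4.0
    if i + 1 < n:
        A[i][i + 1] = A[i + 1][i] = -1.0
    if i + 2 < n:
        A[i][i + 2] = A[i + 2][i] = -0.5
b = [1.0] * n

# Preconditioner: exact solve with the tridiagonal part of A.
lower = [A[i + 1][i] for i in range(n - 1)]
diag = [A[i][i] for i in range(n)]
upper = [A[i][i + 1] for i in range(n - 1)]
x, iters = pcg(A, b, lambda r: thomas_solve(lower, diag, upper, r))
```

The Thomas solve here is inherently sequential, so this sketch only illustrates the algebra of preconditioning with a tridiagonal approximation. It does not show how such a preconditioner would be applied in parallel across time steps.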

Paper Structure

This paper contains 23 sections, 30 equations, 10 figures, 6 tables, 4 algorithms.

Figures (10)

  • Figure 1: DO solves optimization problems in $z$ (the forward pass) and computes sensitivities with respect to parameters $\theta$ (the backward pass), moving along the loss surface $\ell$ defined by the optimization problem.
  • Figure 2: DiffMPC architecture: forward and backward passes, data flows, and main steps.
  • Figure 3: RL computation times on one of the test problems. Error bars indicate 2$\sigma$ confidence intervals. Each backward pass also includes one forward pass (to evaluate the inputs of Algorithm \ref{alg:backward}).
  • Figure 4: Losses over 200 epochs for the pendulum cart-pole IL benchmark.
  • Figure 5: Proposed RL training pipeline to robustify an MPC policy $\pi^{\theta}(x)$ for drifting.
  • ...and 5 more figures