On Architectures for Combining Reinforcement Learning and Model Predictive Control with Runtime Improvements

Xiaolong Jia, Nikhil Bajaj

TL;DR

The paper tackles MPC's computational burden and sensitivity to model mismatch with two RL-augmented architectures: Warm Start RL, which initializes the RL actor with the weights of a pre-trained NNMPC, and RL+MPC, which uses RL to generate corrective residuals that are added to the NNMPC output. It also introduces DRMPC, a strategy that downsamples the reference trajectory to shrink the NN input size while maintaining performance. On a rotary inverted pendulum, the evaluations show runtime reductions exceeding 99% compared to traditional MPC and cost reductions of 11–40% for RL+MPC under parameter variations, depending on reference amplitude; in simulation, DRMPC closely matches MPC performance. Training RLMPC on real hardware under safety constraints demonstrates practical feasibility, though formal stability and robustness guarantees are left for future work. Together, these results offer practical ways to combine MPC and RL for faster, more robust control under model uncertainty.

Abstract

Model Predictive Control (MPC) faces computational demands and performance degradation from model inaccuracies. We propose two architectures combining Neural Network-approximated MPC (NNMPC) with Reinforcement Learning (RL). The first, Warm Start RL, initializes the RL actor with pre-trained NNMPC weights. The second, RLMPC, uses RL to generate corrective residuals for NNMPC outputs. We introduce a downsampling method reducing NNMPC input dimensions while maintaining performance. Evaluated on a rotary inverted pendulum, both architectures demonstrate runtime reductions exceeding 99% compared to traditional MPC while improving tracking performance under model uncertainties, with RL+MPC achieving 11-40% cost reduction depending on reference amplitude.
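The difference between the two architectures can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: `TinyPolicy` is a hypothetical one-unit stand-in for a neural network, the observation and weight values are made up, and the output clipping and the zero-initialized residual actor are assumptions added so the example is well defined.

```python
import copy
import math


class TinyPolicy:
    """Hypothetical stand-in for a neural-network policy: one tanh unit."""

    def __init__(self, weights, bias=0.0):
        self.weights = list(weights)
        self.bias = bias

    def __call__(self, obs):
        z = sum(w * o for w, o in zip(self.weights, obs)) + self.bias
        return math.tanh(z)


def warm_start_actor(nnmpc):
    """Warm Start RL: the RL actor starts as a copy of the pre-trained NNMPC."""
    return copy.deepcopy(nnmpc)


def rl_plus_mpc_action(nnmpc, residual_actor, obs, u_max=1.0):
    """RL+MPC: plant input = NNMPC output + RL residual.

    Clipping to [-u_max, u_max] is an assumption, not taken from the paper.
    """
    u = nnmpc(obs) + residual_actor(obs)
    return max(-u_max, min(u_max, u))


# Illustrative values only; not taken from the paper.
obs = [0.1, -0.2, 0.0, 0.3]
nnmpc = TinyPolicy([0.5, -1.0, 0.2, 0.1])

# Warm Start RL: the actor reproduces NNMPC until RL updates its weights.
actor = warm_start_actor(nnmpc)
u_warm = actor(obs)

# RL+MPC: a zero-weight residual actor gives tanh(0) = 0, so no correction yet.
zero_residual = TinyPolicy([0.0] * len(obs))
u_residual = rl_plus_mpc_action(nnmpc, zero_residual, obs)
```

In both cases the controller behaves like NNMPC before any RL training. In Warm Start RL, training then changes the copied weights themselves, whereas in RL+MPC the NNMPC output stays in the loop and only the additive correction is learned.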

Paper Structure

This paper contains 14 sections, 13 equations, 6 figures, 5 tables, 1 algorithm.

Figures (6)

  • Figure 1: An example of processing the prediction steps with the reference "shrinking" downsampling technique: (a) a sine wave with a period of $0.4\pi$ and an amplitude of 1; (b) a square wave with a period of $0.4\pi$ and an amplitude of 0.5.
  • Figure 2: A diagram illustrating the logic of RLMPC. In approach 1), Warm Start RL, the NNMPC block in the diagram is omitted and the actor network is initialized as an NNMPC. In approach 2), RL+MPC, the plant input is the sum of the MPC output and the actor network output.
  • Figure 3: A diagram of the defined constraints: (a) pendulum angle $x_{2}$, (b) rotary angle $x_{1}$. Here $|x_{2}|<|\alpha_{1}|$ and $|x_{1}|<|\beta_{1}|$ define the reset range, $|x_{2}|<|\alpha_{2}|$ and $|x_{1}|<|\beta_{2}|$ are soft constraints, and $|x_{2}|<|\alpha_{3}|$ and $|x_{1}|<|\beta_{3}|$ are hard constraints. Further constraints involving the velocities $x_{3}$ and $x_{4}$, such as an energy equation, could be introduced to make training safer.
  • Figure 4: An example plot illustrating the differences between MPC, DRMPC, and NNMPC in simulation, where the reference is $\sin(t)$ and the initial condition is $\bm{x} = [0,0,0,0]$.
  • Figure 5: An example plot illustrating the control performance of MPC, Warm Start RL, and RL+MPC in a real-world control scenario, where the reference trajectory is $\sin(t)$ and the plotted window runs from 7 s to 14 s, during which the pendulum is stabilized.
  • ...and 1 more figure
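
The reference downsampling mentioned in Figure 1 can be illustrated with a short Python sketch. The paper's exact "shrinking" scheme is not given in this summary, so the uniform stride used here is an assumption, and `downsample_reference`, `horizon`, `dt`, and `stride` are hypothetical names and values chosen for illustration.

```python
import math


def downsample_reference(ref, stride):
    """Keep every `stride`-th sample of the reference preview, starting at index 0.

    Uniform striding is an assumption; the paper's scheme may differ.
    """
    if stride < 1:
        raise ValueError("stride must be a positive integer")
    return ref[::stride]


# Illustrative values only; not taken from the paper.
horizon = 20   # number of prediction steps in the reference preview
dt = 0.05      # sample time in seconds
stride = 4

# Reference preview sin(t) over the prediction horizon, starting at t = 0.
ref = [math.sin(k * dt) for k in range(horizon)]
short_ref = downsample_reference(ref, stride)

# With the four states x1..x4 plus the reference preview as NN input,
# the input size drops from 4 + 20 = 24 to 4 + 5 = 9 in this example.
full_input_size = 4 + len(ref)
reduced_input_size = 4 + len(short_ref)
```

A smooth reference such as $\sin(t)$ loses little information under this kind of decimation, which is consistent with the summary's claim that the smaller input maintains performance; a square wave would be more sensitive to where the retained samples fall relative to its edges.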