On Architectures for Combining Reinforcement Learning and Model Predictive Control with Runtime Improvements
Xiaolong Jia, Nikhil Bajaj
TL;DR
The paper tackles MPC's computational burden and sensitivity to model mismatch by proposing two RL-augmented architectures: Warm Start RL, which initializes an RL actor with the weights of a pre-trained neural-network approximation of MPC (NNMPC), and RLMPC, which uses RL to generate corrective residuals that are added to the NNMPC outputs. It further introduces DRMPC, a downsampling strategy for reference trajectories that shrinks the NN input size without sacrificing accuracy. Evaluations on a rotary inverted pendulum show runtime reductions exceeding 99% relative to traditional MPC, and RLMPC lowers cost by 11–40% under parameter variations, depending on reference amplitude. DRMPC closely matches MPC performance in simulation. Real-world training of RLMPC demonstrates practical feasibility under safety constraints, though formal guarantees for stability and robustness are left to future work. Overall, the work provides viable pathways to combine MPC and RL for faster, more robust control in the presence of model uncertainties.
Abstract
Model Predictive Control (MPC) faces computational demands and performance degradation from model inaccuracies. We propose two architectures combining Neural Network-approximated MPC (NNMPC) with Reinforcement Learning (RL). The first, Warm Start RL, initializes the RL actor with pre-trained NNMPC weights. The second, RLMPC, uses RL to generate corrective residuals for NNMPC outputs. We also introduce a downsampling method that reduces NNMPC input dimensions while maintaining performance. Evaluated on a rotary inverted pendulum, both architectures demonstrate runtime reductions exceeding 99% compared to traditional MPC while improving tracking performance under model uncertainties, with RLMPC achieving 11–40% cost reduction depending on reference amplitude.
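The three ideas in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the TinyMLP class, the stride-based downsampling, the zero-initialized residual actor, the saturation limit u_max, and the 4-dimensional state are all assumptions made for illustration. The sketch shows how a downsampled reference shrinks the network input, how an actor might be warm-started by copying NNMPC weights, and how a residual is added to the NNMPC output.

```python
import copy
import math
import random


def downsample_reference(ref, stride):
    """Keep every `stride`-th reference sample (assumed downsampling scheme)."""
    return ref[::stride]


class TinyMLP:
    """One-hidden-layer tanh network; a stand-in for NNMPC or an RL actor."""

    def __init__(self, n_in, n_hidden, seed=0, zero_output=False):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        # A zeroed output layer makes the network return exactly 0 at first.
        self.w2 = [0.0 if zero_output else rng.uniform(-0.5, 0.5)
                   for _ in range(n_hidden)]
        self.b2 = 0.0

    def __call__(self, x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        return sum(w * h for w, h in zip(self.w2, hidden)) + self.b2


def warm_start_actor(nnmpc):
    """Warm Start RL idea: the actor starts as an independent copy of NNMPC."""
    return copy.deepcopy(nnmpc)


def rlmpc_action(nnmpc, residual_actor, features, u_max):
    """RLMPC idea: NNMPC output plus a learned residual, then saturated.

    The saturation at +/- u_max is an assumption, not taken from the paper.
    """
    u = nnmpc(features) + residual_actor(features)
    return max(-u_max, min(u_max, u))


# Hypothetical 4-dimensional pendulum state and a 40-step reference.
state = [0.1, 0.0, -0.05, 0.0]
reference = [0.2 * math.sin(0.1 * k) for k in range(40)]

# Stride 8 keeps 5 of the 40 reference samples: 9 inputs instead of 44.
features = state + downsample_reference(reference, stride=8)

nnmpc = TinyMLP(n_in=len(features), n_hidden=8, seed=1)
actor = warm_start_actor(nnmpc)  # would then be fine-tuned by RL
residual = TinyMLP(n_in=len(features), n_hidden=8, seed=2, zero_output=True)
u = rlmpc_action(nnmpc, residual, features, u_max=1.0)
```

With the residual actor initialized to output zero, the combined controller reproduces the (saturated) NNMPC action before any RL training, which is one plausible way to keep early real-world training close to a known baseline. Training itself is omitted here.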
