A Numerically Efficient Method to Enhance Model Predictive Control Performance with a Reinforcement Learning Policy
Andrea Ghezzi, Rudolf Reiter, Katrin Baumgärtner, Alberto Bemporad, Moritz Diehl
TL;DR
This work targets fast, high-performance control that combines MPC with RL by introducing Policy-Enhanced Partial Tightening (PEPT). PEPT constructs a convex quadratic terminal cost $\bar{V}_f$ via a Riccati recursion around a trajectory produced by a trained RL policy and embeds this terminal cost into a two-phase MPC that is solved in real time. The approach keeps online computation low while improving constraint satisfaction and tracking accuracy, as demonstrated on quadcopter trajectory tracking with bounded states and inputs. Two initialization strategies for the second phase trade off reliance on the RL policy against MPC robustness. Compared with pure RL and several MPC variants, PEPT substantially reduces constraint violations, needing only about twice the feedback time of pure RL while running nearly three times faster than the best MPC-based policy; rollout-based initializations further improve constraint satisfaction. The method is open-source and broadly applicable to policies other than RL, offering a practical path to combining learning and optimization in real-time control.
Abstract
We propose a novel approach for combining model predictive control (MPC) with reinforcement learning (RL) to reduce online computation while achieving high closed-loop tracking performance and constraint satisfaction. This method, called Policy-Enhanced Partial Tightening (PEPT), approximates the optimal value function through a Riccati recursion around a state-control trajectory obtained by evaluating the RL policy. The result is a convex quadratic terminal cost that can be seamlessly integrated into the MPC formulation. The proposed controller is tested in simulation on a trajectory tracking problem for a quadcopter with nonlinear dynamics and bounded states and controls. The results highlight PEPT's effectiveness, as it outperforms both pure RL policies and several MPC variants. Compared to pure RL, PEPT achieves a 1000 times lower constraint violation cost with only twice the feedback time. Against the best MPC-based policy, PEPT reduces constraint violations by a factor of 2 to 5 and runs nearly 3 times faster while maintaining similar tracking performance. The code is open-source at www.github.com/aghezz1/pept.
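The abstract describes the terminal-cost construction only at a high level. The following is a minimal sketch of the general idea, not the authors' implementation or the API of the PEPT repository: roll out a policy, linearize the dynamics along the resulting state-control trajectory, and run a backward time-varying Riccati recursion; the resulting matrix `P0` defines a convex quadratic terminal cost centered at the first rollout state. It assumes a discrete-time model `f(x, u)`, quadratic stage weights `Q`, `R`, a terminal weight `Q_N` at the end of the rollout, and finite-difference linearization. The function names `linearize`, `rollout`, `riccati_terminal_weight` and `make_terminal_cost` are hypothetical, the sketch drops any constant and linear terms of the value-function approximation, and the saturated linear feedback on a toy double integrator merely stands in for a trained RL policy on the quadcopter.

```python
import numpy as np


def linearize(f, x, u, eps=1e-6):
    """Central finite-difference Jacobians A = df/dx and B = df/du at (x, u)."""
    nx, nu = x.size, u.size
    A, B = np.zeros((nx, nx)), np.zeros((nx, nu))
    for i in range(nx):
        d = np.zeros(nx)
        d[i] = eps
        A[:, i] = (f(x + d, u) - f(x - d, u)) / (2 * eps)
    for j in range(nu):
        d = np.zeros(nu)
        d[j] = eps
        B[:, j] = (f(x, u + d) - f(x, u - d)) / (2 * eps)
    return A, B


def rollout(f, policy, x0, N):
    """Simulate the policy for N steps: states (N+1, nx), controls (N, nu)."""
    xs, us = [np.asarray(x0, dtype=float)], []
    for _ in range(N):
        u = np.atleast_1d(policy(xs[-1])).astype(float)
        us.append(u)
        xs.append(f(xs[-1], u))
    return np.array(xs), np.array(us)


def riccati_terminal_weight(f, xs, us, Q, R, Q_N):
    """Backward time-varying Riccati recursion along (xs, us); returns P_0."""
    P = Q_N
    for k in reversed(range(len(us))):
        A, B = linearize(f, xs[k], us[k])
        S = R + B.T @ P @ B                      # positive definite if R is
        K = np.linalg.solve(S, B.T @ P @ A)      # LQR gain at step k
        P = Q + A.T @ P @ A - A.T @ P @ B @ K
        P = 0.5 * (P + P.T)                      # re-symmetrize against round-off
    return P


def make_terminal_cost(P, x_ref):
    """Convex quadratic surrogate V_f(x) = (x - x_ref)^T P (x - x_ref)."""
    def V_f(x):
        e = np.asarray(x, dtype=float) - x_ref
        return float(e @ P @ e)
    return V_f


# Usage on a toy double integrator (illustrative, not the paper's quadcopter).
dt = 0.1


def f(x, u):
    return np.array([x[0] + dt * x[1], x[1] + dt * u[0]])


def policy(x):
    # Hypothetical stand-in for a trained RL policy: saturated linear feedback.
    return np.clip(-1.0 * x[0] - 1.5 * x[1], -2.0, 2.0)


Q, R, Q_N = np.eye(2), np.array([[0.1]]), np.eye(2)
xs, us = rollout(f, policy, x0=[1.0, 0.0], N=200)
P0 = riccati_terminal_weight(f, xs, us, Q, R, Q_N)
V_f = make_terminal_cost(P0, xs[0])
print(np.round(P0, 3))
```

Each backward step adds the positive semidefinite term $A_k^\top (P_{k+1} - P_{k+1} B_k S_k^{-1} B_k^\top P_{k+1}) A_k$ to $Q$, so every $P_k \succeq Q$; a positive definite $Q$ therefore yields a strictly convex terminal cost, which keeps a QP-based MPC convex. For a linear system and a long rollout, as in the toy example, `P0` approaches the solution of the discrete algebraic Riccati equation.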
