A Numerically Efficient Method to Enhance Model Predictive Control Performance with a Reinforcement Learning Policy

Andrea Ghezzi; Rudolf Reiter; Katrin Baumgärtner; Alberto Bemporad; Moritz Diehl

TL;DR

This work combines model predictive control (MPC) with a reinforcement learning (RL) policy for fast, high-performance control by introducing Policy-Enhanced Partial Tightening (PEPT). PEPT constructs a convex quadratic terminal cost $\bar{V}_f$ via a Riccati recursion around a trajectory produced by a trained RL policy, and embeds this terminal cost into a two-phase MPC solved in real time. The approach keeps online computation fast while improving constraint satisfaction and tracking accuracy, as demonstrated on quadcopter trajectory tracking with bounded states and inputs. Two initialization strategies for the second phase provide a trade-off between reliance on the RL policy and MPC robustness. Compared with pure RL and several MPC variants, PEPT yields substantial reductions in constraint violations and competitive or faster runtimes, with rollout-based initializations offering further gains in constraint satisfaction. The code is open-source, and the method applies to policies other than RL, offering a practical path to combining learning and optimization in real-time control.

Abstract

We propose a novel approach for combining model predictive control (MPC) with reinforcement learning (RL) to reduce online computation while achieving high closed-loop tracking performance and constraint satisfaction. This method, called Policy-Enhanced Partial Tightening (PEPT), approximates the optimal value function through a Riccati recursion around a state-control trajectory obtained by evaluating the RL policy. The result is a convex quadratic terminal cost that can be seamlessly integrated into the MPC formulation. The proposed controller is tested in simulations on a trajectory tracking problem for a quadcopter with nonlinear dynamics and bounded state and control. The results highlight PEPT's effectiveness, outperforming both pure RL policies and several MPC variations. Compared to pure RL, PEPT achieves 1000 times lower constraint violation cost with only twice the feedback time. Against the best MPC-based policy, PEPT reduces constraint violations by 2 to 5 times and runs nearly 3 times faster while maintaining similar tracking performance. The code is open-source at www.github.com/aghezz1/pept.
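To make the core idea concrete, the following is a minimal sketch of a backward Riccati recursion along a policy rollout, which yields a positive definite matrix defining a convex quadratic cost. It is not the authors' implementation (see the repository linked above for that). It ignores state and control constraints, drops linear cost terms, and uses finite-difference Jacobians. The names `rollout`, `jacobians`, and `riccati_terminal_cost`, the double-integrator model, and the hand-tuned linear feedback standing in for a trained RL policy are all assumptions introduced for illustration.

```python
import numpy as np

def rollout(f, policy, x0, n_steps):
    """Simulate x+ = f(x, policy(x)) and record the state-control trajectory."""
    xs, us = [np.asarray(x0, dtype=float)], []
    for _ in range(n_steps):
        us.append(np.atleast_1d(policy(xs[-1])).astype(float))
        xs.append(f(xs[-1], us[-1]))
    return np.array(xs), np.array(us)

def jacobians(f, x, u, eps=1e-6):
    """Forward finite-difference Jacobians A = df/dx and B = df/du at (x, u)."""
    fx = f(x, u)
    A = np.zeros((x.size, x.size))
    B = np.zeros((x.size, u.size))
    for i in range(x.size):
        dx = np.zeros(x.size)
        dx[i] = eps
        A[:, i] = (f(x + dx, u) - fx) / eps
    for j in range(u.size):
        du = np.zeros(u.size)
        du[j] = eps
        B[:, j] = (f(x, u + du) - fx) / eps
    return A, B

def riccati_terminal_cost(f, xs, us, Q, R, P_end):
    """Backward Riccati recursion along the rollout; returns P at the first stage."""
    P = P_end
    # Walk the trajectory from the last stage back to the first.
    for x, u in zip(xs[-2::-1], us[::-1]):
        A, B = jacobians(f, x, u)
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        P = 0.5 * (P + P.T)  # symmetrize against round-off
    return P

# Hypothetical example: a discretized double integrator with a hand-tuned
# linear feedback standing in for the trained RL policy (assumption).
A_lin = np.array([[1.0, 0.1], [0.0, 1.0]])
B_lin = np.array([[0.005], [0.1]])
f = lambda x, u: A_lin @ x + B_lin @ u
policy = lambda x: np.array([-1.0 * x[0] - 1.5 * x[1]])
Q, R = np.eye(2), np.array([[0.1]])

xs, us = rollout(f, policy, [1.0, 0.0], n_steps=200)
P_bar = riccati_terminal_cost(f, xs, us, Q, R, P_end=Q)

def terminal_cost(x, x_ref=np.zeros(2)):
    """Quadratic form (x - x_ref)^T P_bar (x - x_ref)."""
    e = np.asarray(x, dtype=float) - x_ref
    return float(e @ P_bar @ e)
```

In an MPC setting, a function like `terminal_cost` would be added to the stage costs of a short-horizon problem, so that the rollout of the learned policy summarizes the cost beyond the horizon. Because the recursion only involves small dense linear solves, its cost grows linearly with the rollout length.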
