A Numerically Efficient Method to Enhance Model Predictive Control Performance with a Reinforcement Learning Policy

Andrea Ghezzi, Rudolf Reiter, Katrin Baumgärtner, Alberto Bemporad, Moritz Diehl

TL;DR

This work addresses fast, high-performance control that combines MPC with RL by introducing Policy-Enhanced Partial Tightening (PEPT). PEPT constructs a convex quadratic terminal cost $\bar{V}_f$ via a Riccati recursion around a trajectory produced by a trained RL policy, and embeds this terminal cost into a two-phase MPC solved in real time. The approach enables fast online computation while improving constraint satisfaction and tracking accuracy, as demonstrated on quadcopter trajectory tracking with bounded states and inputs. Two initialization strategies for the second phase provide a trade-off between reliance on the RL policy and MPC robustness. Compared with pure RL and several MPC variants, PEPT yields substantial reductions in constraint violations and competitive or faster runtimes, with rollout-based initializations offering further gains in constraint satisfaction. The method is open-source and applicable to policies other than RL, offering a practical path to combining learning and optimization in real-time control.

Abstract

We propose a novel approach for combining model predictive control (MPC) with reinforcement learning (RL) to reduce online computation while achieving high closed-loop tracking performance and constraint satisfaction. This method, called Policy-Enhanced Partial Tightening (PEPT), approximates the optimal value function through a Riccati recursion around a state-control trajectory obtained by evaluating the RL policy. The result is a convex quadratic terminal cost that can be seamlessly integrated into the MPC formulation. The proposed controller is tested in simulations on a trajectory tracking problem for a quadcopter with nonlinear dynamics and bounded state and control. The results highlight PEPT's effectiveness, outperforming both pure RL policies and several MPC variations. Compared to pure RL, PEPT achieves 1000 times lower constraint violation cost with only twice the feedback time. Against the best MPC-based policy, PEPT reduces constraint violations by 2 to 5 times and runs nearly 3 times faster while maintaining similar tracking performance. The code is open-source at www.github.com/aghezz1/pept.
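
The core idea in the abstract, a Riccati recursion around a policy-generated trajectory that yields a convex quadratic terminal cost, can be illustrated with a minimal sketch. This is not the authors' implementation: the pendulum model `dynamics`, the linear feedback `policy` standing in for a trained RL policy, the weights `Q` and `R`, and the finite-difference linearization are all assumptions made for illustration. The sketch also ignores constraints and computes only the Hessian of the quadratic approximation.

```python
import numpy as np

def dynamics(x, u, dt=0.05):
    """Hypothetical discrete-time pendulum, x = [angle, angular velocity]."""
    theta, omega = x
    return np.array([theta + dt * omega,
                     omega + dt * (-np.sin(theta) + u[0])])

def policy(x):
    """Stand-in for a trained RL policy (assumed: a fixed linear feedback)."""
    return np.array([-2.0 * x[0] - 1.0 * x[1]])

def jacobians(f, x, u, eps=1e-6):
    """Forward finite-difference Jacobians A = df/dx and B = df/du."""
    fx = f(x, u)
    A = np.column_stack([(f(x + eps * e, u) - fx) / eps for e in np.eye(x.size)])
    B = np.column_stack([(f(x, u + eps * e) - fx) / eps for e in np.eye(u.size)])
    return A, B

def terminal_cost_hessian(x0, Q, R, P_end, horizon):
    """Roll out the policy from x0, then run a backward Riccati recursion
    along the linearized rollout. Returns the Hessian P at x0."""
    # Forward pass: state-control trajectory obtained by evaluating the policy.
    xs, us = [x0], []
    for _ in range(horizon):
        us.append(policy(xs[-1]))
        xs.append(dynamics(xs[-1], us[-1]))

    # Backward pass: time-varying discrete Riccati recursion.
    P = P_end
    for x, u in zip(reversed(xs[:-1]), reversed(us)):
        A, B = jacobians(dynamics, x, u)
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        P = 0.5 * (P + P.T)  # remove round-off asymmetry
    return P

Q = np.diag([1.0, 0.1])   # assumed state weight
R = np.array([[0.01]])    # assumed control weight
P0 = terminal_cost_hessian(np.array([0.5, 0.0]), Q, R, P_end=Q, horizon=20)
```

With positive definite `Q` and `R`, the resulting `P0` is symmetric positive definite, so a terminal cost built from it is convex and quadratic, which is the property the abstract relies on when embedding the cost in the MPC problem.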

Paper Structure

This paper contains 17 sections, 14 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Position and trajectory in the $xz$-plane obtained with different controllers for the easy task, where the lemniscate is scaled with $\alpha=0.8$. The black dotted lines are the reference, and the dashed red lines are the bounds on velocity.
  • Figure 2: Easy task - Breakdown of the average closed-loop cost for the considered approaches achieved in 100 episodes.
  • Figure 3: Hard task - Breakdown of the average closed-loop cost for the considered approaches achieved in 100 episodes.

Theorems & Definitions (3)

  • Remark 1
  • Remark 2
  • Remark 3