Reinforced Model Predictive Control via Trust-Region Quasi-Newton Policy Optimization

Dean Brandner, Sergio Lucia

TL;DR

This work uses a parameterized model predictive controller as the policy and exploits its small number of parameters to propose a trust-region-constrained quasi-Newton training algorithm for policy optimization with a superlinear convergence rate.

Abstract

Model predictive control can handle nonlinear systems optimally while explicitly accounting for constraints. Its control performance depends on the model accuracy and the prediction horizon. Recent work proposes applying reinforcement learning to a parameterized model predictive controller to recover optimal control performance even when an imperfect model or a short prediction horizon is used. However, common reinforcement learning algorithms rely on first-order updates, which converge only linearly and therefore require an excessive amount of dynamic data. Higher-order updates are typically intractable when the policy is approximated by a neural network because of the large number of parameters. In this work, we use a parameterized model predictive controller as the policy and exploit its small number of parameters to propose a trust-region-constrained quasi-Newton training algorithm for policy optimization with a superlinear convergence rate. We show that the required second-order derivative information can be computed by solving a linear system of equations. A simulation study illustrates that the proposed training algorithm outperforms other algorithms in terms of data efficiency and accuracy.
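The abstract contrasts linearly convergent first-order updates with a trust-region-constrained quasi-Newton method. As a rough illustration of the latter idea, here is a minimal sketch of a textbook trust-region method with a BFGS Hessian approximation and a dogleg step, written in Python with NumPy. It is a generic stand-in under stated assumptions, not the authors' training algorithm: the function names, the quadratic surrogate for the closed-loop cost $J(\theta)$, and all tuning constants (initial radius, acceptance threshold `eta`, radius update factors) are hypothetical. In the paper's setting, values of $J$ would presumably come from closed-loop data of the parameterized MPC rather than from a closed-form function.

```python
import numpy as np


def dogleg_step(g, B, delta):
    """Approximately minimize g^T s + 0.5 s^T B s subject to ||s|| <= delta.
    Assumes B is symmetric positive definite (the BFGS update below keeps it so)."""
    s_newton = -np.linalg.solve(B, g)
    if np.linalg.norm(s_newton) <= delta:
        return s_newton  # the full quasi-Newton step fits in the trust region
    s_cauchy = -(g @ g) / (g @ B @ g) * g  # model minimizer along -g
    if np.linalg.norm(s_cauchy) >= delta:
        return -delta * g / np.linalg.norm(g)  # steepest descent to the boundary
    # Walk from the Cauchy point toward the Newton point until ||s|| = delta
    d = s_newton - s_cauchy
    a, b, c = d @ d, 2 * s_cauchy @ d, s_cauchy @ s_cauchy - delta**2
    tau = (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)
    return s_cauchy + tau * d


def trust_region_bfgs(J, grad, theta, delta=1.0, delta_max=10.0,
                      eta=0.1, tol=1e-6, max_iter=200):
    """Minimize J(theta) with a trust-region method and a BFGS Hessian model."""
    theta = np.asarray(theta, dtype=float)
    B = np.eye(theta.size)  # initial Hessian approximation
    g = grad(theta)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        s = dogleg_step(g, B, delta)
        predicted = -(g @ s + 0.5 * s @ B @ s)  # decrease of the quadratic model
        actual = J(theta) - J(theta + s)
        rho = actual / predicted
        # Shrink the region after poor agreement, expand after a good boundary step
        if rho < 0.25:
            delta *= 0.25
        elif rho > 0.75 and np.isclose(np.linalg.norm(s), delta):
            delta = min(2.0 * delta, delta_max)
        g_new = grad(theta + s)
        y = g_new - g
        # BFGS update, skipped if the curvature condition s^T y > 0 fails
        if s @ y > 1e-12:
            Bs = B @ s
            B = B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (s @ y)
        if rho > eta:  # accept the step
            theta, g = theta + s, g_new
    return theta


# Usage on a hypothetical quadratic stand-in for the closed-loop cost J(theta)
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])


def J(th):
    return 0.5 * th @ A @ th - b @ th


def grad(th):
    return A @ th - b


theta_opt = trust_region_bfgs(J, grad, np.zeros(2))
```

The ratio `rho` of actual to predicted cost reduction decides both whether a step is accepted and how the trust-region radius changes, which is what keeps the quasi-Newton steps from overshooting while the Hessian approximation is still poor.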

Paper Structure

This paper contains 12 sections, 1 theorem, 40 equations, 1 figure, 2 tables, 2 algorithms.

Key Result

Theorem 1

Given the primal-dual solution $\xi^*(p)$ of (eq:GeneralOP) and the differentiated KKT conditions (eq:ImplicitFunction_2), the matrix formulation of the second-order sensitivities $S$ of (eq:GeneralOP) can be obtained by … The right-hand-side block matrix $C \in \mathbb{R}^{n_{\xi} \times n_p^2}$ is composed of submatrices $C_j \in \mathbb{R}^{n_{\xi} \times n_p}$, with $j=1,\ldots,
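The key point of Theorem 1 is that second-order sensitivities follow from a linear system whose right-hand side $C$ is assembled blockwise. A minimal sketch of the underlying idea, assuming a hypothetical unconstrained toy problem in which the KKT conditions reduce to stationarity $F(\xi, p) = \nabla_\xi f(\xi, p) = 0$: differentiating $F(\xi(p), p) = 0$ once gives $F_\xi S_1 = -F_p$, and differentiating again gives $F_\xi \, \partial^2 \xi / \partial p_j \partial p_b = -z_j^\top \nabla^2 F_i \, z_b$ with $z = (\xi, p)$, so all $n_p^2$ systems share the coefficient matrix $F_\xi$. The toy problem, the function names, and the column ordering of the block matrix are assumptions, and the sign convention may differ from the paper's formulation, which also includes dual variables and constraints.

```python
import numpy as np

# Hypothetical toy problem (not from the paper), with parameters p = (p1, p2):
#   min_x f(x, p) = 0.5*(x1 - p1*p2)**2 + 0.5*(x2 - x1**2 - p2)**2
# Its exact minimizer x1 = p1*p2, x2 = (p1*p2)**2 + p2 lets us check the result.
# Without constraints, the KKT conditions reduce to F(x, p) = grad_x f(x, p) = 0:
#   F1 = x1 - p1*p2 - 2*x1*x2 + 2*x1**3 + 2*x1*p2
#   F2 = x2 - x1**2 - p2


def solve_primal(p):
    """Closed-form minimizer of the toy problem (stands in for an NLP solver)."""
    x1 = p[0] * p[1]
    return np.array([x1, x1**2 + p[1]])


def kkt_derivatives(x, p):
    """Return dF/dx (n_x, n_x), dF/dp (n_x, n_p) and the Hessians of each F_i
    with respect to z = (x, p), stacked with shape (n_x, n_x + n_p, n_x + n_p)."""
    x1, x2 = x
    p1, p2 = p
    F_x = np.array([[1 - 2 * x2 + 6 * x1**2 + 2 * p2, -2 * x1],
                    [-2 * x1, 1.0]])
    F_p = np.array([[-p2, -p1 + 2 * x1],
                    [0.0, -1.0]])
    H = np.zeros((2, 4, 4))
    H[0] = [[12 * x1, -2, 0, 2],   # Hessian of F1 in the order (x1, x2, p1, p2)
            [-2, 0, 0, 0],
            [0, 0, 0, -1],
            [2, 0, -1, 0]]
    H[1, 0, 0] = -2.0              # F2 is nonlinear only in x1
    return F_x, F_p, H


def sensitivities(p):
    x = solve_primal(p)
    F_x, F_p, H = kkt_derivatives(x, p)
    n_x, n_p = F_p.shape
    # First order: differentiate F(x(p), p) = 0 once -> F_x S1 = -F_p
    S1 = np.linalg.solve(F_x, -F_p)
    # Total derivative of z = (x, p) with respect to p
    Z = np.vstack([S1, np.eye(n_p)])
    # Second order: F_x d2x/(dp_j dp_b) = -C[:, j, b], C[i, j, b] = z_j^T H_i z_b
    C = np.einsum('kj,ikl,lb->ijb', Z, H, Z)
    C_block = C.reshape(n_x, n_p * n_p)   # [C_1, ..., C_{n_p}], each n_x x n_p
    S2 = np.linalg.solve(F_x, -C_block)   # one matrix, n_p**2 right-hand sides
    return S1, S2.reshape(n_x, n_p, n_p)


p = np.array([1.5, 0.5])
S1, S2 = sensitivities(p)
# S2[i, j, b] approximates d2 x_i / (dp_j dp_b); for x2 the closed form is
# [[2*p2**2, 4*p1*p2], [4*p1*p2, 2*p1**2]]
```

Because the coefficient matrix $F_\xi$ is the same one used for the first-order sensitivities, a single factorization can serve all $n_p^2$ right-hand sides, which stays cheap as long as the number of policy parameters $n_p$ is small.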

Figures (1)

  • Figure 1: Evolution of the closed-loop cost $J({\theta}_j)$ over the reinforcement learning (RL) iterations $j$. The plots show the results for first order training (left) and second order training (right) with and without a trust region.

Theorems & Definitions (3)

  • Theorem 1: Second Order Sensitivities
  • Proof
  • Proof