
Optimization of the Model Predictive Control Meta-Parameters Through Reinforcement Learning

Eivind Bøhn, Sebastien Gros, Signe Moe, Tor Arne Johansen

TL;DR

This paper proposes a novel framework in which any parameter of the control algorithm can be jointly tuned using reinforcement learning (RL), with the goal of simultaneously optimizing the control performance and the power usage of the control algorithm.

Abstract

Model predictive control (MPC) is increasingly being considered for control of fast systems and embedded applications. However, MPC poses significant challenges for such systems. Its high computational complexity results in high power consumption from the control algorithm, which could account for a significant share of the energy resources in battery-powered embedded systems. The MPC parameters must also be tuned, which is largely a trial-and-error process that strongly affects the control performance, the robustness, and the computational complexity of the controller. In this paper, we propose a novel framework in which any parameter of the control algorithm can be jointly tuned using reinforcement learning (RL), with the goal of simultaneously optimizing the control performance and the power usage of the control algorithm. We propose the novel idea of optimizing the meta-parameters of MPC with RL, i.e. parameters affecting the structure of the MPC problem as opposed to the solution to a given problem. Our control algorithm is based on an event-triggered MPC, where we learn when the MPC should be re-computed, and a dual-mode scheme in which a linear state feedback control law is applied in between MPC computations. We formulate a novel mixture-distribution policy and show that joint optimization achieves improvements that do not present themselves when optimizing the same parameters in isolation. We demonstrate our framework on the inverted pendulum control task, reducing the total computation time of the control system by 36% while also improving the control performance by 18.4% over the best-performing MPC baseline.
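To make the event-triggered, dual-mode idea more concrete, the following is a minimal sketch under several assumptions that are not taken from the paper: the plant is a discretized double integrator rather than the inverted pendulum, the MPC solve is replaced by an unconstrained finite-horizon LQ problem, and the prediction horizon and recomputation interval are fixed constants standing in for the outputs of the learned RL policies. The way the feedback gain corrects deviations from the last plan is also an assumption. All function names (`lqr_gain`, `plan_mpc`, `run`) are hypothetical.

```python
import numpy as np


def lqr_gain(A, B, Q, R, iters=500):
    """Approximate infinite-horizon discrete LQR gain by Riccati iteration."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K


def plan_mpc(A, B, Q, R, x0, horizon):
    """Stand-in for the MPC solve (assumption): unconstrained finite-horizon LQ.

    Returns the planned input sequence and the predicted state sequence.
    """
    P = Q.copy()
    gains = []
    for _ in range(horizon):  # backward Riccati recursion
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()  # gains[0] now belongs to the first step of the plan
    xs, us = [x0], []
    for K in gains:  # forward rollout of the nominal plan
        u = -K @ xs[-1]
        us.append(u)
        xs.append(A @ xs[-1] + B @ u)
    return np.array(us), np.array(xs)


def run(steps=60, horizon=20, recompute_every=5, seed=0):
    """Simulate the loop; requires recompute_every <= horizon."""
    rng = np.random.default_rng(seed)
    A = np.array([[1.0, 0.1], [0.0, 1.0]])  # double integrator, dt = 0.1
    B = np.array([[0.005], [0.1]])
    Q, R = np.eye(2), np.array([[0.1]])
    K = lqr_gain(A, B, Q, R)
    x = np.array([1.0, 0.0])
    n_solves, cost = 0, 0.0
    for t in range(steps):
        if t % recompute_every == 0:  # fixed trigger; learned in the paper
            us, xs = plan_mpc(A, B, Q, R, x, horizon)
            n_solves += 1
            k = 0
        # Between solves: planned input plus linear feedback on the
        # deviation from the predicted state.
        u = us[k] - K @ (x - xs[k])
        cost += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u + 0.01 * rng.standard_normal(2)  # process noise
        k += 1
    return n_solves, cost, x
```

Calling `run(recompute_every=1)` solves the planning problem at every step, while larger values trade solver calls against closed-loop cost. In this sketch that trade-off is set by hand; the paper instead learns it jointly with the other meta-parameters.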

Paper Structure

This paper contains 26 sections, 2 theorems, 37 equations, 4 figures, 3 tables.

Key Result

Proposition 1

The state $s$ has the Markov property, i.e. $P(s_{t+1} \mid s_t) = P(s_{t+1} \mid s_{0:t})$.

Figures (4)

  • Figure 1: An overview of the control algorithm. Not shown here is the connection of each policy's output to the algorithm that updates the policy's parameters.
  • Figure 2: Total cost over the evaluation set for the MPC as a function of fixed horizons and fixed recomputation schedules. The minimum cost of $781$ is found at horizon $N=31$ with recomputation at every step.
  • Figure 3: Distribution of prediction horizons and steps between computations selected by the best-performing policy on the evaluation test-set.
  • Figure 4: Training process for the proposed learned control algorithm, and for each meta-parameter in isolation. In the isolated cases, we show that the tuning process is capable of recovering from sub-optimal initializations.

Theorems & Definitions (4)

  • Proposition 1
  • proof
  • Proposition 2
  • proof