
Integrating Reinforcement Learning and Model Predictive Control with Applications to Microgrids

Caio Fabio Oliveira da Silva, Azita Dabiri, Bart De Schutter

TL;DR

The paper tackles the real-time control of hybrid systems modeled as mixed-logical dynamical (MLD) systems, for which standard model predictive control (MPC) requires solving computationally intensive mixed-integer linear programs (MILPs) online. It introduces an integration of reinforcement learning (RL) with MPC that decouples discrete and continuous decisions: RL selects the discrete actions, which turns the MILP into a linear program (LP) over the continuous actions and drastically reduces online computation. A decoupled Q-function, approximated by an LSTM network, makes learning over the prediction horizon tractable; training is done offline, and only an LP is solved online. The framework is validated on a microgrid economic dispatch problem, showing online speedups of up to 16x while maintaining high feasibility and suboptimality below 1%, and revealing a trade-off in which RL improves feasibility compared with a supervised-learning baseline. The work advances real-time control of hybrid energy systems and suggests extensions to more complex MILP and MILP-like problems and to nonlinear regimes.
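
To make the decoupling concrete, the sketch below uses a hypothetical single-step grid-exchange problem, not the paper's microgrid model. A binary `delta_import` selects import or export mode through big-M constraints; once it is fixed, what remains is an LP in one variable, the battery power `p_b`, whose optimum lies at an endpoint of a feasible interval. Enumerating both binary values reproduces the MILP optimum, whereas a learned policy would supply `delta_import` directly so that only one LP is solved. All names, prices, and bounds are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dispatch:
    p_b: float       # battery power (positive = discharging)
    p_import: float  # power bought from the main grid
    p_export: float  # power sold to the main grid
    cost: float


def solve_fixed_mode_lp(net_load: float, delta_import: int, p_max: float,
                        big_m: float, c_buy: float, c_sell: float) -> Optional[Dispatch]:
    """Solve the LP that remains once the binary grid mode is fixed.

    Hypothetical model: p_import - p_export + p_b = net_load,
    0 <= p_import <= big_m * delta, 0 <= p_export <= big_m * (1 - delta),
    -p_max <= p_b <= p_max, cost = c_buy * p_import - c_sell * p_export.
    Assumes c_buy > 0 and c_sell > 0.
    """
    if delta_import == 1:
        # p_export = 0, so p_import = net_load - p_b must lie in [0, big_m].
        lo, hi = max(-p_max, net_load - big_m), min(p_max, net_load)
    else:
        # p_import = 0, so p_export = p_b - net_load must lie in [0, big_m].
        lo, hi = max(-p_max, net_load), min(p_max, net_load + big_m)
    if lo > hi:
        return None  # this discrete choice makes the LP infeasible
    # In both modes the cost decreases as p_b grows (slope -c_buy or -c_sell),
    # so the LP optimum sits at the upper end of the feasible interval.
    p_b = hi
    p_import = net_load - p_b if delta_import == 1 else 0.0
    p_export = p_b - net_load if delta_import == 0 else 0.0
    return Dispatch(p_b, p_import, p_export, c_buy * p_import - c_sell * p_export)


def solve_by_enumeration(net_load: float, **params) -> Optional[Dispatch]:
    """MILP reference: try every binary value and keep the cheapest feasible LP."""
    candidates = [solve_fixed_mode_lp(net_load, d, **params) for d in (0, 1)]
    feasible = [c for c in candidates if c is not None]
    return min(feasible, key=lambda c: c.cost) if feasible else None


params = dict(p_max=1.0, big_m=10.0, c_buy=0.3, c_sell=0.1)
print(solve_fixed_mode_lp(2.0, 1, **params))  # import mode: discharge 1.0, buy 1.0
print(solve_fixed_mode_lp(2.0, 0, **params))  # export mode is infeasible here -> None
print(solve_by_enumeration(-3.0, **params))   # surplus: export mode wins
```

Because the toy has a single step, discharging the battery is always attractive; in a horizon-based MPC the battery state couples the time steps, so the remaining LP would be handed to an LP solver instead of an endpoint rule. The second call also shows why feasibility is reported: a poorly chosen discrete action can leave the LP without any solution.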

Abstract

This work proposes an approach that integrates reinforcement learning and model predictive control (MPC) to solve finite-horizon optimal control problems in mixed-logical dynamical systems efficiently. Optimization-based control of such systems with discrete and continuous decision variables entails the online solution of mixed-integer linear programs, which suffer from the curse of dimensionality. Our approach aims to mitigate this issue by decoupling the decision on the discrete variables from the decision on the continuous variables. In the proposed approach, reinforcement learning determines the discrete decision variables and simplifies the online optimization problem of the MPC controller from a mixed-integer linear program to a linear program, significantly reducing the computational time. A fundamental contribution of this work is the definition of the decoupled Q-function, which plays a crucial role in making the learning problem tractable in a combinatorial action space. We motivate the use of recurrent neural networks to approximate the decoupled Q-function and show how they can be employed in a reinforcement learning setting. Simulation experiments on a microgrid system using real-world data demonstrate that the proposed method substantially reduces the online computation time of MPC while maintaining high feasibility and low suboptimality.
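
A back-of-the-envelope view of why decoupling helps in a combinatorial action space: with `n_binary` binary variables per step over a horizon of `n_p` steps, a Q-network that scores every joint action needs 2^(n_binary·n_p) outputs, whereas one that scores each step separately needs only n_p·2^n_binary. The sketch below additionally assumes, purely for illustration, that the joint value is the sum of per-step values; under that hypothetical additive decomposition, choosing the greedy action step by step recovers the joint maximizer, which a brute-force search confirms on a tiny instance. The paper's exact definition of the decoupled Q-function may differ.

```python
import itertools
import random


def joint_action_count(n_binary: int, n_p: int) -> int:
    """Number of discrete action sequences over the whole horizon."""
    return 2 ** (n_binary * n_p)


def decoupled_output_count(n_binary: int, n_p: int) -> int:
    """Outputs needed if each horizon step is scored separately."""
    return n_p * 2 ** n_binary


def greedy_decoupled(q_steps: list[list[float]]) -> list[int]:
    """Pick the best action index independently at every horizon step."""
    return [max(range(len(q)), key=q.__getitem__) for q in q_steps]


def brute_force_joint(q_steps: list[list[float]]) -> list[int]:
    """Search all sequences, scoring each by the (hypothetical) sum of per-step values."""
    best = max(itertools.product(*(range(len(q)) for q in q_steps)),
               key=lambda seq: sum(q[a] for q, a in zip(q_steps, seq)))
    return list(best)


print(joint_action_count(3, 24), decoupled_output_count(3, 24))

rng = random.Random(0)
# Made-up per-step values: 4 steps, 2 binaries per step -> 4 actions per step.
q_steps = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(4)]
print(greedy_decoupled(q_steps) == brute_force_joint(q_steps))
```

The brute-force search grows exponentially with the horizon, while the greedy per-step choice grows only linearly, which is the property that makes the decoupled form attractive for learning.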

Paper Structure (23 sections, 29 equations, 10 figures, 3 tables, 2 algorithms)

Figures (10)

  • Figure 1: Representation of the decoupling of the discrete and continuous decision variables. From mixed-logical dynamical (MLD) modeling and an MPC approach to control, a mixed-integer linear program (MILP) can be formulated for the operation of the microgrid. A learning approach, either reinforcement learning (RL) or supervised learning (SL), then determines the discrete variable $\epsilon_\mathrm{d}(k)$. With the discrete variable fixed, the MILP reduces to a linear program (LP), which is solved to compute the continuous variable $\epsilon_\mathrm{c}(k)$.
  • Figure 2: A depiction of the proposed control scheme that integrates reinforcement learning into an MPC framework. The agent's goal is to maximize its long-term reward. It learns to adapt its policy by repeatedly interacting with the environment, that is, by sending a discrete action $\epsilon_\mathrm{d}(k)$ and by receiving the extended state $\chi$ and immediate reward $r$. The MPC controller, which is lumped in the environment, receives this discrete action $\epsilon_\mathrm{d}(k)$ and then solves an optimization problem to determine the continuous action $\epsilon_\mathrm{c}(k)$. Finally, the input $\epsilon$ is fed to the system, and the next state is computed.
  • Figure 3: A representation of the recurrent LSTM network on the left-hand side and the unrolled LSTM network on the right-hand side. Note that the LSTM is unrolled for the duration of the prediction horizon $N_\mathrm{p}$. Moreover, at each time step $k$, the augmented state $\chi(k)$ can go through the operation $g_k(\cdot)$, changing the manner in which the augmented state is presented to the LSTM network at time step $k$. This can be interpreted as a preprocessing technique to better exploit the structure of our problem.
  • Figure 4: Depiction of the elements of a microgrid and a bidirectional connection with the main grid. The pointed arrows indicate the possibility of power flow between two elements.
  • Figure 5: Optimality gap for different prediction horizons for the reinforcement learning (RL) and supervised learning (SL) approaches in the case study.
  • ...and 5 more figures
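
The unrolled network of Figure 3 can be sketched as a forward pass: one LSTM cell with shared weights is applied $N_\mathrm{p}$ times, step $j$ receives a preprocessed view $g_j(\chi)$ of the augmented state, and a linear head maps each hidden state to one Q-value per discrete action at that step. This is a minimal, untrained, pure-Python illustration; the slicing used for `g_j` (a window of a hypothetical forecast vector), the layer sizes, the omitted biases, and the random weights are all assumptions rather than the paper's architecture or hyperparameters.

```python
import math
import random

Vector = list[float]
Matrix = list[list[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def random_matrix(rng: random.Random, rows: int, cols: int) -> Matrix:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyLSTMQ:
    """Illustrative, untrained LSTM unrolled over the horizon with a linear Q head."""

    def __init__(self, n_in: int, n_hidden: int, n_actions: int, seed: int = 0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        # One weight matrix per gate acting on [x, h]; biases omitted for brevity.
        self.w = {g: random_matrix(rng, n_hidden, n_in + n_hidden) for g in "ifog"}
        self.head = random_matrix(rng, n_actions, n_hidden)

    def step(self, x: Vector, h: Vector, c: Vector) -> tuple[Vector, Vector]:
        z = x + h  # list concatenation [x, h]
        i = [sigmoid(v) for v in matvec(self.w["i"], z)]    # input gate
        f = [sigmoid(v) for v in matvec(self.w["f"], z)]    # forget gate
        o = [sigmoid(v) for v in matvec(self.w["o"], z)]    # output gate
        g = [math.tanh(v) for v in matvec(self.w["g"], z)]  # candidate cell state
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h, c

    def q_values(self, inputs: list[Vector]) -> list[Vector]:
        """Return one vector of Q-values per horizon step (same weights each step)."""
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        out = []
        for x in inputs:
            h, c = self.step(x, h, c)
            out.append(matvec(self.head, h))
        return out


def g_j(chi: Vector, j: int, window: int) -> Vector:
    """Hypothetical preprocessing: the forecast window starting at step j."""
    return chi[j:j + window]


n_p, window = 5, 3
chi = [0.1 * k for k in range(n_p + window)]  # made-up forecast-like state
model = TinyLSTMQ(n_in=window, n_hidden=8, n_actions=4)
q = model.q_values([g_j(chi, j, window) for j in range(n_p)])
greedy = [max(range(len(qj)), key=qj.__getitem__) for qj in q]
print(len(q), len(q[0]), greedy)
```

Sharing one cell across the horizon keeps the parameter count independent of $N_\mathrm{p}$, while the per-step preprocessing lets each unrolled step see the part of the state most relevant to it.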
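
The optimality gap in Figure 5 compares the cost of the learning-based controller with that of the MILP-based MPC. The caption does not give the formula, so the sketch below assumes a common definition: the relative cost increase over the MILP benchmark in percent, averaged over feasible episodes, with feasibility reported as the share of episodes in which the LP admitted a solution. The costs are made-up numbers, not results from the paper.

```python
from typing import Optional


def optimality_gap_percent(cost_learned: float, cost_milp: float) -> float:
    """Assumed definition: relative cost increase over the MILP benchmark (nonzero)."""
    return 100.0 * (cost_learned - cost_milp) / abs(cost_milp)


def summarize(costs_learned: list[Optional[float]],
              costs_milp: list[float]) -> tuple[float, float]:
    """Return (feasibility rate, mean gap over feasible episodes), both in percent.

    A None entry marks an episode in which the LP with the learned discrete
    action was infeasible; such episodes are excluded from the gap average.
    """
    pairs = [(cl, cm) for cl, cm in zip(costs_learned, costs_milp) if cl is not None]
    feasibility = 100.0 * len(pairs) / len(costs_milp)
    if not pairs:
        return feasibility, float("nan")
    mean_gap = sum(optimality_gap_percent(cl, cm) for cl, cm in pairs) / len(pairs)
    return feasibility, mean_gap


feas, gap = summarize([101.0, None, 202.0, 99.5], [100.0, 150.0, 200.0, 99.5])
print(f"feasibility = {feas:.1f} %, mean optimality gap = {gap:.2f} %")
```

Dividing by the absolute MILP cost keeps the sign of the gap meaningful even when revenue from selling to the main grid makes the benchmark cost negative.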

Theorems & Definitions (1)

  • Remark