Integrating Reinforcement Learning and Model Predictive Control with Applications to Microgrids
Caio Fabio Oliveira da Silva, Azita Dabiri, Bart De Schutter
TL;DR
The paper tackles real-time control of hybrid systems modeled as mixed-logical dynamical (MLD) systems, for which standard model predictive control (MPC) requires solving computationally intensive mixed-integer linear programs (MILPs) online. It integrates reinforcement learning (RL) with MPC to decouple discrete and continuous decisions: RL selects the discrete actions, which reduces the MILP to a linear program (LP) over the continuous actions and thereby drastically cuts online computation. A decoupled Q-function, approximated by a long short-term memory (LSTM) network, makes learning over the prediction horizon tractable; training is done offline, and online operation only requires evaluating the network and solving the LP. The framework is validated on a microgrid economic dispatch problem, where it achieves substantial online speedups (up to 16x) while maintaining high feasibility and suboptimality below 1%. The results also reveal a trade-off: RL improves feasibility compared with a supervised-learning baseline. The work advances real-time control of hybrid energy systems and suggests extensions to more complex MILPs and related mixed-integer problems, as well as to nonlinear regimes.
Abstract
This work proposes an approach that integrates reinforcement learning and model predictive control (MPC) to solve finite-horizon optimal control problems in mixed-logical dynamical systems efficiently. Optimization-based control of such systems with discrete and continuous decision variables entails the online solution of mixed-integer linear programs, which suffer from the curse of dimensionality. Our approach aims to mitigate this issue by decoupling the decision on the discrete variables from the decision on the continuous variables. In the proposed approach, reinforcement learning determines the discrete decision variables and simplifies the online optimization problem of the MPC controller from a mixed-integer linear program to a linear program, significantly reducing the computational time. A fundamental contribution of this work is the definition of the decoupled Q-function, which plays a crucial role in making the learning problem tractable in a combinatorial action space. We motivate the use of recurrent neural networks to approximate the decoupled Q-function and show how they can be employed in a reinforcement learning setting. Simulation experiments on a microgrid system using real-world data demonstrate that the proposed method substantially reduces the online computation time of MPC while maintaining high feasibility and low suboptimality.
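The central mechanism, fixing the discrete decisions so that only an LP remains, can be illustrated on a toy single-period dispatch problem. This is a minimal sketch under stated assumptions, not the authors' formulation: the unit data, on-costs, grid limit, and the names `solve_dispatch_lp`, `solve_dispatch_milp`, and `placeholder_policy` are all hypothetical, and `placeholder_policy` is a simple heuristic standing in for the learned RL policy (nothing is learned here). The exact MILP is solved by brute-force enumeration of all 2^n on/off vectors, which makes the combinatorial growth explicit; over an N-step horizon with n binaries per step the count becomes 2^(nN). Once the binaries are fixed, a single LP remains. Here it is solved by a cheapest-first fill, which is optimal only because this particular LP has box constraints and one balance equation; a realistic MPC problem with storage dynamics would need a general LP solver.

```python
from itertools import product

# Hypothetical toy data (illustrative only, not taken from the paper):
# three dispatchable units plus a capped grid connection, single time step.
COST = [0.10, 0.18, 0.30]     # marginal cost per kWh of each unit
ON_COST = [3.0, 2.0, 1.0]     # fixed cost paid if a unit is switched on
P_MIN = [20.0, 10.0, 5.0]     # minimum output (kW) when a unit is on
P_MAX = [60.0, 40.0, 30.0]    # maximum output (kW) when a unit is on
GRID_PRICE = 0.25             # price per kWh imported from the main grid
GRID_MAX = 50.0               # import limit (kW)


def solve_dispatch_lp(on, demand):
    """Continuous dispatch for FIXED on/off decisions `on` (tuple of 0/1).

    With the binaries fixed, the problem
        min  sum_i ON_COST[i]*on[i] + sum_i COST[i]*p[i] + GRID_PRICE*g
        s.t. sum_i p[i] + g = demand
             on[i]*P_MIN[i] <= p[i] <= on[i]*P_MAX[i],  0 <= g <= GRID_MAX
    is an LP with box constraints and a single balance equation, so filling
    from the cheapest source first is optimal. Returns (cost, power), where
    power lists the unit outputs followed by the grid import, or None if the
    chosen binaries make the LP infeasible.
    """
    lower = [u * lo for u, lo in zip(on, P_MIN)] + [0.0]
    upper = [u * hi for u, hi in zip(on, P_MAX)] + [GRID_MAX]
    price = COST + [GRID_PRICE]
    if not (sum(lower) <= demand <= sum(upper)):
        return None
    power = list(lower)                # every source starts at its lower bound
    remaining = demand - sum(lower)
    for j in sorted(range(len(price)), key=lambda j: price[j]):
        step = min(upper[j] - lower[j], remaining)
        power[j] += step
        remaining -= step
    fixed = sum(u * f for u, f in zip(on, ON_COST))
    return fixed + sum(c * p for c, p in zip(price, power)), power


def solve_dispatch_milp(demand):
    """Exact MILP by brute force: one LP per on/off vector, 2**n LPs in total."""
    best = None
    for on in product((0, 1), repeat=len(COST)):
        result = solve_dispatch_lp(on, demand)
        if result is not None and (best is None or result[0] < best[0]):
            best = (result[0], on)
    return best  # (cost, on), or None if no vector is feasible


def placeholder_policy(demand):
    """Hypothetical stand-in for the learned RL policy (no learning here):
    switch on units in order of marginal cost until their capacity covers demand."""
    on = [0] * len(COST)
    capacity = 0.0
    for i in sorted(range(len(COST)), key=lambda i: COST[i]):
        if capacity >= demand:
            break
        on[i] = 1
        capacity += P_MAX[i]
    return tuple(on)


if __name__ == "__main__":
    demand = 80.0
    milp_cost, milp_on = solve_dispatch_milp(demand)  # searches all 2**3 vectors
    on = placeholder_policy(demand)                   # discrete decision fixed upfront
    lp_cost, _ = solve_dispatch_lp(on, demand)        # only one LP is solved
    print("MILP:", milp_on, round(milp_cost, 2))      # MILP: (1, 0, 0) 14.0
    print("policy + LP:", on, round(lp_cost, 2))      # policy + LP: (1, 1, 0) 14.6
```

With a demand of 80 kW, the heuristic switches on one unit too many and pays 14.6 instead of the MILP optimum of 14.0, roughly 4% more, while solving one LP instead of eight. A poor discrete choice shows up either as suboptimality of this kind or, as with the all-off vector `(0, 0, 0)` here, as an infeasible LP; this is why the quality of the learned discrete policy matters for both suboptimality and feasibility.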
