Model Predictive Control-Guided Reinforcement Learning for Implicit Balancing

Seyed Soroush Karimi Madahi, Kenneth Bruninx, Bert Claessens, Chris Develder

TL;DR

The paper tackles implicit balancing in European imbalance markets, where balance responsible parties (BRPs) deviate in real time from their day-ahead nominations to profit from system imbalances. It proposes an MPC-guided RL framework that incorporates forecasts through an MPC component while still making minute-level decisions via a two-network neural architecture, trained end-to-end with a distributional RL objective. For a 1 MW battery and a realistic forecaster, MPC-guided RL improves profit by 16.15% over base RL and by 54.36% over stochastic MPC; the gains shrink for larger batteries because of stronger market-model mismatch. The work demonstrates the complementary strengths of MPC (forecast incorporation) and RL (fast, robust inference) and highlights the importance of forecast quality and forecast-confidence inputs. Practically, the approach offers a viable path to higher BRP profitability and to real-time, forecast-informed balancing actions under EBGL-aligned Belgian market dynamics.

Abstract

In Europe, profit-seeking balance responsible parties can deviate in real time from their day-ahead nominations to assist transmission system operators in maintaining the supply-demand balance. Model predictive control (MPC) strategies for implicit balancing capture arbitrage opportunities, but fail to accurately capture the price-formation process in the European imbalance markets and face high computational costs. Model-free reinforcement learning (RL) methods are fast to execute, but require data-intensive training and usually rely only on real-time and historical data for decision-making. This paper proposes an MPC-guided RL method that combines the complementary strengths of MPC and RL. The proposed method can effectively incorporate forecasts into the decision-making process (as in MPC), while maintaining the fast inference capability of RL. The performance of the proposed method is evaluated on the implicit balancing battery control problem using Belgian balancing data from 2023. First, we analyze the performance of the standalone state-of-the-art RL and MPC methods from various angles, to highlight their individual strengths and limitations. Next, we show an arbitrage profit benefit of the proposed MPC-guided RL method of 16.15% and 54.36% compared to standalone RL and MPC, respectively.

Paper Structure

This paper contains 17 sections, 11 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: The proposed MPC-guided RL architecture. Real-time data (the RL state) is initially encoded by the RL-inspired network. The resulting embedding, the MPC action for that quarter hour and other inputs are then fed into the final decision-maker network to calculate the final action.
  • Figure 2: The deterministic MPC vs. RL results for (a) all quarter hours and (b) quarter hours without mFRR activation over the optimization horizon. In (b), RL results vary across look-ahead horizons because a different set of quarter hours is selected for each horizon.
  • Figure 3: The stochastic MPC vs. RL results for (a) the 1 MW battery and (b) the 50 MW battery.
  • Figure 4: The comparison of various trained RL agents for (a) 1 MW and (b) 50 MW batteries.
  • Figure 5: The average deviation of the proposed MPC-guided RL agent actions from the MPC actions for the 1 MW battery under different forecast errors.
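
The data flow described in the Figure 1 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the layer sizes, the random weights, the content of `extra_inputs` (for instance a forecast-confidence signal), and the tanh scaling to the battery power limit are all assumptions made for the example, and the distributional RL training objective is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions; the paper's actual sizes are not given here.
STATE_DIM, EMBED_DIM, EXTRA_DIM = 8, 16, 2


def mlp_params(sizes, rng):
    """Random weights for a small fully connected network (illustrative only)."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp_forward(params, x, final_activation=None):
    """Forward pass with ReLU on hidden layers and an optional output activation."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return final_activation(x) if final_activation else x


# Network 1: encodes the real-time data (the RL state) into an embedding.
encoder = mlp_params([STATE_DIM, 32, EMBED_DIM], rng)
# Network 2: final decision-maker, fed the embedding, the MPC action, and other inputs.
decision = mlp_params([EMBED_DIM + 1 + EXTRA_DIM, 32, 1], rng)


def mpc_guided_action(rl_state, mpc_action, extra_inputs, max_power_mw=1.0):
    """Combine the encoded RL state with the MPC action for the current quarter hour."""
    embedding = mlp_forward(encoder, rl_state)
    features = np.concatenate([embedding, [mpc_action], extra_inputs])
    # Assumption: tanh output scaled to the battery power limit.
    raw = mlp_forward(decision, features, final_activation=np.tanh)
    return float(max_power_mw * raw[0])


action = mpc_guided_action(
    rl_state=rng.normal(size=STATE_DIM),
    mpc_action=0.5,                    # MPC set-point for this quarter hour (made up)
    extra_inputs=np.array([0.1, 0.9]), # hypothetical additional inputs
)
```

In this sketch the MPC action enters only as an input feature, so the decision network remains free to deviate from it, which is the kind of deviation Figure 5 reports for different forecast errors.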