Distributional Reinforcement Learning-based Energy Arbitrage Strategies in Imbalance Settlement Mechanism
Seyed Soroush Karimi Madahi, Bert Claessens, Chris Develder
TL;DR
The paper addresses energy arbitrage for battery energy storage systems (BESS) under single-price imbalance settlement by formulating the problem as a Markov decision process and solving it with distributional reinforcement learning. It compares DQN and SAC with their distributional variants (DDQN, DSAC) and incorporates a daily cycle constraint and a risk-sensitive objective based on value-at-risk (VaR). DSAC performs best, and risk-averse policies reduce tail risk at the cost of some profit. The framework is validated on Belgian imbalance prices from 2022: cycle awareness yields fewer cycles and more conservative operation, while risk sensitivity narrows the profit distribution and improves tail-risk metrics. The work highlights the practical potential of distributional, risk-aware RL for balance responsible parties (BRPs) to exploit imbalance-price arbitrage with BESS, and it points to future extensions to day-ahead arbitrage and continuous action spaces.
Abstract
Growth in the penetration of renewable energy sources makes supply more uncertain and increases the system imbalance. This trend, together with single imbalance pricing, creates an opportunity for balance responsible parties (BRPs) to perform energy arbitrage in the imbalance settlement mechanism. To this end, we propose a battery control framework based on distributional reinforcement learning (DRL). Our proposed control framework takes a risk-sensitive perspective, allowing BRPs to adjust their risk preferences: we aim to optimize a weighted sum of the arbitrage profit and a risk measure while constraining the daily number of cycles for the battery. We assess the performance of our proposed control framework using the Belgian imbalance prices of 2022 and compare two state-of-the-art RL methods, deep Q-learning and soft actor-critic. Results reveal that the distributional soft actor-critic method can outperform other methods. Moreover, we note that our fully risk-averse agent appropriately learns to hedge against the risk related to the unknown imbalance price by (dis)charging the battery only when the agent is more certain about the price.
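The weighted objective described in the abstract can be illustrated with a minimal sketch. It assumes a distributional critic that outputs, for each discrete battery action, a set of equally weighted quantile estimates of the return. The function name `risk_sensitive_action`, the weight `beta`, the VaR level `alpha`, and the example numbers are all hypothetical and chosen for illustration; the paper's actual networks, risk level, and weighting scheme are not reproduced here.

```python
import numpy as np

def risk_sensitive_action(quantiles, beta, alpha=0.1):
    """Pick the action maximizing (1 - beta) * E[return] + beta * VaR_alpha.

    quantiles: array of shape (n_actions, n_quantiles) holding equally
               weighted quantile estimates of the return for each action
               (assumed output of a distributional critic).
    beta:      risk weight in [0, 1]; 0 is risk-neutral, 1 is fully risk-averse.
    alpha:     lower-tail level used for the value-at-risk (assumption).
    """
    quantiles = np.asarray(quantiles, dtype=float)
    expected = quantiles.mean(axis=1)                   # expected return per action
    var_alpha = np.quantile(quantiles, alpha, axis=1)   # lower-tail quantile per action
    score = (1.0 - beta) * expected + beta * var_alpha
    return int(np.argmax(score)), score

# Hypothetical return quantiles for three actions: discharge, idle, charge.
example_quantiles = [
    [-40.0, -10.0, 20.0, 50.0, 80.0],   # discharge: high mean, wide spread
    [0.0, 0.0, 0.0, 0.0, 0.0],          # idle: no profit, no risk
    [-60.0, -30.0, -5.0, 10.0, 25.0],   # charge: negative mean in this state
]

neutral_action, _ = risk_sensitive_action(example_quantiles, beta=0.0)
averse_action, averse_score = risk_sensitive_action(example_quantiles, beta=1.0)
print(neutral_action, averse_action)
```

In this toy state the risk-neutral setting selects the discharge action because it has the highest mean return, while the fully risk-averse setting selects idle because the lower tail of the discharge return is negative. This mirrors, in a simplified way, the abstract's observation that a fully risk-averse agent (dis)charges only when it is more certain about the price.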
