Distributional Reinforcement Learning-based Energy Arbitrage Strategies in Imbalance Settlement Mechanism

Seyed Soroush Karimi Madahi; Bert Claessens; Chris Develder

TL;DR

The paper addresses energy arbitrage for battery energy storage systems (BESS) under single-price imbalance settlement by formulating the problem as a Markov decision process and solving it with distributional reinforcement learning. It compares DQN and SAC with their distributional variants (DDQN and DSAC), and adds a daily cycle constraint and a risk-sensitive objective based on value at risk (VaR). DSAC performs best, and risk-averse policies reduce tail risk at the cost of some profit. The framework is validated on Belgian imbalance prices from 2022: cycle awareness yields fewer cycles and more conservative operation, while risk sensitivity narrows the profit distribution and improves tail-risk metrics. The work highlights the practical potential of distributional, risk-aware RL for balance responsible parties (BRPs) to exploit imbalance-price arbitrage with BESS, and points to future extensions to day-ahead arbitrage and continuous action spaces.

Abstract

Growth in the penetration of renewable energy sources makes supply more uncertain and leads to an increase in the system imbalance. This trend, together with single imbalance pricing, opens an opportunity for balance responsible parties (BRPs) to perform energy arbitrage in the imbalance settlement mechanism. To this end, we propose a battery control framework based on distributional reinforcement learning (DRL). Our proposed control framework takes a risk-sensitive perspective, allowing BRPs to adjust their risk preferences: we aim to optimize a weighted sum of the arbitrage profit and a risk measure while constraining the daily number of cycles for the battery. We assess the performance of our proposed control framework using the Belgian imbalance prices of 2022 and compare two state-of-the-art RL methods, deep Q-learning (DQN) and soft actor-critic (SAC). Results reveal that the distributional soft actor-critic method can outperform the other methods. Moreover, we note that our fully risk-averse agent appropriately learns to hedge against the risk related to the unknown imbalance price by (dis)charging the battery only when the agent is more certain about the price.

Paper Structure (18 sections, 12 equations, 13 figures, 3 tables)

Introduction
Background and Related Work
Problem Formulation
Imbalance Settlement Mechanism
MDP Formulation without Cycle Constraint Consideration
MDP Formulation with Cycle Constraint Consideration
Reinforcement Learning Methods
DQN
SAC
Distributional RL
Risk-sensitive RL
Simulation Results
Experimental Setup
Arbitrage Strategy without Cycle Constraint (Q1)
Arbitrage Strategy with Cycle Constraint (Q2)
...and 3 more sections

Figures (13)

Figure 1: The evolution of Belgian imbalance prices from 2018 to 2023
Figure 2: The overview of the proposed control framework
Figure 3: The learning process of the four RL methods for the risk-neutral without cycle constraint scenario. (a) The average daily profit of the RL methods. (b) The average daily number of cycles.
Figure 4: The projection of the learned policy in the risk-neutral without cycle constraint scenario for (a) DQN, (b) DDQN, (c) SAC, and (d) DSAC.
Figure 5: The cumulative distribution of the imbalance price in 2022.
...and 8 more figures
