Enhancing Battery Storage Energy Arbitrage with Deep Reinforcement Learning and Time-Series Forecasting

Manuel Sage, Joshua Campbell, Yaoyao Fiona Zhao

TL;DR

This study combines deep reinforcement learning (DRL) with time-series forecasting methods from deep learning to improve performance on energy arbitrage. The authors hypothesize that multiple predictors convey useful information about the future development of electricity prices through a "majority vote" principle, enabling the DRL agent to learn more profitable control policies.

Abstract

Energy arbitrage is one of the most profitable sources of income for battery operators, generating revenues by buying and selling electricity at different prices. Forecasting these revenues is challenging due to the inherent uncertainty of electricity prices. Deep reinforcement learning (DRL) has emerged in recent years as a promising tool, able to cope with uncertainty by training on large quantities of historical data. However, without access to future electricity prices, DRL agents can only react to the currently observed price and cannot learn to plan battery dispatch. Therefore, in this study, we combine DRL with time-series forecasting methods from deep learning to enhance performance on energy arbitrage. We conduct a case study using price data from Alberta, Canada, which is highly non-stationary and characterized by irregular price spikes. This data is challenging to forecast even when state-of-the-art deep learning models consisting of convolutional layers, recurrent layers, and attention modules are deployed. Our results show that energy arbitrage with DRL-enabled battery control still benefits significantly from these imperfect predictions, but only if predictors for several horizons are combined. When multiple predictions for the next 24-hour window are grouped, accumulated rewards increase by 60% for deep Q-networks (DQN) compared to the experiments without forecasts. We hypothesize that multiple predictors, despite their imperfections, convey useful information regarding the future development of electricity prices through a "majority vote" principle, enabling the DRL agent to learn more profitable control policies.
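The core idea in the abstract, extending the agent's observation with price forecasts for several horizons, can be illustrated with a minimal sketch. This is not the authors' implementation. The horizon values, the battery parameters, the three-action scheme, and every function name below are assumptions made for illustration only. The "perfect" forecast helper simply reads future ground-truth prices, which corresponds to the perfect-forecast baseline mentioned in the figure captions.

```python
import numpy as np

# Assumed forecast horizons in hours. The source mentions seven predictors
# within the next 24-hour window; these particular values are hypothetical.
HORIZONS = (1, 2, 3, 6, 12, 18, 24)


def perfect_forecasts(prices, t, horizons=HORIZONS):
    """Return ground-truth future prices at t + h for each horizon h.

    Indices past the end of the series are clipped to the last price.
    A learned forecaster would replace this function.
    """
    last = len(prices) - 1
    return [float(prices[min(t + h, last)]) for h in horizons]


def build_observation(prices, t, soc, forecasts=None):
    """Build the agent's observation: current price, SOC, optional forecasts."""
    obs = [float(prices[t]), float(soc)]
    if forecasts is not None:
        obs.extend(forecasts)
    return np.asarray(obs, dtype=np.float32)


def step_battery(soc, action, price, capacity_mwh=1.0, power_mw=0.5):
    """Advance a lossless toy battery by one hour (all parameters assumed).

    action: 0 = charge, 1 = hold, 2 = discharge.
    Returns (new_soc, reward), where reward is the cash flow of the trade.
    """
    if action == 0:  # buy energy, limited by the remaining headroom
        energy = min(power_mw, (1.0 - soc) * capacity_mwh)
        return soc + energy / capacity_mwh, -price * energy
    if action == 2:  # sell energy, limited by the stored energy
        energy = min(power_mw, soc * capacity_mwh)
        return soc - energy / capacity_mwh, price * energy
    return soc, 0.0  # hold
```

Without forecasts the observation has two entries (price and SOC). With forecasts it grows by one entry per horizon, which is what would let a DQN or PPO policy condition its charge and discharge decisions on expected price movements rather than on the current price alone.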

Paper Structure

This paper contains 18 sections, 12 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Schematic showing the integration of time-series forecasters into the agent-environment framework of reinforcement learning.
  • Figure 2: (Left) A one-month sample of electricity prices in Alberta from May 2022. (Right) Mean and standard deviation of electricity prices in Alberta for the five years of data collected.
  • Figure 3: RMSE of tested models on the validation set for different forecast horizons.
  • Figure 4: DQN (left) and PPO (right) performance with different forecasting horizons for perfect (ground truth) and predicted forecasts after 50 episodes of training. All results were averaged over five independent runs. The error bars show $\pm$ one standard deviation.
  • Figure 5: A nine-day sample of (Top) true electricity prices and predictions for selected horizons with the best deep learning models found, (Middle) the battery SOC and rewards for a trained DQN agent without access to forecasts, (Bottom) the battery SOC and rewards for a trained DQN agent with access to all seven predictions.
  • ...and 1 more figure