On-line reinforcement learning for optimization of real-life energy trading strategy

Łukasz Lepak, Paweł Wawrzyński

TL;DR

The paper tackles the problem of optimizing day-ahead energy trading under uncertainty for a prosumer with storage. It introduces an on-line reinforcement learning framework that learns from recorded environmental data, modeling the task as an MDP with states $s_t$, actions $a_t$, and rewards $r_t$, and incorporates weather forecasts as informative inputs. A neural black-box bidding policy optimized with on-line RL (notably the A2C algorithm) yields the highest profits, especially when weather data are included, and outperforms gradient-free baselines and a range of alternative strategies. The findings demonstrate a practical path to ready-to-deploy, automated trading strategies for medium-sized energy prosumers and highlight the value of weather-aware information and storage capacity in improving market performance.
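To make the MDP vocabulary concrete, here is a minimal Python sketch of a single transition from a state $s_t$ and action $a_t$ to a reward $r_t$ and next state. It is a toy illustration under stated assumptions, not the paper's formulation: the `State` fields, the `step` function, the 10 kWh capacity, and the immediate hourly settlement are all hypothetical simplifications. The actual strategy submits day-ahead bids and conditions on weather forecasts, which this sketch omits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """Toy state s_t (hypothetical fields, not the paper's state vector)."""
    hour: int           # hour of day, 0..23
    battery_kwh: float  # energy currently stored
    price: float        # market price for this hour, per kWh


def step(state, sell_kwh, production_kwh, next_price, capacity_kwh=10.0):
    """Apply action a_t = sell_kwh (negative means buy); return (s_{t+1}, r_t).

    Assumption: trades settle immediately at the current price, which
    collapses the day-ahead bidding mechanism into one hourly trade.
    """
    available = state.battery_kwh + production_kwh
    # Clamp the trade: cannot sell more than is available, and cannot buy
    # (or keep) more than the battery can hold.
    traded = min(max(sell_kwh, available - capacity_kwh), available)
    reward = traded * state.price  # r_t: revenue if selling, cost if buying
    next_state = State(
        hour=(state.hour + 1) % 24,
        battery_kwh=available - traded,
        price=next_price,
    )
    return next_state, reward


if __name__ == "__main__":
    s0 = State(hour=0, battery_kwh=5.0, price=0.2)
    s1, r0 = step(s0, sell_kwh=3.0, production_kwh=1.0, next_price=0.3)
    print(s1, r0)
```

An RL algorithm such as A2C would then adjust a policy that maps each state to an action so that the expected sum of these rewards grows over many simulated days.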

Abstract

An increasing share of energy is produced from renewable sources by many small producers. The efficiency of those sources is volatile and, to some extent, random, exacerbating the problem of energy market balancing. In many countries, this balancing is done on day-ahead (DA) energy markets. This paper considers automated trading on the DA energy market by a medium-sized prosumer. We model this activity as a Markov Decision Process and formalize a framework in which a strategy applicable in real life can be optimized with off-line data. We design a trading strategy that is fed with the available environmental information that can impact future prices, including weather forecasts. We use state-of-the-art reinforcement learning (RL) algorithms to optimize this strategy. For comparison, we also synthesize simple parametric trading strategies and optimize them with an evolutionary algorithm. Results show that our RL-based strategy generates the highest market profits.

Paper Structure (22 sections, 12 equations, 4 figures, 7 tables)

Figures (4)

  • Figure 1: Battery levels for the proposed strategy during five simulation days averaged over five testing runs.
  • Figure 2: Bid prices for the proposed strategy during five simulation days taken from the best testing run. The vertical axis of the upper plot is in a logarithmic scale.
  • Figure 3: Bid amounts for the proposed strategy during five simulation days taken from the best testing run.
  • Figure 4: Unscheduled transaction amounts for the proposed strategy during five simulation days taken from the best testing run.