Learning a local trading strategy: deep reinforcement learning for grid-scale renewable energy integration

Caleb Ju, Constance Crozier

TL;DR

This paper explores the use of reinforcement learning for operating grid-scale batteries co-located with solar power, and shows RL achieves an average of 61% of the approximate theoretical optimal (non-causal) operation, outperforming advanced control methods on average.

Abstract

Variable renewable generation increases the challenge of balancing power supply and demand. Grid-scale batteries co-located with generation can help mitigate this misalignment. This paper explores the use of reinforcement learning (RL) for operating grid-scale batteries co-located with solar power. Our results show RL achieves an average of 61% (and up to 96%) of the approximate theoretical optimal (non-causal) operation, outperforming advanced control methods on average. Our findings suggest RL may be preferred when future signals are hard to predict. Moreover, RL has two significant advantages over simpler rules-based control: (1) solar energy is shifted more effectively towards high-demand periods, and (2) battery dispatch is more diverse across locations, reducing potential ramping issues caused by the superposition of many similar actions.
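To make the comparison between baselines concrete, the following is a minimal sketch of a sell-only policy and a price-threshold rules-based policy for a solar-plus-battery site. It is an illustration only, not the paper's controllers: the function names, the thresholds `low` and `high`, the lossless battery, and the hourly price series are all assumptions introduced here.

```python
from typing import Sequence


def sell_only_profit(solar_mwh: Sequence[float], price: Sequence[float]) -> float:
    """Sell all solar energy at the current price; no battery is used."""
    return sum(s * p for s, p in zip(solar_mwh, price))


def threshold_rule_profit(
    solar_mwh: Sequence[float],
    price: Sequence[float],
    capacity_mwh: float,
    max_rate_mwh: float,
    low: float,
    high: float,
) -> float:
    """Hypothetical rules-based policy (assumed, not from the paper).

    - price <= low: store as much solar as the battery allows, sell the rest
    - price >= high: sell solar plus as much stored energy as allowed
    - otherwise: sell solar directly and leave the battery untouched
    The battery is assumed lossless and starts empty.
    """
    soc = 0.0      # state of charge in MWh
    profit = 0.0
    for s, p in zip(solar_mwh, price):
        if p <= low:
            charge = min(s, max_rate_mwh, capacity_mwh - soc)
            soc += charge
            profit += (s - charge) * p
        elif p >= high:
            discharge = min(max_rate_mwh, soc)
            soc -= discharge
            profit += (s + discharge) * p
        else:
            profit += s * p
    return profit


if __name__ == "__main__":
    # Made-up six-step example: solar peaks midday, prices peak in the evening.
    solar = [0, 2, 4, 2, 0, 0]
    price = [30, 10, 5, 20, 60, 80]
    print(sell_only_profit(solar, price))
    print(threshold_rule_profit(solar, price, 4, 2, low=10, high=60))
```

A fixed-threshold rule like this one reacts identically wherever prices look alike, which is the kind of behaviour the abstract contrasts with the more diverse dispatch learned by RL.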

Paper Structure

This paper contains 13 sections, 9 equations, 10 figures, 3 tables.

Figures (10)

  • Figure 1: Information flow in an RL environment.
  • Figure 2: Profiles for two generator pnodes based in Los Angeles, CA. Both pnodes share the same solar profile. Zoom-ins of three consecutive days are shown.
  • Figure 3: Solar and LMP profile for Santa Cruz pnode. Similar plots to Fig. 2.
  • Figure 4: Mean (line) and confidence interval (shaded) of cumulative profit for RL, rules-based (rules), sell-only (sell), and approximate optimal benchmarks ($\sim$OPT). Subtitle displays average solar power (MW).
  • Figure 5: Average hourly LMP (red, left axis) and average hourly profit (blue, right axis) over a 24-hour day, averaged across all testing days and locations.
  • ...and 5 more figures