Learning the Optimal Power Flow: Environment Design Matters

Thomas Wolgast; Astrid Nieße

TL;DR

This work collects and implements diverse environment design decisions from the literature regarding training data, observation space, episode definition, and reward function choice, and shows the significant impact of these environment design options on RL-OPF training performance.

Abstract

Reinforcement learning (RL) is emerging as a promising new approach to solving the optimal power flow (OPF) problem. However, the RL-OPF literature is strongly divided regarding the exact formulation of the OPF problem as an RL environment. In this work, we collect and implement diverse environment design decisions from the literature regarding training data, observation space, episode definition, and reward function choice. In an experimental analysis, we show the significant impact of these environment design options on RL-OPF training performance. Further, we derive initial recommendations regarding the choice of these design decisions. The created environment framework is fully open-source and can serve as a benchmark for future research in the RL-OPF field.

Paper Structure (37 sections, 23 equations, 7 figures, 2 tables)

Introduction
Challenges of the OPF as RL Problem
Environment Design Decisions
Training Data
Observation Space
Episode Definition
Reward Function
Summary
Environment Framework
Analyzing RL Environment Design
Exemplary OPF Problems as RL Environment Instances
VoltageControl
EcoDispatch
Default Environment Settings
RL Algorithm and Hyperparameters
...and 22 more sections

Figures (7)

Figure 1: The procedure and API of the developed RL-OPF environment framework, following the Gymnasium API.
Figure 2: VoltageControl: Scatter plot of normalized objective values and sum of violations.
Figure 3: EcoDispatch: Scatter plot of normalized objective values and sum of violations.
Figure 4: Training Data - Comparison of design options regarding optimization MAPE (first row), share of invalid solutions (second row), and variance in both the VoltageControl environment and the EcoDispatch environment (arranged in columns).
Figure 5: Observation Space - Comparison of design options regarding optimization MAPE (first row), share of invalid solutions (second row), and variance in both the VoltageControl environment and the EcoDispatch environment (arranged in columns).
...and 2 more figures
