HUGO -- Highlighting Unseen Grid Options: Combining Deep Reinforcement Learning with a Heuristic Target Topology Approach

Malte Lehna; Clara Holzhüter; Sven Tomforde; Christoph Scholz

TL;DR

This work addresses grid control under high renewable penetration by moving from substation-level actions to holistic topology optimization via Target Topologies (TTs). It introduces a search method to identify robust TT configurations and upgrades the CurriculumAgent to a Topology Agent with a greedy TT-focused component, demonstrating significant improvements on the WCCI 2022 Grid2Op IEEE118 benchmark (mean score >10% higher; median survival ≈25% higher). The analysis reveals that most effective TT configurations remain close to the base topology, suggesting inherent robustness in near-base topologies. The results support adopting TT-based topology optimization as a practical, scalable enhancement for automated grid operation, with potential for integration into hierarchical control frameworks and future data reuse for training topology-specific agents.

Abstract

With the growth of Renewable Energy (RE) generation, the operation of power grids has become increasingly complex. One solution could be automated grid operation, where Deep Reinforcement Learning (DRL) has repeatedly shown significant potential in Learning to Run a Power Network (L2RPN) challenges. However, most existing DRL algorithms restrict topology optimization to individual actions at the substation level. In contrast, we take a more holistic approach and propose specific Target Topologies (TTs) as actions. These topologies are selected based on their robustness. In this paper, we present a search algorithm to find the TTs and upgrade our previously developed DRL agent CurriculumAgent (CAgent) to a novel topology agent. Compared with the previous CAgent, the upgraded agent increases the L2RPN score significantly, by 10%. Further, we achieve a 25% better median survival time with our TTs included. Later analysis shows that almost all TTs are close to the base topology, which explains their robustness.
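The two ideas in the abstract, measuring how far a TT is from the base topology and greedily picking a TT, can be illustrated with a minimal sketch. This is not the authors' search algorithm or agent: the bus-vector encoding, the `element_to_sub` mapping, the `simulate_max_rho` callback, and the 1.0 loading limit are all assumptions made for illustration only.

```python
from typing import Callable, Optional, Sequence, Tuple

# Assumed encoding: one bus id (1 or 2) per grid element.
Topology = Tuple[int, ...]


def changed_substations(topo: Topology, base: Topology,
                        element_to_sub: Sequence[int]) -> int:
    """Count substations whose bus assignment differs from the base topology."""
    return len({element_to_sub[i]
                for i, (a, b) in enumerate(zip(topo, base)) if a != b})


def greedy_target_topology(candidates: Sequence[Topology],
                           simulate_max_rho: Callable[[Topology], float],
                           rho_limit: float = 1.0) -> Optional[Topology]:
    """Return the candidate with the lowest simulated maximum line loading.

    Returns None when no candidate stays below rho_limit (hypothetical rule).
    """
    best, best_rho = None, rho_limit
    for topo in candidates:
        rho = simulate_max_rho(topo)  # hypothetical one-step simulation
        if rho < best_rho:
            best, best_rho = topo, rho
    return best


# Toy data: three substations with two elements each.
base: Topology = (1, 1, 1, 1, 1, 1)
element_to_sub = [0, 0, 1, 1, 2, 2]
tt_a: Topology = (1, 2, 1, 1, 1, 1)  # differs at substation 0 only
tt_b: Topology = (1, 2, 1, 2, 2, 1)  # differs at substations 0, 1 and 2

# Made-up loading values standing in for a real grid simulation.
toy_rho = {tt_a: 0.90, tt_b: 0.95}
chosen = greedy_target_topology([tt_a, tt_b], toy_rho.__getitem__)
```

In this toy setup the greedy step selects `tt_a`, which is also the candidate closest to the base topology. In a real setting the loading values would come from a grid simulator rather than a lookup table.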

Paper Structure (18 sections, 5 figures, 2 tables, 2 algorithms)

Introduction
Main Contributions
Related Work
Environment and agent structure
Grid2Op Environment
The CurriculumAgent
The Topology Approach
Target Topologies as an alternative to substation actions
Topology Agent
Experiments
Research Design
Experimental Results
Score
Survival Time
Topology Distribution
...and 3 more sections

Figures (5)

Figure 1: Simplified example of a topology action based on marot2020learning. With the injections of the generators and the high demand of the loads, the grid has an overload in the right line (red) in time step $t$ (Box a). To reach the stable state in $t+1$, splitting the bottom-right substation by assigning two different buses is essential (Box b). This can be achieved by executing the substation action (Box c) on substation No. 4. Alternatively, one can describe the desired outcome in the form of a TT (Box d). Note that the blue dots represent $bus_{one}$, the red ones $bus_{two}$, and the dotted lines represent separate substations.
Figure 2: Visualization of the WCCI 2022 environment based on the IEEE118 grid. Grid2Op's internal plotting method was used to create the Figure.
Figure 3: Display of the agent's median survival time across all scenarios of the WCCI 2022 validation environment. The median is computed across the 20 random seeds. On top of the Figure, we display the
Figure 5: Display of the most frequently used TTs on the validation data by the Topology Agent. We rank the topologies based on their occurrence and visualize the affected substation changes. The y-axis shows the switched substations in comparison to the base topology and the x-axis shows the ranked top 50 TTs. The colors indicate the number of changed substations: blue corresponds to 1, red to 2, green to 3, and purple to 4 changed substations in the TT.
Figure 6: Execution time for one seed and boxplot of overall computation time across all seeds.
