Imitation Learning for Intra-Day Power Grid Operation through Topology Actions

Matthijs de Jong; Jan Viebahn; Yuliya Shapovalova

TL;DR

This work studies imitation learning (IL) for intra-day topology control in power grids, training a fully-connected neural network (FCNN) on state–action pairs from two rule-based experts (Greedy and N-1) within Grid2Op's IEEE 14-bus setup. Although classification accuracy is limited by class imbalance and class overlap, the IL agents achieve near-expert performance with orders-of-magnitude faster inference across the full-network and outage regimes, especially when they are augmented with a small number of simulations as hybrid agents. The results demonstrate the viability of fast, high-performing topology-control agents and highlight the potential benefits of hybrid IL approaches, while identifying distribution shift and dataset bias as key challenges for future work. The study also emphasizes the importance of combining simulation with robust action-selection strategies to obtain practical, scalable grid-control solutions. These findings motivate further exploration of IL with more advanced techniques (e.g., DAgger, graph-based models) and testing across more regimes, with the aim of generalizing to larger, real-world grids.

Abstract

Power grid operation is becoming increasingly complex due to the increase in generation of renewable energy. The recent series of Learning To Run a Power Network (L2RPN) competitions has encouraged the use of artificial agents to assist human dispatchers in operating power grids. In this paper we study the performance of imitation learning for day-ahead power grid operation through topology actions. In particular, we consider two rule-based expert agents: a greedy agent and an N-1 agent. While the latter is more computationally expensive since it takes N-1 safety considerations into account, it exhibits much higher operational performance. We train a fully-connected neural network (FCNN) on expert state-action pairs and evaluate it in two ways. First, we find that classification accuracy is limited despite extensive hyperparameter tuning, due to class imbalance and class overlap. Second, as a power system agent, the FCNN performs only slightly worse than the expert agents. Furthermore, hybrid agents, which incorporate minimal additional simulations, match the expert agents' performance at significantly lower computational cost. Consequently, imitation learning shows promise for developing fast, high-performing power grid agents, motivating its further exploration in future L2RPN studies.
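
The hybrid idea mentioned in the abstract, a learned policy backed by a few simulations, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `hybrid_action` and `simulate_max_loading`, the default `top_k` of 3, and the loading threshold of 1.0 are all assumptions made for illustration. The sketch ranks candidate actions by classifier score, simulates only the highest-ranked few, and returns the first one whose simulated maximum line loading stays below the threshold. If none qualifies, it falls back to the simulated candidate with the lowest loading.

```python
from typing import Callable, Sequence


def hybrid_action(
    scores: Sequence[float],
    simulate_max_loading: Callable[[int], float],
    top_k: int = 3,
    max_loading: float = 1.0,
) -> int:
    """Pick an action index using classifier scores plus a few simulations.

    scores: one score per candidate action (higher = preferred by the model).
    simulate_max_loading: hypothetical callback returning the highest line
        loading (as a fraction of the thermal limit) after applying an action.
    """
    if not scores:
        raise ValueError("scores must contain at least one action")

    # Rank action indices by descending model score and keep the top_k.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    candidates = ranked[:max(1, top_k)]

    best_action, best_loading = candidates[0], float("inf")
    for action in candidates:
        loading = simulate_max_loading(action)
        if loading < max_loading:
            # First acceptable action in model-preference order wins.
            return action
        if loading < best_loading:
            best_action, best_loading = action, loading
    # No candidate avoids an overload: return the least-loaded one.
    return best_action


if __name__ == "__main__":
    toy_scores = [0.1, 0.6, 0.3]
    toy_loadings = {0: 0.80, 1: 1.20, 2: 0.95}  # made-up values
    print(hybrid_action(toy_scores, toy_loadings.__getitem__))
```

In the toy call, the model's first choice (action 1) overloads a line in simulation, so the second-ranked action is returned instead. The number of simulations per step is bounded by `top_k`, which is the trade-off between speed and safety that a hybrid agent tunes.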

Paper Structure (24 sections, 4 equations, 6 figures, 4 tables, 2 algorithms)

Introduction
Related Work
Power Grid Setup
Action space
Intra-day Scope & Regimes
Rule-based Expert Agents
Greedy Agent
N-1 Agent
Results of the Rule-based Agents
Full-Network Regime
Planned-Outage Regime
Unplanned-Outage Regime
Imitation Learning Agents
Dataset
Imitation Learning
...and 9 more sections

Figures (6)

Figure 1: The default state of the Grid2Op environment rte_case14_realistic. Each percentage gives the share of days completed by the Greedy agent on the N-1 network in which the annotated line is disabled (see the section on N-1 networks). Lines annotated in green belong to the first cluster and lines annotated in blue to the second. The Greedy agent could not operate well when any of the lines annotated in red was disabled. Line 10 (black) does not clearly fall into either cluster, and line 18 (pink) cannot be disabled because the resulting topology is inherently invalid.
Figure 2: Left: A log Pareto chart of the actions/classes in the validation set (blue) and corresponding accuracy per action/class (black). A negative trend between frequency rank and accuracy can be observed. Right: A log Pareto chart of the actions/classes in the validation set (blue) and how often they were predicted by the ML model (orange). The blue area that does not overlap with the orange area at higher ranks indicates that the model is biased against rare classes.
Figure 3: The cosine distance between the action distributions of Greedy agents applied to the different N-1 networks. The plot shows the presence of two clusters of N-1 networks. The two yellow areas show the highly similar action distributions within these clusters, and the blue areas show that the action distributions are dissimilar between the two clusters. The N-1 networks that the Greedy agent could not operate are not included.
Figure 4: The training curves of the five final models. Green lines show the validation accuracy with the postprocessing step, yellow without. Lines are smoothed with a running average of 10.
Figure 5: The data points of the most confused classes projected on the first two principal components, for the N-1 networks with line 0 (left) and line 2 (right) disabled. The confused data points are overlaid in red.
...and 1 more figures
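
The comparison behind Figure 3 rests on the cosine distance between action distributions. As a rough sketch of that measure, assuming each agent's behaviour is available as a plain log of action identifiers (the function name `cosine_distance` and the toy logs are hypothetical, not taken from the paper), it could be computed as follows. Because cosine distance is invariant to scaling, raw action counts give the same value as normalized frequencies.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def cosine_distance(
    actions_a: Iterable[Hashable], actions_b: Iterable[Hashable]
) -> float:
    """Return 1 - cosine similarity between two action-count vectors."""
    counts_a, counts_b = Counter(actions_a), Counter(actions_b)

    # Only actions present in both logs contribute to the dot product.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[k] * counts_b[k] for k in shared)

    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("both action logs must be non-empty")
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy action logs for three hypothetical N-1 networks.
    log_net_a = ["do_nothing", "sub1_split", "sub1_split", "sub4_split"]
    log_net_b = ["do_nothing", "sub1_split", "sub1_split", "sub1_split"]
    log_net_c = ["sub8_split", "sub8_split", "sub5_split"]
    print(cosine_distance(log_net_a, log_net_b))  # small: similar behaviour
    print(cosine_distance(log_net_a, log_net_c))  # 1.0: no shared actions
```

Evaluating this function for every pair of networks yields a symmetric distance matrix of the kind a heat map like Figure 3 visualizes, where blocks of small values indicate clusters of networks on which the agent behaves similarly.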
